* Re: [PATCH 3/3] [POWERPC] Add docs for Freescale DMA & DMA channel device tree nodes
From: Timur Tabi @ 2007-11-22 0:51 UTC
To: Scott Wood, Kumar Gala, linuxppc-dev
In-Reply-To: <20071121222822.GB19445@localhost.localdomain>
David Gibson wrote:
> Indeed, indexing or writing into shared registers is exactly what
> cell-index is for.
I don't care whether it's cell-index or device-id, but I need to know which
DMA controller is #0 and which one is #1, and I need to know which channel is
#0, which one is #1, etc. Dividing register offsets by 0x80 is not
acceptable, because what if we have an elo-plus-plus that has 0x100 bytes per
register, where the additional 0x20 bytes are for enhanced features?
--
Timur Tabi
Linux Kernel Developer @ Freescale
* Re: [PATCH 3/3] [POWERPC] Add docs for Freescale DMA & DMA channel device tree nodes
From: Timur Tabi @ 2007-11-22 0:49 UTC
To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20071121173540.GC4413@loki.buserror.net>
Scott Wood wrote:
> I don't see any justification for having such a property in the parent node,
> though.
The SSI needs to know which DMA controller is #0 and which one is #1.
I literally program the SSI and the GUTS registers with the DMA controller and
channel numbers. I need to know which one is which!
* Re: [PATCH 3/3] [POWERPC] Add docs for Freescale DMA & DMA channel device tree nodes
From: Timur Tabi @ 2007-11-22 0:48 UTC
To: Kumar Gala; +Cc: linuxppc-dev
In-Reply-To: <E9CED991-4E10-44AE-A446-B6CB151FB9EE@kernel.crashing.org>
Kumar Gala wrote:
>> Shouldn't we put some text somewhere that we're calling it the Elo
>> controller even though that word isn't used in the reference manual?
>
> We don't really have a place to put that. It's effectively documented
> right here.
I still think we need something. Otherwise, people are going to be confused.
I know I would be. I'd be searching the RM for the string "ELO" and wondering why
it wasn't there.
>>> + Example:
>>> + dma@21000 {
>>
>> Shouldn't this be dma@21300?
>
> It's an example that has no basis in reality :)
Eh?
>> The DMA controller and the DMA channels need a "device-id", so that
>> they can be identified by number. Some peripherals, like the SSI, can
>> only use the controller and channel number. This is what I have in my
>> 8610 DTS:
>
> Why not use reg for this? I don't see any reason to add another "unique
> id" when there is already one.
There isn't one. Why should the driver assume that reg / 0x80 == channel #?
Besides, I still can't differentiate between DMA controller 0 and DMA
controller 1 that way. No, we need a device-id.
* dtc: Merge refs and labels into single "markers" list
From: David Gibson @ 2007-11-22 0:37 UTC
To: Jon Loeliger; +Cc: linuxppc-dev
Currently, every 'data' object, used to represent property values, has
two lists of fixup structures - one for labels and one for references.
Sometimes we want to look at them separately, but other times we need
to consider both types of fixup.
I'm planning to implement string references, where a full path rather
than a phandle is substituted into a property value. Adding yet
another list of fixups for that would start to get messy. So, this
patch merges the "refs" and "labels" lists into a single list of
"markers", each of which has a type field indicating if it represents
a label or a phandle reference. String references or any other new
type of in-data marker will then just need a new type value - merging
data blocks and other common manipulations will just work.
While I was at it I made some cleanups to the handling of fixups which
simplify things further.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
Applies after my patch with the new checking/fixup infrastructure.
checks.c | 22 +++++-------
data.c | 101 ++++++++++++++++-------------------------------------------
dtc-parser.y | 11 +++---
dtc.h | 24 +++++++++-----
flattree.c | 11 +++---
treesource.c | 62 ++++++++++++++++++------------------
6 files changed, 98 insertions(+), 133 deletions(-)
Index: dtc/checks.c
===================================================================
--- dtc.orig/checks.c 2007-11-22 10:04:46.000000000 +1100
+++ dtc/checks.c 2007-11-22 10:30:40.000000000 +1100
@@ -224,33 +224,29 @@
static void fixup_references(struct check *c, struct node *dt,
struct node *node, struct property *prop)
{
- struct fixup *f = prop->val.refs;
+ struct marker *m = prop->val.markers;
struct node *refnode;
cell_t phandle;
- while (f) {
- if (f->ref[0] == '/') {
+ for_each_marker_of_type(m, REF_PHANDLE) {
+ if (m->ref[0] == '/') {
/* Reference to full path */
- refnode = get_node_by_path(dt, f->ref);
+ refnode = get_node_by_path(dt, m->ref);
if (! refnode)
FAIL(c, "Reference to non-existent node \"%s\"\n",
- f->ref);
+ m->ref);
} else {
- refnode = get_node_by_label(dt, f->ref);
+ refnode = get_node_by_label(dt, m->ref);
if (! refnode)
FAIL(c, "Reference to non-existent node label \"%s\"\n",
- f->ref);
+ m->ref);
}
phandle = get_node_phandle(dt, refnode);
- assert(f->offset + sizeof(cell_t) <= prop->val.len);
+ assert(m->offset + sizeof(cell_t) <= prop->val.len);
- *((cell_t *)(prop->val.val + f->offset)) = cpu_to_be32(phandle);
-
- prop->val.refs = f->next;
- fixup_free(f);
- f = prop->val.refs;
+ *((cell_t *)(prop->val.val + m->offset)) = cpu_to_be32(phandle);
}
}
CHECK(references, NULL, NULL, fixup_references, NULL, ERROR,
Index: dtc/data.c
===================================================================
--- dtc.orig/data.c 2007-11-21 17:55:50.000000000 +1100
+++ dtc/data.c 2007-11-22 10:51:09.000000000 +1100
@@ -20,28 +20,16 @@
#include "dtc.h"
-void fixup_free(struct fixup *f)
-{
- free(f->ref);
- free(f);
-}
-
void data_free(struct data d)
{
- struct fixup *f, *nf;
+ struct marker *m, *nm;
- f = d.refs;
- while (f) {
- nf = f->next;
- fixup_free(f);
- f = nf;
- }
-
- f = d.labels;
- while (f) {
- nf = f->next;
- fixup_free(f);
- f = nf;
+ m = d.markers;
+ while (m) {
+ nm = m->next;
+ free(m->ref);
+ free(m);
+ m = nm;
}
assert(!d.val || d.asize);
@@ -214,37 +202,29 @@
return d;
}
-void fixup_merge(struct fixup **fd, struct fixup **fd2, int d1_len)
+struct data data_append_markers(struct data d, struct marker *m)
{
- struct fixup **ff;
- struct fixup *f, *f2;
-
- /* Extract d2's fixups */
- f2 = *fd2;
- *fd2 = NULL;
-
- /* Tack them onto d's list of fixups */
- ff = fd;
- while (*ff)
- ff = &((*ff)->next);
- *ff = f2;
-
- /* And correct them for their new position */
- for (f = f2; f; f = f->next)
- f->offset += d1_len;
-
+ struct marker **mp = &d.markers;
+ /* Find the end of the markerlist */
+ while (*mp)
+ mp = &((*mp)->next);
+ *mp = m;
+ return d;
}
struct data data_merge(struct data d1, struct data d2)
{
struct data d;
+ struct marker *m2 = d2.markers;
- d = data_append_data(d1, d2.val, d2.len);
+ d = data_append_markers(data_append_data(d1, d2.val, d2.len), m2);
- fixup_merge(&d.refs, &d2.refs, d1.len);
- fixup_merge(&d.labels, &d2.labels, d1.len);
+ /* Adjust for the length of d1 */
+ for_each_marker(m2)
+ m2->offset += d1.len;
+ d2.markers = NULL; /* So data_free() doesn't clobber them */
data_free(d2);
return d;
@@ -294,42 +274,17 @@
return data_append_zeroes(d, newlen - d.len);
}
-struct data data_add_fixup(struct data d, char *ref)
+struct data data_add_marker(struct data d, enum markertype type, char *ref)
{
- struct fixup *f;
- struct data nd;
+ struct marker *m;
- f = xmalloc(sizeof(*f));
- f->offset = d.len;
- f->ref = ref;
- f->next = d.refs;
+ m = xmalloc(sizeof(*m));
+ m->offset = d.len;
+ m->type = type;
+ m->ref = ref;
+ m->next = NULL;
- nd = d;
- nd.refs = f;
-
- return nd;
-}
-
-struct data data_add_label(struct data d, char *label)
-{
- struct fixup *f, **p;
- struct data nd;
-
- f = xmalloc(sizeof(*f));
- f->offset = d.len;
- f->ref = label;
-
- nd = d;
- p = &nd.labels;
-
- /* adding to end keeps them sorted */
- while (*p)
- p = &((*p)->next);
-
- f->next = *p;
- *p = f;
-
- return nd;
+ return data_append_markers(d, m);
}
int data_is_one_string(struct data d)
Index: dtc/dtc-parser.y
===================================================================
--- dtc.orig/dtc-parser.y 2007-11-21 17:55:50.000000000 +1100
+++ dtc/dtc-parser.y 2007-11-22 10:24:17.000000000 +1100
@@ -194,7 +194,7 @@
}
| propdata DT_LABEL
{
- $$ = data_add_label($1, $2);
+ $$ = data_add_marker($1, LABEL, $2);
}
;
@@ -209,7 +209,7 @@
}
| propdataprefix DT_LABEL
{
- $$ = data_add_label($1, $2);
+ $$ = data_add_marker($1, LABEL, $2);
}
;
@@ -224,11 +224,12 @@
}
| celllist DT_REF
{
- $$ = data_append_cell(data_add_fixup($1, $2), -1);
+ $$ = data_append_cell(data_add_marker($1, REF_PHANDLE,
+ $2), -1);
}
| celllist DT_LABEL
{
- $$ = data_add_label($1, $2);
+ $$ = data_add_marker($1, LABEL, $2);
}
;
@@ -262,7 +263,7 @@
}
| bytestring DT_LABEL
{
- $$ = data_add_label($1, $2);
+ $$ = data_add_marker($1, LABEL, $2);
}
;
Index: dtc/dtc.h
===================================================================
--- dtc.orig/dtc.h 2007-11-22 10:04:46.000000000 +1100
+++ dtc/dtc.h 2007-11-22 10:52:01.000000000 +1100
@@ -101,23 +101,34 @@
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
/* Data blobs */
-struct fixup {
+enum markertype {
+ REF_PHANDLE,
+ LABEL,
+};
+
+struct marker {
+ enum markertype type;
int offset;
char *ref;
- struct fixup *next;
+ struct marker *next;
};
struct data {
int len;
char *val;
int asize;
- struct fixup *refs;
- struct fixup *labels;
+ struct marker *markers;
};
+
#define empty_data ((struct data){ /* all .members = 0 or NULL */ })
-void fixup_free(struct fixup *f);
+#define for_each_marker(m) \
+ for (; (m); (m) = (m)->next)
+#define for_each_marker_of_type(m, t) \
+ for_each_marker(m) \
+ if ((m)->type == (t))
+
void data_free(struct data d);
struct data data_grow_for(struct data d, int xlen);
@@ -135,8 +146,7 @@
struct data data_append_zeroes(struct data d, int len);
struct data data_append_align(struct data d, int align);
-struct data data_add_fixup(struct data d, char *ref);
-struct data data_add_label(struct data d, char *label);
+struct data data_add_marker(struct data d, enum markertype type, char *ref);
int data_is_one_string(struct data d);
Index: dtc/flattree.c
===================================================================
--- dtc.orig/flattree.c 2007-11-21 17:55:50.000000000 +1100
+++ dtc/flattree.c 2007-11-22 10:04:48.000000000 +1100
@@ -162,12 +162,13 @@
{
FILE *f = e;
int off = 0;
- struct fixup *l;
+ struct marker *m;
- l = d.labels;
- while (l) {
- emit_offset_label(f, l->ref, l->offset);
- l = l->next;
+ m = d.markers;
+ while (m) {
+ if (m->type == LABEL)
+ emit_offset_label(f, m->ref, m->offset);
+ m = m->next;
}
while ((d.len - off) >= sizeof(u32)) {
Index: dtc/treesource.c
===================================================================
--- dtc.orig/treesource.c 2007-11-21 17:55:50.000000000 +1100
+++ dtc/treesource.c 2007-11-22 10:52:59.000000000 +1100
@@ -61,7 +61,7 @@
char *str = val.val;
int i;
int newchunk = 1;
- struct fixup *l = val.labels;
+ struct marker *m = val.markers;
assert(str[val.len-1] == '\0');
@@ -69,10 +69,12 @@
char c = str[i];
if (newchunk) {
- while (l && (l->offset <= i)) {
- assert(l->offset == i);
- fprintf(f, "%s: ", l->ref);
- l = l->next;
+ while (m && (m->offset <= i)) {
+ if (m->type == LABEL) {
+ assert(m->offset == i);
+ fprintf(f, "%s: ", m->ref);
+ }
+ m = m->next;
}
fprintf(f, "\"");
newchunk = 0;
@@ -120,10 +122,9 @@
fprintf(f, "\"");
/* Wrap up any labels at the end of the value */
- while (l) {
- assert (l->offset == val.len);
- fprintf(f, " %s:", l->ref);
- l = l->next;
+ for_each_marker_of_type(m, LABEL) {
+ assert (m->offset == val.len);
+ fprintf(f, " %s:", m->ref);
}
}
@@ -131,14 +132,16 @@
{
void *propend = val.val + val.len;
cell_t *cp = (cell_t *)val.val;
- struct fixup *l = val.labels;
+ struct marker *m = val.markers;
fprintf(f, "<");
for (;;) {
- while (l && (l->offset <= ((char *)cp - val.val))) {
- assert(l->offset == ((char *)cp - val.val));
- fprintf(f, "%s: ", l->ref);
- l = l->next;
+ while (m && (m->offset <= ((char *)cp - val.val))) {
+ if (m->type == LABEL) {
+ assert(m->offset == ((char *)cp - val.val));
+ fprintf(f, "%s: ", m->ref);
+ }
+ m = m->next;
}
fprintf(f, "0x%x", be32_to_cpu(*cp++));
@@ -148,10 +151,9 @@
}
/* Wrap up any labels at the end of the value */
- while (l) {
- assert (l->offset == val.len);
- fprintf(f, " %s:", l->ref);
- l = l->next;
+ for_each_marker_of_type(m, LABEL) {
+ assert (m->offset == val.len);
+ fprintf(f, " %s:", m->ref);
}
fprintf(f, ">");
}
@@ -160,13 +162,14 @@
{
void *propend = val.val + val.len;
char *bp = val.val;
- struct fixup *l = val.labels;
+ struct marker *m = val.markers;
fprintf(f, "[");
for (;;) {
- while (l && (l->offset == (bp-val.val))) {
- fprintf(f, "%s: ", l->ref);
- l = l->next;
+ while (m && (m->offset == (bp-val.val))) {
+ if (m->type == LABEL)
+ fprintf(f, "%s: ", m->ref);
+ m = m->next;
}
fprintf(f, "%02hhx", *bp++);
@@ -176,10 +179,9 @@
}
/* Wrap up any labels at the end of the value */
- while (l) {
- assert (l->offset == val.len);
- fprintf(f, " %s:", l->ref);
- l = l->next;
+ for_each_marker_of_type(m, LABEL) {
+ assert (m->offset == val.len);
+ fprintf(f, " %s:", m->ref);
}
fprintf(f, "]");
}
@@ -188,7 +190,7 @@
{
int len = prop->val.len;
char *p = prop->val.val;
- struct fixup *l;
+ struct marker *m = prop->val.markers;
int nnotstring = 0, nnul = 0;
int nnotstringlbl = 0, nnotcelllbl = 0;
int i;
@@ -205,10 +207,10 @@
nnul++;
}
- for (l = prop->val.labels; l; l = l->next) {
- if ((l->offset > 0) && (prop->val.val[l->offset - 1] != '\0'))
+ for_each_marker_of_type(m, LABEL) {
+ if ((m->offset > 0) && (prop->val.val[m->offset - 1] != '\0'))
nnotstringlbl++;
- if ((l->offset % sizeof(cell_t)) != 0)
+ if ((m->offset % sizeof(cell_t)) != 0)
nnotcelllbl++;
}
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [RFC/PATCH 12/14] powerpc: Add early udbg support for 40x processors
From: David Gibson @ 2007-11-22 0:22 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1195689615.6970.117.camel@pasglop>
On Thu, Nov 22, 2007 at 11:00:15AM +1100, Benjamin Herrenschmidt wrote:
>
> On Thu, 2007-11-22 at 09:58 +1100, David Gibson wrote:
> > On Wed, Nov 21, 2007 at 05:16:30PM +1100, Benjamin Herrenschmidt wrote:
> > > This adds some basic real mode based early udbg support for 40x
> > > in order to debug things more easily
> >
> > Shouldn't we be able to share code with the Maple realmode udbg()?
>
> Do you really care?
Not very much, no.
* Re: [PATCH 12/14] powerpc: Add early udbg support for 40x processors
From: Grant Likely @ 2007-11-22 0:20 UTC
To: benh; +Cc: linuxppc-dev
In-Reply-To: <1195689601.6970.115.camel@pasglop>
On 11/21/07, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> On Wed, 2007-11-21 at 16:47 -0700, Grant Likely wrote:
> > On 11/20/07, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > > This adds some basic real mode based early udbg support for 40x
> > > in order to debug things more easily
> > >
> > > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > > ---
> > > --- linux-work.orig/arch/powerpc/platforms/Kconfig.cputype 2007-11-21 12:50:16.000000000 +1100
> > > +++ linux-work/arch/powerpc/platforms/Kconfig.cputype 2007-11-21 12:50:18.000000000 +1100
> > > @@ -43,6 +43,7 @@ config 40x
> > > bool "AMCC 40x"
> > > select PPC_DCR_NATIVE
> > > select WANT_DEVICE_TREE
> > > + select PPC_UDBG_16550
> >
> > Unfortunately, this isn't always true. The Xilinx Virtex parts use
> > config 40x, but not all FPGA bitstreams have a 16550 serial port.
> > Sometimes it's a uartlite instead.
>
> What does uartlite look like?
fixed speed
4 registers: rx, tx, status & control
rx & tx are... well... rx and tx registers
status has a number of bits reporting fifos full/empty etc.
control has three bits: reset tx, reset rx and interrupt enable.
See the top of drivers/serial/uartlite.c.
Very simple stuff; but definitely not 16550.
g.
--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
grant.likely@secretlab.ca
(403) 399-0195
* Re: [RFC/PATCH 12/14] powerpc: Add early udbg support for 40x processors
From: Benjamin Herrenschmidt @ 2007-11-22 0:00 UTC
To: David Gibson; +Cc: linuxppc-dev
In-Reply-To: <20071121225838.GC19445@localhost.localdomain>
On Thu, 2007-11-22 at 09:58 +1100, David Gibson wrote:
> On Wed, Nov 21, 2007 at 05:16:30PM +1100, Benjamin Herrenschmidt wrote:
> > This adds some basic real mode based early udbg support for 40x
> > in order to debug things more easily
>
> Shouldn't we be able to share code with the Maple realmode udbg()?
Do you really care?
Ben.
* Re: [PATCH 12/14] powerpc: Add early udbg support for 40x processors
From: Benjamin Herrenschmidt @ 2007-11-22 0:00 UTC
To: Grant Likely; +Cc: linuxppc-dev
In-Reply-To: <fa686aa40711211547nf9d58c4t1bc7a512c699cb02@mail.gmail.com>
On Wed, 2007-11-21 at 16:47 -0700, Grant Likely wrote:
> On 11/20/07, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > This adds some basic real mode based early udbg support for 40x
> > in order to debug things more easily
> >
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > ---
> > --- linux-work.orig/arch/powerpc/platforms/Kconfig.cputype 2007-11-21 12:50:16.000000000 +1100
> > +++ linux-work/arch/powerpc/platforms/Kconfig.cputype 2007-11-21 12:50:18.000000000 +1100
> > @@ -43,6 +43,7 @@ config 40x
> > bool "AMCC 40x"
> > select PPC_DCR_NATIVE
> > select WANT_DEVICE_TREE
> > + select PPC_UDBG_16550
>
> Unfortunately, this isn't always true. The Xilinx Virtex parts use
> config 40x, but not all FPGA bitstreams have a 16550 serial port.
> Sometimes it's a uartlite instead.
What does uartlite look like?
Ben.
* Re: [PATCH 12/14] powerpc: Add early udbg support for 40x processors
From: Grant Likely @ 2007-11-21 23:47 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20071121061555.55B06DDFA8@ozlabs.org>
On 11/20/07, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> This adds some basic real mode based early udbg support for 40x
> in order to debug things more easily
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> --- linux-work.orig/arch/powerpc/platforms/Kconfig.cputype 2007-11-21 12:50:16.000000000 +1100
> +++ linux-work/arch/powerpc/platforms/Kconfig.cputype 2007-11-21 12:50:18.000000000 +1100
> @@ -43,6 +43,7 @@ config 40x
> bool "AMCC 40x"
> select PPC_DCR_NATIVE
> select WANT_DEVICE_TREE
> + select PPC_UDBG_16550
Unfortunately, this isn't always true. The Xilinx Virtex parts use
config 40x, but not all FPGA bitstreams have a 16550 serial port.
Sometimes it's a uartlite instead.
Cheers,
g.
* Re: [RFC/PATCH 12/14] powerpc: Add early udbg support for 40x processors
From: David Gibson @ 2007-11-21 22:58 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20071121061720.83D5ADEBE3@ozlabs.org>
On Wed, Nov 21, 2007 at 05:16:30PM +1100, Benjamin Herrenschmidt wrote:
> This adds some basic real mode based early udbg support for 40x
> in order to debug things more easily
Shouldn't we be able to share code with the Maple realmode udbg()?
* Re: 2.6.24-rc3-mm1- powerpc link failure
From: Stephen Rothwell @ 2007-11-21 22:52 UTC
To: Kamalesh Babulal; +Cc: linuxppc-dev, Andrew Morton, Balbir Singh, linux-kernel
In-Reply-To: <4743E706.6010504@linux.vnet.ibm.com>
On Wed, 21 Nov 2007 13:36:30 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>
> The kernel build fails on powerpc while linking,
Only for allyesconfig (or maybe some other config that builds a lot of
stuff in).
> AS .tmp_kallsyms3.o
> LD vmlinux.o
> ld: TOC section size exceeds 64k
> make: *** [vmlinux.o] Error 1
>
> The patch posted at http://lkml.org/lkml/2007/11/13/414 solves this
> failure.
However, that patch needs more testing, especially to figure out what
performance effects it has; i.e. it is not for merging yet.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
* Re: annoying printks during vmemmap initialization
From: Christoph Hellwig @ 2007-11-21 22:49 UTC
To: Stephen Rothwell; +Cc: linuxppc-dev, Christoph Hellwig
In-Reply-To: <20071122094145.e79e1084.sfr@canb.auug.org.au>
On Thu, Nov 22, 2007 at 09:41:45AM +1100, Stephen Rothwell wrote:
> > Any reason to keep this? And if yes can we please make it conditional
> > on some kind of vmemmap_debug boot option?
>
> These have been changed to pr_debug() in 2.6.24-rc3 kernel.
Ah, sorry for not checking. Looks like the spufs tree lags a little
behind.
* [PATCH/RFC 6/6]: phyp dump: debugging print routines.
From: Linas Vepstas @ 2007-11-21 22:45 UTC
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Provide some basic debugging support.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
-----
arch/powerpc/platforms/pseries/phyp_dump.c | 51 +++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:12:21.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:12:46.000000000 -0600
@@ -139,6 +139,51 @@ static unsigned long init_dump_header(st
return addr_offset;
}
+#ifdef DEBUG
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+ printk(KERN_INFO "dump header:\n");
+ /* setup some ph->sections required */
+ printk(KERN_INFO "version = %d\n", ph->version);
+ printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+ printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+ /* No ph->disk, so all should be set to 0 */
+ printk(KERN_INFO "Offset to first section 0x%x\n", ph->first_offset_section);
+ printk(KERN_INFO "dump disk sections should be zero\n");
+ printk(KERN_INFO "dump disk section = %d\n",ph->dump_disk_section);
+ printk(KERN_INFO "block num = %ld\n",ph->block_num_dd);
+ printk(KERN_INFO "number of blocks = %ld\n",ph->num_of_blocks_dd);
+ printk(KERN_INFO "dump disk offset = %d\n",ph->offset_dd);
+ printk(KERN_INFO "Max auto time= %d\n",ph->maxtime_to_auto);
+
+ /*set cpu state and hpte states as well scratch pad area */
+ printk(KERN_INFO " CPU AREA \n");
+ printk(KERN_INFO "cpu dump_flags =%d\n",ph->cpu_data.dump_flags);
+ printk(KERN_INFO "cpu source_type =%d\n",ph->cpu_data.source_type);
+ printk(KERN_INFO "cpu error_flags =%d\n",ph->cpu_data.error_flags);
+ printk(KERN_INFO "cpu source_address =%lx\n",ph->cpu_data.source_address);
+ printk(KERN_INFO "cpu source_length =%lx\n",ph->cpu_data.source_length);
+ printk(KERN_INFO "cpu length_copied =%lx\n",ph->cpu_data.length_copied);
+
+ printk(KERN_INFO " HPTE AREA \n");
+ printk(KERN_INFO "HPTE dump_flags =%d\n",ph->hpte_data.dump_flags);
+ printk(KERN_INFO "HPTE source_type =%d\n",ph->hpte_data.source_type);
+ printk(KERN_INFO "HPTE error_flags =%d\n",ph->hpte_data.error_flags);
+ printk(KERN_INFO "HPTE source_address =%lx\n",ph->hpte_data.source_address);
+ printk(KERN_INFO "HPTE source_length =%lx\n",ph->hpte_data.source_length);
+ printk(KERN_INFO "HPTE length_copied =%lx\n",ph->hpte_data.length_copied);
+
+ printk(KERN_INFO " SRSD AREA \n");
+ printk(KERN_INFO "SRSD dump_flags =%d\n",ph->kernel_data.dump_flags);
+ printk(KERN_INFO "SRSD source_type =%d\n",ph->kernel_data.source_type);
+ printk(KERN_INFO "SRSD error_flags =%d\n",ph->kernel_data.error_flags);
+ printk(KERN_INFO "SRSD source_address =%lx\n",ph->kernel_data.source_address);
+ printk(KERN_INFO "SRSD source_length =%lx\n",ph->kernel_data.source_length);
+ printk(KERN_INFO "SRSD length_copied =%lx\n",ph->kernel_data.length_copied);
+}
+#endif
+
static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
{
int rc;
@@ -154,6 +199,9 @@ static void register_dump_area(struct ph
if (rc)
{
printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+#ifdef DEBUG
+ print_dump_header (ph);
+#endif
}
}
@@ -292,6 +340,9 @@ static int __init phyp_dump_setup(void)
register_dump_area (&phdr, dump_area_start);
goto release_mem;
}
+#ifdef DEBUG
+ print_dump_header (dump_header);
+#endif
/* Don't allow user to release the 256MB scratch area */
phyp_dump_info->init_reserve_size = free_area_length;
* [PATCH/RFC 5/6]: phyp dump: register the dump area
From: Linas Vepstas @ 2007-11-21 22:43 UTC
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Set up the actual dump header, register it with the hypervisor.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
------
arch/powerpc/platforms/pseries/phyp_dump.c | 169 +++++++++++++++++++++++++++--
1 file changed, 163 insertions(+), 6 deletions(-)
Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 15:55:37.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:06:52.000000000 -0600
@@ -30,6 +30,134 @@ struct phyp_dump *phyp_dump_info = &phyp
static int ibm_configure_kernel_dump;
/* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+ u32 dump_flags;
+ u16 source_type;
+ u16 error_flags;
+ u64 source_address;
+ u64 source_length;
+ u64 length_copied;
+ u64 destination_address;
+};
+
+struct phyp_dump_header {
+ u32 version;
+ u16 num_of_sections;
+ u16 status;
+
+ u32 first_offset_section;
+ u32 dump_disk_section;
+ u64 block_num_dd;
+ u64 num_of_blocks_dd;
+ u32 offset_dd;
+ u32 maxtime_to_auto;
+ /* No dump disk path string used */
+
+ struct dump_section cpu_data;
+ struct dump_section hpte_data;
+ struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO 0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+ struct device_node *rtas;
+ const unsigned int *sizes;
+ int len;
+ unsigned long cpu_state_size = 0;
+ unsigned long hpte_region_size = 0;
+ unsigned long addr_offset = 0;
+
+ /* Get the required dump region sizes */
+ rtas = of_find_node_by_path("/rtas");
+ sizes = of_get_property(rtas, "ibm,configure-kernel-dump-sizes", &len);
+ if (!sizes || len < 20)
+ return 0;
+
+ if (sizes[0] == 1)
+ cpu_state_size = *((unsigned long *) &sizes[1]);
+
+ if (sizes[3] == 2)
+ hpte_region_size = *((unsigned long *) &sizes[4]);
+
+ /* Set up the dump header */
+ ph->version = DUMP_HEADER_VERSION;
+ ph->num_of_sections = NUM_DUMP_SECTIONS;
+ ph->status = 0;
+
+ ph->first_offset_section =
+ (u32) &(((struct phyp_dump_header *) 0)->cpu_data);
+ ph->dump_disk_section = 0;
+ ph->block_num_dd = 0;
+ ph->num_of_blocks_dd = 0;
+ ph->offset_dd = 0;
+
+ ph->maxtime_to_auto = 0; /* disabled */
+
+ /* The first two sections are mandatory */
+ ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+ ph->cpu_data.source_address = 0;
+ ph->cpu_data.source_length = cpu_state_size;
+ ph->cpu_data.destination_address = addr_offset;
+ addr_offset += cpu_state_size;
+
+ ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+ ph->hpte_data.source_address = 0;
+ ph->hpte_data.source_length = hpte_region_size;
+ ph->hpte_data.destination_address = addr_offset;
+ addr_offset += hpte_region_size;
+
+ /* This section describes the low kernel region */
+ ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+ ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+ ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+ ph->kernel_data.destination_address = addr_offset;
+ addr_offset += ph->kernel_data.source_length;
+
+ return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+ int rc;
+ ph->cpu_data.destination_address += addr;
+ ph->hpte_data.destination_address += addr;
+ ph->kernel_data.destination_address += addr;
+
+ do {
+ rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+ 1, ph, sizeof(struct phyp_dump_header));
+ } while (rtas_busy_delay(rc));
+
+ if (rc)
+ {
+ printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+ }
+}
+
+/* ------------------------------------------------- */
/**
* release_memory_range -- release memory previously lmb_reserved
* @start_pfn: starting physical frame number
@@ -125,7 +253,11 @@ static void release_all (void)
static int __init phyp_dump_setup(void)
{
struct device_node *rtas;
- const int *dump_header;
+ const struct phyp_dump_header *dump_header;
+ unsigned long dump_area_start;
+ unsigned long dump_area_length;
+ unsigned long free_area_length;
+ unsigned long start_pfn, nr_pages;
int header_len = 0;
int rc;
@@ -140,22 +272,47 @@ static int __init phyp_dump_setup(void)
return -ENOSYS;
}
- /* Is there dump data waiting for us? */
+ /* Is there dump data waiting for us? If there isn't,
+ * then register a new dump area, and release all of
+ * the rest of the reserved ram.
+ *
+ * The /rtas/ibm,kernel-dump rtas node is present only
+ * if there is dump data waiting for us.
+ */
rtas = of_find_node_by_path("/rtas");
dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+
+ dump_area_length = init_dump_header (&phdr);
+ free_area_length = phyp_dump_info->init_reserve_size - dump_area_length;
+ dump_area_start = phyp_dump_info->init_reserve_start + free_area_length;
+ dump_area_start = dump_area_start & PAGE_MASK; /* align down */
+ free_area_length = dump_area_start - phyp_dump_info->init_reserve_start;
+
if (dump_header == NULL) {
- release_all();
- return 0;
+ register_dump_area (&phdr, dump_area_start);
+ goto release_mem;
}
+ /* Don't allow user to release the 256MB scratch area */
+ phyp_dump_info->init_reserve_size = free_area_length;
+
/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
rc = subsys_create_file(&kernel_subsys, &rr);
if (rc) {
printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
- release_all();
- return 0;
+ goto release_mem;
}
+ /* ToDo: re-register the dump area, for next time. */
+
+ return 0;
+
+release_mem:
+ /* release everything except the top 256 MB scratch area */
+ start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+ nr_pages = PFN_DOWN(free_area_length);
+ release_memory_range(start_pfn, nr_pages);
+
return 0;
}
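The placement arithmetic in phyp_dump_setup() above can be checked in isolation. The following is a user-space sketch, not kernel code. It assumes PAGE_SHIFT is 12 (4 KiB pages; a 64K-page pseries kernel would use 16), and the helper place_dump_area() and the sample sizes are hypothetical. It mirrors the lines that compute dump_area_start: put the dump area at the top of the reserved region, round its start down to a page boundary, and recompute how much memory below it may be released. Here dump_area_length stands for the value returned by init_dump_header().

```c
/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

struct dump_layout {
	unsigned long dump_area_start;   /* page-aligned start of the dump area */
	unsigned long free_area_length;  /* bytes below it that may be released */
};

/*
 * Hypothetical helper mirroring the arithmetic in phyp_dump_setup().
 * Like the original, it assumes reserve_size >= dump_area_length.
 */
static struct dump_layout
place_dump_area(unsigned long reserve_start, unsigned long reserve_size,
		unsigned long dump_area_length)
{
	struct dump_layout l;
	unsigned long free_len = reserve_size - dump_area_length;

	/* Put the dump area at the very top of the reserved region ... */
	l.dump_area_start = reserve_start + free_len;
	/* ... then align its start down; this can only enlarge the area. */
	l.dump_area_start &= SKETCH_PAGE_MASK;
	/* Everything between reserve_start and the dump area can be freed. */
	l.free_area_length = l.dump_area_start - reserve_start;
	return l;
}
```

For example, with a reservation starting at 0x10000000 of size 0x70000000 and a dump area of 0x10000100 bytes, the dump area starts at 0x6ffff000 and the 0x5ffff000 bytes below it can be released.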
^ permalink raw reply
* Re: annoying prinkts during vmemmap initialization
From: Stephen Rothwell @ 2007-11-21 22:41 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linuxppc-dev
In-Reply-To: <20071121153526.GA23589@lst.de>
[-- Attachment #1: Type: text/plain, Size: 861 bytes --]
Hi Christoph,
On Wed, 21 Nov 2007 16:35:26 +0100 Christoph Hellwig <hch@lst.de> wrote:
>
> Hi Andi,
>
> your patch 'ppc64: SPARSEMEM_VMEMMAP support' adds the following two lines:
>
> + printk(KERN_WARNING "vmemmap %08lx allocated at %p, "
> + "physical %p.\n", start, p, __pa(p));
>
> in a loop around basically every page. That's a lot of flooding (with
> the wrong printk level, btw) and really slows down booting my cell blade
> a lot (these only have a very slow serial over lan console).
>
> Any reason to keep this? And if yes can we please make it conditional
> on some kind of vmemmap_debug boot option?
These have been changed to pr_debug() in 2.6.24-rc3 kernel.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply
* [PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem
From: Linas Vepstas @ 2007-11-21 22:41 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,
echo "0x40000000 0x10000000" > /sys/kernel/release_region
will release 256MB starting at the 1GB boundary. The released memory
becomes free for general use.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
------
arch/powerpc/platforms/pseries/phyp_dump.c | 101 +++++++++++++++++++++++++++--
1 file changed, 96 insertions(+), 5 deletions(-)
Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:15:05.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:24:30.000000000 -0600
@@ -12,17 +12,24 @@
*/
#include <linux/init.h>
+#include <linux/kobject.h>
#include <linux/mm.h>
+#include <linux/of.h>
#include <linux/pfn.h>
#include <linux/swap.h>
+#include <linux/sysfs.h>
#include <asm/page.h>
#include <asm/phyp_dump.h>
+#include <asm/rtas.h>
/* Global, used to communicate data between early boot and late boot */
static struct phyp_dump phyp_dump_global;
struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+static int ibm_configure_kernel_dump;
+
+/* ------------------------------------------------- */
/**
* release_memory_range -- release memory previously lmb_reserved
* @start_pfn: starting physical frame number
@@ -52,18 +59,102 @@ release_memory_range(unsigned long start
}
}
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ * "echo <start addr> <length> > /sys/kernel/release_region"
+ *
+ * Example:
+ * "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
{
+ unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+ ssize_t ret;
- /* If no memory was reserved in early boot, there is nothing to do */
- if (phyp_dump_info->init_reserve_size == 0)
- return 0;
+ ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+ if (ret != 2)
+ return -EINVAL;
+
+ /* Range-check - don't free any reserved memory that
+ * wasn't reserved for phyp-dump */
+ if (start_addr < phyp_dump_info->init_reserve_start)
+ start_addr = phyp_dump_info->init_reserve_start;
+
+ end_addr = phyp_dump_info->init_reserve_start +
+ phyp_dump_info->init_reserve_size;
+ if (start_addr+length > end_addr)
+ length = end_addr - start_addr;
+
+ /* Release the region of memory passed in by the user */
+ start_pfn = PFN_DOWN(start_addr);
+ nr_pages = PFN_DOWN(length);
+ release_memory_range (start_pfn, nr_pages);
+
+ return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+ return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+ show_release_region,
+ store_release_region);
+
+/* ------------------------------------------------- */
+
+static void release_all (void)
+{
+ unsigned long start_pfn, nr_pages;
- /* Release memory that was reserved in early boot */
+ /* Release all memory that was reserved in early boot */
start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+ struct device_node *rtas;
+ const int *dump_header;
+ int header_len = 0;
+ int rc;
+
+ /* If no memory was reserved in early boot, there is nothing to do */
+ if (phyp_dump_info->init_reserve_size == 0)
+ return 0;
+
+ /* Return if phyp dump not supported */
+ ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+ if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+ release_all();
+ return -ENOSYS;
+ }
+
+ /* Is there dump data waiting for us? */
+ rtas = of_find_node_by_path("/rtas");
+ dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+ if (dump_header == NULL) {
+ release_all();
+ return 0;
+ }
+
+ /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+ rc = subsys_create_file(&kernel_subsys, &rr);
+ if (rc) {
+ printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
+ release_all();
+ return 0;
+ }
return 0;
}
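store_release_region() clamps the user's request to the phyp-dump reservation before freeing it. The sketch below restates that range check as a standalone, testable function; clamp_release_range() is a hypothetical name. It computes the intersection of the request and the reservation, which differs slightly from the patch: the patch moves start_addr up without shortening length, and it does not guard against a start address at or past the end of the reservation, where end_addr - start_addr would wrap around.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: intersect the user's request [start, start + len)
 * with the phyp-dump reservation [rstart, rstart + rsize).
 * Returns false if no part of the request lies inside the reservation.
 * Assumes start + len and rstart + rsize do not overflow.
 */
static bool clamp_release_range(unsigned long start, unsigned long len,
				unsigned long rstart, unsigned long rsize,
				unsigned long *out_start, unsigned long *out_len)
{
	unsigned long end = start + len;
	unsigned long rend = rstart + rsize;

	if (start < rstart)
		start = rstart;   /* never free below the reservation */
	if (end > rend)
		end = rend;       /* ... or above it */
	if (start >= end)
		return false;     /* empty intersection: nothing to release */

	*out_start = start;
	*out_len = end - start;
	return true;
}
```

The result would then be converted with PFN_DOWN() exactly as in the patch.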
^ permalink raw reply
* [PATCH/RFC 3/6]: phyp dump: reserve-release proof-of-concept
From: Linas Vepstas @ 2007-11-21 22:40 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Initial rough-in/proof of concept of reserving memory in
early boot, and freeing it later. If the previous boot
had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
arch/powerpc/kernel/prom.c | 33 +++++++++++++
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/phyp_dump.c | 71 +++++++++++++++++++++++++++++
include/asm-powerpc/phyp_dump.h | 32 +++++++++++++
4 files changed, 137 insertions(+)
Index: linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h 2007-11-19 17:44:21.000000000 -0600
@@ -0,0 +1,32 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END (1UL<<28)
+
+struct phyp_dump {
+ /* Memory that is reserved during very early boot. */
+ unsigned long init_reserve_start;
+ unsigned long init_reserve_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-19 19:07:49.000000000 -0600
@@ -0,0 +1,71 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+ struct page *rpage;
+ unsigned long end_pfn;
+ long i;
+
+ end_pfn = start_pfn + nr_pages;
+
+ for (i=start_pfn; i < end_pfn; i++) {
+ rpage = pfn_to_page(i);
+ if (PageReserved(rpage)) {
+ ClearPageReserved(rpage);
+ init_page_count(rpage);
+ __free_page(rpage);
+ totalram_pages++;
+ }
+ }
+}
+
+static int __init phyp_dump_setup(void)
+{
+ unsigned long start_pfn, nr_pages;
+
+ /* If no memory was reserved in early boot, there is nothing to do */
+ if (phyp_dump_info->init_reserve_size == 0)
+ return 0;
+
+ /* Release memory that was reserved in early boot */
+ start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+ nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+ release_memory_range(start_pfn, nr_pages);
+
+ return 0;
+}
+
+subsys_initcall(phyp_dump_setup);
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:43:52.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:44:21.000000000 -0600
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu
obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
obj-$(CONFIG_HVCS) += hvcserver.o
obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/kernel/prom.c 2007-11-19 17:43:52.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c 2007-11-19 17:44:21.000000000 -0600
@@ -51,6 +51,7 @@
#include <asm/machdep.h>
#include <asm/pSeries_reconfig.h>
#include <asm/pci-bridge.h>
+#include <asm/phyp_dump.h>
#include <asm/kexec.h>
#ifdef DEBUG
@@ -1011,6 +1012,37 @@ static void __init early_reserve_mem(voi
#endif
}
+#ifdef CONFIG_PHYP_DUMP
+
+/**
+ * reserve_crashed_mem() - reserve all not-yet-dumped memory
+ *
+ * This routine will reserve almost all of the memory in the
+ * system, except for a few hundred megabytes used to boot the
+ * new kernel. As the reserved memory is dumped to the dump
+ * device (by userland tools), it will be freed and made available.
+ */
+static void __init reserve_crashed_mem(void)
+{
+ unsigned long crashed_base, crashed_size;
+
+ /* Reserve *everything* above the RMR. We'll free this real soon. */
+ crashed_base = PHYP_DUMP_RMR_END;
+ crashed_size = lmb_end_of_DRAM() - crashed_base;
+
+ /* XXX crashed_ram_end is wrong, since it may be beyond
+ * the memory_limit, it will need to be adjusted. */
+ lmb_reserve(crashed_base, crashed_size);
+
+ phyp_dump_info->init_reserve_start = crashed_base;
+ phyp_dump_info->init_reserve_size = crashed_size;
+}
+
+#else
+static inline void __init reserve_crashed_mem(void) {}
+#endif /* CONFIG_PHYP_DUMP */
+
+
void __init early_init_devtree(void *params)
{
DBG(" -> early_init_devtree(%p)\n", params);
@@ -1043,6 +1075,7 @@ void __init early_init_devtree(void *par
reserve_kdump_trampoline();
reserve_crashkernel();
early_reserve_mem();
+ reserve_crashed_mem();
lmb_enforce_memory_limit(memory_limit);
lmb_analyze();
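reserve_crashed_mem() reserves everything from PHYP_DUMP_RMR_END (256MB) to the end of DRAM, and phyp_dump_setup() later converts that byte range into page frames with PFN_DOWN(). The sketch below reproduces the arithmetic in user space. A 4 KiB page size and the 2GB DRAM size used in the usage note are assumptions for illustration, and crashed_reserve_for() is a hypothetical name. It expresses the frame range as half-open, [start_pfn, end_pfn), so a release loop over it should run while i < end_pfn.

```c
/*
 * Assumptions for illustration only: 4 KiB pages, and the 256MB RMR
 * boundary that phyp_dump.h defines as PHYP_DUMP_RMR_END.
 */
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_RMR_END     (1UL << 28)
/* Same idea as PFN_DOWN(): byte address -> page frame number. */
#define SKETCH_PFN_DOWN(x) ((x) >> SKETCH_PAGE_SHIFT)

struct crashed_reserve {
	unsigned long base;       /* first reserved byte (lmb_reserve base) */
	unsigned long size;       /* reserved bytes (lmb_reserve size) */
	unsigned long start_pfn;  /* first frame to release later */
	unsigned long end_pfn;    /* one past the last frame (exclusive) */
};

/*
 * Hypothetical helper: reserve everything from the RMR end to the end
 * of DRAM. Assumes end_of_dram > SKETCH_RMR_END.
 */
static struct crashed_reserve crashed_reserve_for(unsigned long end_of_dram)
{
	struct crashed_reserve r;

	r.base = SKETCH_RMR_END;
	r.size = end_of_dram - r.base;
	r.start_pfn = SKETCH_PFN_DOWN(r.base);
	/* nr_pages = PFN_DOWN(size); the loop visits start_pfn .. end_pfn - 1. */
	r.end_pfn = r.start_pfn + SKETCH_PFN_DOWN(r.size);
	return r;
}
```

With 2GB of DRAM this gives a reservation of 0x70000000 bytes starting at 0x10000000, i.e. frames 0x10000 up to (but not including) 0x80000.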
^ permalink raw reply
* [PATCH/RFC 2/6]: phyp dump: config file
From: Linas Vepstas @ 2007-11-21 22:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Add hypervisor-assisted dump to kernel config
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
-----
arch/powerpc/Kconfig | 11 +++++++++++
1 file changed, 11 insertions(+)
Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig 2007-11-14 16:39:20.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig 2007-11-15 14:27:33.000000000 -0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
Don't change this unless you know what you are doing.
+config PHYP_DUMP
+ bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+ depends on PPC_PSERIES && EXPERIMENTAL
+ default y
+ help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say "Y"
+
config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP
^ permalink raw reply
* [PATCH/RFC 1/6]: phyp dump: Documentation
From: Linas Vepstas @ 2007-11-21 22:37 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Basic documentation for hypervisor-assisted dump.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
Documentation/powerpc/phyp-assisted-dump.txt | 126 +++++++++++++++++++++++++++
1 file changed, 126 insertions(+)
Index: linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt 2007-11-21 16:26:44.000000000 -0600
@@ -0,0 +1,126 @@
+
+ Hypervisor-Assisted Dump
+ ------------------------
+ November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+ with a fresh copy of the kernel. In particular,
+ PCI and I/O devices have been reinitialized and are
+ in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+ immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+ required; the system will be fully usable, and running
+ in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+ the low 256MB of RAM to a previously registered
+ save region. It will also save system state, system
+ registers, and hardware PTEs.
+
+-- After the low 256MB area has been saved, the
+ hypervisor will reset PCI and other hardware state.
+ It will *not* clear RAM. It will then launch the
+ bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+ is a new node (ibm,kernel-dump) in the device tree,
+ indicating that there is crash data available from
+ a previous boot. It will boot into only 256MB of RAM,
+ reserving the rest of system memory.
+
+-- Userspace tools will read /proc/kcore to obtain the
+ contents of memory, which holds the previous crashed
+ kernel. The userspace tools may copy this info to
+ disk, or network, NAS, SAN, iSCSI, etc., as desired.
+
+-- As the userspace tools complete saving a portion of
+ the dump, they echo a start address and length to
+ /sys/kernel/release_region to release the reserved
+ memory back to general use.
+
+ An example of this is:
+ "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved ram will be released for general kernel use. The
+highest 256 MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in case a crash
+does occur. See, however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues:
+------------
+ o User-space dump tool integration is completely unresolved.
+
+ o The various code paths that tell the hypervisor that a crash
+ occurred, vs. it simply being a normal reboot, should be
+ reviewed, and possibly clarified/fixed.
+
+ o The real-virtual mapping is awkward and unaddressed. There
+ is currently no clear way of matching up the contents of
+ /proc/kcore to the values that need to be sent to
+ /sys/kernel/release_region
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+ instead? There is a dump_subsys being created by the s390 code,
+ perhaps the pseries code should use a similar layout as well.
+
+ o Saved system registers and HPTE tables will be located in high
+ memory. There is currently no way of telling user-space where
+ these are located.
+
+ o The post-dump procedures are incomplete. In particular,
+ after a dump has been taken, the system should re-register
+ with the hypervisor, so that a subsequent crash can be handled.
+
+ o The hypervisor may have an error preserving the dump data.
+ The current code does not check for this error, and does
+ not handle it.
+
+ o Is reserving a 256MB region really required? The goal of
+ reserving a 256MB scratch area is to make sure that no
+ important crash data is clobbered when the hypervisor
+ saves low memory to the scratch area. But, if one could assure
+ that nothing important is located in some 256MB area, then
+ it would not need to be reserved.
+
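The documentation leaves the user-space tool open. As a sketch only, a dump tool could walk the reserved region in fixed-size chunks and, after saving each chunk, write a request in the "<start> <length>" hex format that store_release_region() parses with sscanf("%lx %lx"). The function next_release_request(), the chunk size and the region bounds below are hypothetical. The copy step (reading /proc/kcore) is omitted, and delivering the string would be an ordinary write to /sys/kernel/release_region.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the release request for the chunk starting
 * at *cursor, advance *cursor past it, and return the number of characters
 * written (0 once [*cursor, region_end) is exhausted or chunk is 0).
 * The last chunk is shortened so it never runs past region_end.
 */
static int next_release_request(unsigned long *cursor, unsigned long region_end,
				unsigned long chunk, char *buf, size_t buflen)
{
	unsigned long len;
	int n;

	if (chunk == 0 || *cursor >= region_end)
		return 0;

	len = region_end - *cursor;
	if (len > chunk)
		len = chunk;

	/* Same layout as the documented example: "0x40000000 0x10000000" */
	n = snprintf(buf, buflen, "0x%lx 0x%lx", *cursor, len);
	*cursor += len;
	return n;
}
```

Emitting one request per saved chunk means memory is handed back to the system while the dump is still in progress, which is the point of the scheme described above.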
^ permalink raw reply
* [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump
From: Linas Vepstas @ 2007-11-21 22:36 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
The following series of patches implement a basic framework
for hypervisor-assisted dump. The very first patch provides
documentation explaining what this is :-). Yes, it's supposed
to be an improvement over kdump.
The patches mostly sort-of work; a list of open issues
is included in the documentation. It also appears that
the not-yet-released firmware versions this was tested
on are still, ahem, incomplete; this work is also pending.
-- Linas & Manish
^ permalink raw reply
* Re: [PATCH 3/3] [POWERPC] Add docs for Freescale DMA & DMA channel device tree nodes
From: David Gibson @ 2007-11-21 22:28 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, Timur Tabi
In-Reply-To: <47448687.6010106@freescale.com>
On Wed, Nov 21, 2007 at 01:27:03PM -0600, Scott Wood wrote:
> Kumar Gala wrote:
> > On Nov 21, 2007, at 11:35 AM, Scott Wood wrote:
> >> A cell-index property would be useful here for indexing into the summary
> >> status register.
> >
> > Divide by 0x80.
>
> :-P
>
> Using cell-index for things like this is reasonably common, and endorsed
> by current ePAPR drafts.
Indeed, indexing or writing into shared registers is exactly what
cell-index is for.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
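To make the trade-off in this thread concrete: deriving a channel number by dividing its register offset by 0x80 bakes the register stride into the driver, while a cell-index property states the number directly. The sketch below is hypothetical (the struct, field names and offsets are illustrative, not the binding under review) and assumes channel offsets are measured from the first channel's registers.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a DMA channel node. */
struct dma_chan_node {
	unsigned long reg_offset;  /* channel registers, relative to channel 0 */
	bool has_cell_index;       /* does the node carry cell-index? */
	unsigned int cell_index;   /* value of the cell-index property, if any */
};

/* Index inferred from the register layout: only valid for a 0x80 stride. */
static unsigned int chan_index_from_offset(const struct dma_chan_node *n)
{
	return n->reg_offset / 0x80;
}

/* Prefer the explicit property; fall back to the stride-based guess. */
static unsigned int chan_index(const struct dma_chan_node *n)
{
	return n->has_cell_index ? n->cell_index : chan_index_from_offset(n);
}
```

On a hypothetical part whose channels are 0x100 apart, channel 2 sits at offset 0x200: the offset-based guess yields 4, while cell-index still yields 2.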
^ permalink raw reply
* Re: pseries (power3) boot hang (pageblock_nr_pages==0)
From: Mel Gorman @ 2007-11-21 22:03 UTC (permalink / raw)
To: Will Schmidt; +Cc: linuxppc-dev, Stephen Rothwell, Linux Memory Management List
In-Reply-To: <1195682111.4421.23.camel@farscape.rchland.ibm.com>
On (21/11/07 15:55), Will Schmidt didst pronounce:
> Hi Folks,
>
> I've been seeing a boot hang/crash on power3 systems for a few weeks.
> (hangs on a 270, drops to SP on a p610). This afternoon I got around
> to tracking it down to the changes in
>
> commit d9c2340052278d8eb2ffb16b0484f8f794def4de
> Do not depend on MAX_ORDER when grouping pages by mobility
>
> cpu 0x0: Vector: 100 (System Reset) at [c00000006e803ae0]
> pc: c00000000009bf50: .setup_per_zone_pages_min+0x298/0x34c
> lr: c00000000009be38: .setup_per_zone_pages_min+0x180/0x34c
> [c00000006e803e20] c0000000005e3898 .init_per_zone_pages_min+0x80/0xa0
> [c00000006e803ea0] c0000000005c9c04 .kernel_init+0x214/0x3d8
> [c00000006e803f90] c000000000026cac .kernel_thread+0x4c/0x68
>
> I narrowed it down to the for loop within setup_zone_migrate_reserve(),
> called by setup_per_zone_pages_min(). The loop spins forever due to
> pageblock_nr_pages being 0.
>
> I imagine this would be properly fixed with something similar to the
> change for iSeries.
Have you tried with the patch that fixed the iSeries boot problem?
Thanks for tracking down the problem to such a specific place.
Here is the iSeries fix in case it applies to this as well.
======
Ordinarily, the size of a pageblock is determined at compile-time based on
the hugepage size. On PPC64, the hugepage size is determined at runtime based
on what is supported by the machine. On legacy machines such as iSeries which
do not support hugepages, HPAGE_SHIFT is 0. This results in pageblock_order
being set to -PAGE_SHIFT and a crash results shortly afterwards.
This patch checks that HPAGE_SHIFT is a sensible value before using the
hugepage size. If it is 0, MAX_ORDER-1 is used instead as this is a sensible
value of pageblock_order.
This is a fix for 2.6.24.
Credit goes to Stephen Rothwell for identifying the bug and testing on
iSeries. Additional credit goes to David Gibson for testing with the
libhugetlbfs test suite.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
arch/powerpc/Kconfig | 5 +++++
mm/page_alloc.c | 11 ++++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 18f397c..232c298 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -187,6 +187,11 @@ config FORCE_MAX_ZONEORDER
default "9" if PPC_64K_PAGES
default "13"
+config HUGETLB_PAGE_SIZE_VARIABLE
+ bool
+ depends on HUGETLB_PAGE
+ default y
+
config MATH_EMULATION
bool "Math emulation"
depends on 4xx || 8xx || E200 || PPC_MPC832x || E500
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index da69d83..14e0ac3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3386,7 +3386,16 @@ static void __meminit free_area_init_core(struct pglist_data *pgdat,
if (!size)
continue;
- set_pageblock_order(HUGETLB_PAGE_ORDER);
+ /*
+ * If HPAGE_SHIFT is a sensible value, base the size of a
+ * pageblock on the hugepage size. Otherwise MAX_ORDER-1
+ * is a sensible choice
+ */
+ if (HPAGE_SHIFT > PAGE_SHIFT)
+ set_pageblock_order(HUGETLB_PAGE_ORDER);
+ else
+ set_pageblock_order(MAX_ORDER-1);
+
setup_usemap(pgdat, zone, size);
ret = init_currently_empty_zone(zone, zone_start_pfn,
size, MEMMAP_EARLY);
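The fix chooses pageblock_order at runtime. The user-space sketch below restates that choice so both cases in this thread can be checked. With HPAGE_SHIFT == 0, the old HUGETLB_PAGE_ORDER (which the changelog implies is HPAGE_SHIFT - PAGE_SHIFT on powerpc) is negative, and the resulting pageblock_nr_pages cannot serve as the step of the loop in setup_zone_migrate_reserve(). pageblock_order_for() is a hypothetical name. The MAX_ORDER values used in the test come from the FORCE_MAX_ZONEORDER defaults above (13, or 9 with 64K pages).

```c
/*
 * Hypothetical restatement of the set_pageblock_order() choice in the patch.
 * hpage_shift == 0 means the machine has no huge pages (e.g. iSeries, POWER3).
 */
static int pageblock_order_for(unsigned int hpage_shift, unsigned int page_shift,
			       int max_order)
{
	if (hpage_shift > page_shift)
		return (int)(hpage_shift - page_shift);  /* HUGETLB_PAGE_ORDER */
	return max_order - 1;                         /* fallback added by the fix */
}

/* Base pages per pageblock; at least 1 for any order the function returns. */
static unsigned long pageblock_pages(int order)
{
	return 1UL << order;
}
```

Because the fallback is never negative, pageblock_nr_pages is always nonzero and the migrate-reserve loop makes progress.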
--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply related
* pseries (power3) boot hang (pageblock_nr_pages==0)
From: Will Schmidt @ 2007-11-21 21:55 UTC (permalink / raw)
To: Mel Gorman, Stephen Rothwell, Linux Memory Management List,
linuxppc-dev
Hi Folks,
I've been seeing a boot hang/crash on power3 systems for a few weeks.
(hangs on a 270, drops to SP on a p610). This afternoon I got around
to tracking it down to the changes in
commit d9c2340052278d8eb2ffb16b0484f8f794def4de
Do not depend on MAX_ORDER when grouping pages by mobility
cpu 0x0: Vector: 100 (System Reset) at [c00000006e803ae0]
pc: c00000000009bf50: .setup_per_zone_pages_min+0x298/0x34c
lr: c00000000009be38: .setup_per_zone_pages_min+0x180/0x34c
[c00000006e803e20] c0000000005e3898 .init_per_zone_pages_min+0x80/0xa0
[c00000006e803ea0] c0000000005c9c04 .kernel_init+0x214/0x3d8
[c00000006e803f90] c000000000026cac .kernel_thread+0x4c/0x68
I narrowed it down to the for loop within setup_zone_migrate_reserve(),
called by setup_per_zone_pages_min(). The loop spins forever due to
pageblock_nr_pages being 0.
I imagine this would be properly fixed with something similar to the
change for iSeries. Depending on how obvious, quick and easy it is for
the experts to come up with a proper fix, I'll be able to do additional
debug and hacking after turkey-day. :-)
For the moment, I've hacked it with the following patch. (tested on
both the 270 and the p610):
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2454,6 +2454,9 @@ static void setup_zone_migrate_reserve(struct zone *zone)
reserve = roundup(zone->pages_min, pageblock_nr_pages) >>
pageblock_order;
+/* this is a cheap and dirty bailout, probably not a proper fix. */
+ if (pageblock_nr_pages==0) return;
+
for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
if (!pfn_valid(pfn))
continue;
^ permalink raw reply
* Re: [PATCH] [POWERPC] Emulate isel (Integer Select) instruction
From: Kim Phillips @ 2007-11-21 21:48 UTC (permalink / raw)
To: Scott Wood; +Cc: Geert Uytterhoeven, linuxppc-dev, Paul Mackerras
In-Reply-To: <4744A5EC.1090201@freescale.com>
On Wed, 21 Nov 2007 15:41:00 -0600
Scott Wood <scottwood@freescale.com> wrote:
> Paul Mackerras wrote:
> > Geert Uytterhoeven writes:
> >
> >> +#define WARN_EMULATE(type) \
> >> + do { \
> >> + static unsigned int count; \
> >> + if (count++ < 10) \
> >> + pr_warning("%s used emulated %s instruction\n", \
> >> + current->comm, type); \
> >
> > Thinking about this a bit more, if an instruction gets emulated 10
> > times then I don't care, since it's probably only cost me 10
> > microseconds or so. If it gets emulated a million times then I might
> > want to look at it. So in fact this approach doesn't give me the
> > information I need to know whether there is a real problem or not.
>
> Maybe print the first time, then when it's happened 10 times, then 100,
> then 1000, etc.
>
or just use printk_ratelimit().
Kim
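printk_ratelimit() lets a burst of messages through per time interval and suppresses the rest. As a user-space sketch of that idea (not the kernel's implementation; the struct, its field names and the explicit now argument are hypothetical, and 10 messages per 5 time units are only example parameters), a fixed-window limiter looks like this:

```c
#include <stdbool.h>

/* Hypothetical fixed-window rate limiter: at most `burst` events per window. */
struct ratelimit {
	unsigned long interval;  /* window length, in caller-defined time units */
	unsigned int burst;      /* events allowed per window */
	unsigned long begin;     /* start of the current window */
	unsigned int printed;    /* events allowed so far in this window */
	bool started;            /* false until the first call */
};

static bool ratelimit_allow(struct ratelimit *rs, unsigned long now)
{
	/* Open a new window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;  /* suppressed */
}
```

This bounds log noise over time, but unlike a per-instruction counter it does not say how many emulations were suppressed.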
^ permalink raw reply
* Re: [PATCH] [POWERPC] Emulate isel (Integer Select) instruction
From: Scott Wood @ 2007-11-21 21:41 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Geert Uytterhoeven, linuxppc-dev
In-Reply-To: <18244.42181.426804.662877@cargo.ozlabs.ibm.com>
Paul Mackerras wrote:
> Geert Uytterhoeven writes:
>
>> +#define WARN_EMULATE(type) \
>> + do { \
>> + static unsigned int count; \
>> + if (count++ < 10) \
>> + pr_warning("%s used emulated %s instruction\n", \
>> + current->comm, type); \
>
> Thinking about this a bit more, if an instruction gets emulated 10
> times then I don't care, since it's probably only cost me 10
> microseconds or so. If it gets emulated a million times then I might
> want to look at it. So in fact this approach doesn't give me the
> information I need to know whether there is a real problem or not.
Maybe print the first time, then when it's happened 10 times, then 100,
then 1000, etc.
-Scott
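Scott's suggestion (report the 1st, 10th, 100th, 1000th, ... occurrence) keeps the log short while still showing the order of magnitude of how often an instruction is emulated. A minimal sketch, assuming a per-call-site counter like the static count in WARN_EMULATE; the helper and macro names are hypothetical, and fprintf() stands in for pr_warning() so the code runs in user space:

```c
#include <stdbool.h>
#include <stdio.h>

/* True when count is 1, 10, 100, 1000, ... */
static bool is_power_of_ten(unsigned long count)
{
	if (count == 0)
		return false;
	while (count % 10 == 0)
		count /= 10;
	return count == 1;
}

/* Hypothetical user-space analogue of WARN_EMULATE with decade reporting. */
#define WARN_EMULATE_DECADE(comm, type)					\
	do {								\
		static unsigned long count;				\
		if (is_power_of_ten(++count))				\
			fprintf(stderr,					\
				"%s used emulated %s instruction (%lu times)\n", \
				comm, type, count);			\
	} while (0)
```

Calling WARN_EMULATE_DECADE from one site a million times prints only seven lines, and the last one carries the count Paul wanted to see.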
^ permalink raw reply