All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
@ 2008-01-08  0:13 ` Manish Ahuja
  2008-01-09  4:29   ` Nathan Lynch
  2008-02-12  6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
  1 sibling, 1 reply; 59+ messages in thread
From: Manish Ahuja @ 2008-01-08  0:13 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake


Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

----
 Documentation/powerpc/phyp-assisted-dump.txt |  129 +++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)

Index: 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt	2008-01-07 18:05:46.000000000 -0600
@@ -0,0 +1,129 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in it's normal, production mode on it normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For Example: the values in /sys/kernel/release-region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is a crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved ram will be released for general kernel use. The
+highest 256 MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in the case a crash
+does occur. See, however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+a new file upon user intervention. The starting address
+to be read and the range for each data point in provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+------------
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead? There is a dump_subsys being created by the s390 code,
+   perhaps the pseries code should use a similar layout as well.
+
+ o Is reserving a 256MB region really required? The goal of
+   reserving a 256MB scratch area is to make sure that no
+   important crash data is clobbered when the hypervisor
+   save low mem to the scratch area. But, if one could assure
+   that nothing important is located in some 256MB area, then
+   it would not need to be reserved. Something that can be
+   improved in subsequent versions.
+
+ o Still working the kdump team to integrate this with kdump,
+   some work remains but this would not affect the current
+   patches.
+
+ o Still need to write a shell script, to copy the dump away.
+   Currently I am parsing it manually.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-08  0:13 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja
@ 2008-01-09  4:29   ` Nathan Lynch
  2008-01-09  4:58     ` Michael Ellerman
  2008-01-09 15:31     ` Linas Vepstas
  0 siblings, 2 replies; 59+ messages in thread
From: Nathan Lynch @ 2008-01-09  4:29 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake

Manish Ahuja wrote:
> +
> +                   Hypervisor-Assisted Dump
> +                   ------------------------
> +                       November 2007

Date is unneeded (and, uhm, dated :)


> +The goal of hypervisor-assisted dump is to enable the dump of
> +a crashed system, and to do so from a fully-reset system, and
> +to minimize the total elapsed time until the system is back
> +in production use.

Is it actually faster than kdump?


> +As compared to kdump or other strategies, hypervisor-assisted
> +dump offers several strong, practical advantages:
> +
> +-- Unlike kdump, the system has been reset, and loaded
> +   with a fresh copy of the kernel.  In particular,
> +   PCI and I/O devices have been reinitialized and are
> +   in a clean, consistent state.
> +-- As the dump is performed, the dumped memory becomes
> +   immediately available to the system for normal use.
> +-- After the dump is completed, no further reboots are
> +   required; the system will be fully usable, and running
> +   in it's normal, production mode on it normal kernel.
> +
> +The above can only be accomplished by coordination with,
> +and assistance from the hypervisor. The procedure is
> +as follows:
> +
> +-- When a system crashes, the hypervisor will save
> +   the low 256MB of RAM to a previously registered
> +   save region. It will also save system state, system
> +   registers, and hardware PTE's.
> +
> +-- After the low 256MB area has been saved, the
> +   hypervisor will reset PCI and other hardware state.
> +   It will *not* clear RAM. It will then launch the
> +   bootloader, as normal.
> +
> +-- The freshly booted kernel will notice that there
> +   is a new node (ibm,dump-kernel) in the device tree,
> +   indicating that there is crash data available from
> +   a previous boot. It will boot into only 256MB of RAM,
> +   reserving the rest of system memory.
> +
> +-- Userspace tools will parse /sys/kernel/release_region
> +   and read /proc/vmcore to obtain the contents of memory,
> +   which holds the previous crashed kernel. The userspace
> +   tools may copy this info to disk, or network, nas, san,
> +   iscsi, etc. as desired.
> +
> +   For Example: the values in /sys/kernel/release-region
> +   would look something like this (address-range pairs).
> +   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
> +   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
> +
> +-- As the userspace tools complete saving a portion of
> +   dump, they echo an offset and size to
> +   /sys/kernel/release_region to release the reserved
> +   memory back to general use.
> +
> +   An example of this is:
> +     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
> +   which will release 256MB at the 1GB boundary.

This violates the "one file, one value" rule of sysfs, but nobody
really takes that seriously, I guess.  In any case, consider
documenting this in Documentation/ABI.


> +
> +Please note that the hypervisor-assisted dump feature
> +is only available on Power6-based systems with recent
> +firmware versions.

This statement will of course become dated/incorrect so I recommend
removing it.


> +
> +Implementation details:
> +----------------------
> +In order for this scheme to work, memory needs to be reserved
> +quite early in the boot cycle. However, access to the device
> +tree this early in the boot cycle is difficult, and device-tree
> +access is needed to determine if there is a crash data waiting.

I don't think this bit about early device tree access is correct.  By
the time your code is reserving memory (from early_init_devtree(), I
think), RTAS has been instantiated and you are able to test for the
existence of /rtas/ibm,dump-kernel.


> +To work around this problem, all but 256MB of RAM is reserved
> +during early boot. A short while later in boot, a check is made
> +to determine if there is dump data waiting. If there isn't,
> +then the reserved memory is released to general kernel use.

So I think these gymnastics are unneeded -- unless I'm
misunderstanding something, you should be able to determine very early
whether to reserve that memory.


> +If there is dump data, then the /sys/kernel/release_region
> +file is created, and the reserved memory is held.
> +
> +If there is no waiting dump data, then all but 256MB of the
> +reserved ram will be released for general kernel use. The
> +highest 256 MB of RAM will *not* be released: this region
> +will be kept permanently reserved, so that it can act as
> +a receptacle for a copy of the low 256MB in the case a crash
> +does occur. See, however, "open issues" below, as to whether
> +such a reserved region is really needed.
> +
> +Currently the dump will be copied from /proc/vmcore to a
> +a new file upon user intervention. The starting address
> +to be read and the range for each data point in provided
                                               ^is

> +in /sys/kernel/release_region.
> +
> +The tools to examine the dump will be same as the ones
> +used for kdump.
> +
> +
> +General notes:
> +--------------
> +Security: please note that there are potential security issues
> +with any sort of dump mechanism. In particular, plaintext
> +(unencrypted) data, and possibly passwords, may be present in
> +the dump data. Userspace tools must take adequate precautions to
> +preserve security.
> +
> +Open issues/ToDo:
> +------------
> + o The various code paths that tell the hypervisor that a crash
> +   occurred, vs. it simply being a normal reboot, should be
> +   reviewed, and possibly clarified/fixed.
> +
> + o Instead of using /sys/kernel, should there be a /sys/dump
> +   instead? There is a dump_subsys being created by the s390 code,
> +   perhaps the pseries code should use a similar layout as well.

Well, it seems to me that there's little reason to duplicate the s390
layout unless we can actually share code.

FWIW, I've been thinking about making a /sys/firmware/phyp hierarchy
which could contain much of the System P-specific functions (DLPAR,
lparcfg, other crud in /proc/ppc64)... seems suited to this
platform-specific dump mechanism.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-09  4:29   ` Nathan Lynch
@ 2008-01-09  4:58     ` Michael Ellerman
  2008-01-09 15:31     ` Linas Vepstas
  1 sibling, 0 replies; 59+ messages in thread
From: Michael Ellerman @ 2008-01-09  4:58 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: lkessler, linuxppc-dev, mahuja, linasvepstas, strosake

[-- Attachment #1: Type: text/plain, Size: 4886 bytes --]

On Tue, 2008-01-08 at 22:29 -0600, Nathan Lynch wrote:
> Manish Ahuja wrote:
> > +
> > +                   Hypervisor-Assisted Dump
> > +                   ------------------------
> > +                       November 2007
> 
> Date is unneeded (and, uhm, dated :)
> 
> 
> > +The goal of hypervisor-assisted dump is to enable the dump of
> > +a crashed system, and to do so from a fully-reset system, and
> > +to minimize the total elapsed time until the system is back
> > +in production use.
> 
> Is it actually faster than kdump?
> 
> 
> > +As compared to kdump or other strategies, hypervisor-assisted
> > +dump offers several strong, practical advantages:
> > +
> > +-- Unlike kdump, the system has been reset, and loaded
> > +   with a fresh copy of the kernel.  In particular,
> > +   PCI and I/O devices have been reinitialized and are
> > +   in a clean, consistent state.
> > +-- As the dump is performed, the dumped memory becomes
> > +   immediately available to the system for normal use.
> > +-- After the dump is completed, no further reboots are
> > +   required; the system will be fully usable, and running
> > +   in it's normal, production mode on it normal kernel.
> > +
> > +The above can only be accomplished by coordination with,
> > +and assistance from the hypervisor. The procedure is
> > +as follows:
> > +
> > +-- When a system crashes, the hypervisor will save
> > +   the low 256MB of RAM to a previously registered
> > +   save region. It will also save system state, system
> > +   registers, and hardware PTE's.
> > +
> > +-- After the low 256MB area has been saved, the
> > +   hypervisor will reset PCI and other hardware state.
> > +   It will *not* clear RAM. It will then launch the
> > +   bootloader, as normal.
> > +
> > +-- The freshly booted kernel will notice that there
> > +   is a new node (ibm,dump-kernel) in the device tree,
> > +   indicating that there is crash data available from
> > +   a previous boot. It will boot into only 256MB of RAM,
> > +   reserving the rest of system memory.
> > +
> > +-- Userspace tools will parse /sys/kernel/release_region
> > +   and read /proc/vmcore to obtain the contents of memory,
> > +   which holds the previous crashed kernel. The userspace
> > +   tools may copy this info to disk, or network, nas, san,
> > +   iscsi, etc. as desired.
> > +
> > +   For Example: the values in /sys/kernel/release-region
> > +   would look something like this (address-range pairs).
> > +   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
> > +   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
> > +
> > +-- As the userspace tools complete saving a portion of
> > +   dump, they echo an offset and size to
> > +   /sys/kernel/release_region to release the reserved
> > +   memory back to general use.
> > +
> > +   An example of this is:
> > +     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
> > +   which will release 256MB at the 1GB boundary.
> 
> This violates the "one file, one value" rule of sysfs, but nobody
> really takes that seriously, I guess.  In any case, consider
> documenting this in Documentation/ABI.
> 
> 
> > +
> > +Please note that the hypervisor-assisted dump feature
> > +is only available on Power6-based systems with recent
> > +firmware versions.
> 
> This statement will of course become dated/incorrect so I recommend
> removing it.
> 
> 
> > +
> > +Implementation details:
> > +----------------------
> > +In order for this scheme to work, memory needs to be reserved
> > +quite early in the boot cycle. However, access to the device
> > +tree this early in the boot cycle is difficult, and device-tree
> > +access is needed to determine if there is a crash data waiting.
> 
> I don't think this bit about early device tree access is correct.  By
> the time your code is reserving memory (from early_init_devtree(), I
> think), RTAS has been instantiated and you are able to test for the
> existence of /rtas/ibm,dump-kernel.

Yep it's early_init_devtree(), and yes it's fairly easy to access the
(flattened) device tree at that point.

> > +To work around this problem, all but 256MB of RAM is reserved
> > +during early boot. A short while later in boot, a check is made
> > +to determine if there is dump data waiting. If there isn't,
> > +then the reserved memory is released to general kernel use.
> 
> So I think these gymnastics are unneeded -- unless I'm
> misunderstanding something, you should be able to determine very early
> whether to reserve that memory.

I agree.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-09  4:29   ` Nathan Lynch
  2008-01-09  4:58     ` Michael Ellerman
@ 2008-01-09 15:31     ` Linas Vepstas
  2008-01-09 18:44       ` Nathan Lynch
  1 sibling, 1 reply; 59+ messages in thread
From: Linas Vepstas @ 2008-01-09 15:31 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: mahuja, linuxppc-dev, lkessler, strosake

Hi Nathan,

Thank you for looking at this.

On 08/01/2008, Nathan Lynch <ntl@pobox.com> wrote:
> Manish Ahuja wrote:
> > +
> > +                   Hypervisor-Assisted Dump
> > +                   ------------------------
> > +                       November 2007
>
> Date is unneeded (and, uhm, dated :)

I'll argue very very strongly against this. One of the worst things
I deal with is undated documents. Google pulls up multiple
versions of a document, and you start wondering: "which one
of these is the *right one*?"  Its only after trying some suggested
sequence of commands that aren't working, and more time googling
and effort wasted, that you finallly realize that you are reading
the version 0.1 of documentation from 1999 that some idiot
did not have the decency to put a date on. (A version number
is acceptable).  Documentation is often in mediocre state,
don't compound the problem be leaving off critical info.

> > +The goal of hypervisor-assisted dump is to enable the dump of
> > +a crashed system, and to do so from a fully-reset system, and
> > +to minimize the total elapsed time until the system is back
> > +in production use.
>
> Is it actually faster than kdump?

This is a basic presumption; it should blow it away, as, after all,
it requires one less reboot!  As a side effect, the system is in
production *while* the dump is being taken; with kdump,
you can't go into production until after the dump is finished,
and the system has been rebooted a second time.  On
systems with terabytes of RAM, the time difference can be
hours.

> > +-- As the userspace tools complete saving a portion of
> > +   dump, they echo an offset and size to
> > +   /sys/kernel/release_region to release the reserved
> > +   memory back to general use.
> > +
> > +   An example of this is:
> > +     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
> > +   which will release 256MB at the 1GB boundary.
>
> This violates the "one file, one value" rule of sysfs, but nobody
> really takes that seriously, I guess.  In any case, consider
> documenting this in Documentation/ABI.

This interface will almost certainly be changed, in order to allow
phyp dump to work with the kdump user-space tools.  Its provisional
right now, until the details of user-space is hammered out.

> > +Please note that the hypervisor-assisted dump feature
> > +is only available on Power6-based systems with recent
> > +firmware versions.
>
> This statement will of course become dated/incorrect so I recommend
> removing it.

A provisional statement. I figure it should get taken out after
a few years.

> > +Implementation details:
> > +----------------------
> > +In order for this scheme to work, memory needs to be reserved
> > +quite early in the boot cycle. However, access to the device
> > +tree this early in the boot cycle is difficult, and device-tree
> > +access is needed to determine if there is a crash data waiting.
>
> I don't think this bit about early device tree access is correct.  By
> the time your code is reserving memory (from early_init_devtree(), I
> think), RTAS has been instantiated and you are able to test for the
> existence of /rtas/ibm,dump-kernel.

If I remember right, it was still too early to look up this token directly,
so we wrote some code to crawl the flat device tree to find it.  But
not only was that a lot of work, but I somehow decided that doing this
to the flat tree was wrong, as otherwise someone would surely have
written the access code.  If this can be made to work, that would be
great, but we couldn't make it work at the time.

> > +To work around this problem, all but 256MB of RAM is reserved
> > +during early boot. A short while later in boot, a check is made
> > +to determine if there is dump data waiting. If there isn't,
> > +then the reserved memory is released to general kernel use.
>
> So I think these gymnastics are unneeded -- unless I'm
> misunderstanding something, you should be able to determine very early
> whether to reserve that memory.

Only if you can get at rtas, but you can't get at rtas at that point.
I think we would have been the earliest users of rtas at that point.
Possibly the order of how things are initialized could be changed.

> > +Open issues/ToDo:
> > +------------
> > + o The various code paths that tell the hypervisor that a crash
> > +   occurred, vs. it simply being a normal reboot, should be
> > +   reviewed, and possibly clarified/fixed.
> > +
> > + o Instead of using /sys/kernel, should there be a /sys/dump
> > +   instead? There is a dump_subsys being created by the s390 code,
> > +   perhaps the pseries code should use a similar layout as well.
>
> Well, it seems to me that there's little reason to duplicate the s390
> layout unless we can actually share code.

I'm thinking "user-space tools".  This could probably be decided very
quickly after a short chat with the s390 architects about tools and
standardization and dump formats and the support process. In the
end, its some third-level support people who will be dissecting the
dump using god-knows-what tool (a hacked kdgb ???), so things
should get standardized as much as possible so that we don't
duplicate work.

> FWIW, I've been thinking about making a /sys/firmware/phyp hierarchy
> which could contain much of the System P-specific functions (DLPAR,
> lparcfg, other crud in /proc/ppc64)... seems suited to this
> platform-specific dump mechanism.

Works for me. I was always unclear what /sys/firmware was supposed
to contain.

--linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-09 15:31     ` Linas Vepstas
@ 2008-01-09 18:44       ` Nathan Lynch
  2008-01-09 19:28         ` Manish Ahuja
                           ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Nathan Lynch @ 2008-01-09 18:44 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: mahuja, linuxppc-dev, lkessler, strosake

Hi Linas,

Linas Vepstas wrote:
> 
> On 08/01/2008, Nathan Lynch <ntl@pobox.com> wrote:
> > Manish Ahuja wrote:
> > > +
> > > +The goal of hypervisor-assisted dump is to enable the dump of
> > > +a crashed system, and to do so from a fully-reset system, and
> > > +to minimize the total elapsed time until the system is back
> > > +in production use.
> >
> > Is it actually faster than kdump?
> 
> This is a basic presumption;

I used the word "actually".  I already know that it is intended to be
faster.  :)


> it should blow it away, as, after all,
> it requires one less reboot!

There's more than rebooting going on during system dump processing.
Depending on the system type, booting may not be where most time is
spent.


> As a side effect, the system is in
> production *while* the dump is being taken;

A dubious feature IMO.  Seems that the design potentially trades
reliability of first failure data capture for availability.
E.g. system crashes, reboots, resumes processing while copying dump,
crashes again before dump procedure is complete.  How is that handled,
if at all?


> with kdump,
> you can't go into production until after the dump is finished,
> and the system has been rebooted a second time.  On
> systems with terabytes of RAM, the time difference can be
> hours.

The difference in time it takes to resume the normal workload may be
significant, yes.  But the time it takes to get a usable dump image
would seem to be the basically the same.

Since you bring up large systems... a system with terabytes of RAM is
practically guaranteed to be a NUMA configuration with dozens of cpus.
When processing a dump on such a system, I wonder how well we fare:
can we successfully boot with (say) 128 cpus and 256MB of usable
memory?  Do we have to hot-online nodes as system memory is freed up
(and does that even work)?  We need to be able to restore the system
to its optimal topology when the dump is finished; if the best we can
do is a degraded configuration, the workload will suffer and the
system admin is likely to just reboot the machine again so the kernel
will have the right NUMA topology.


> > > +Implementation details:
> > > +----------------------
> > > +In order for this scheme to work, memory needs to be reserved
> > > +quite early in the boot cycle. However, access to the device
> > > +tree this early in the boot cycle is difficult, and device-tree
> > > +access is needed to determine if there is a crash data waiting.
> >
> > I don't think this bit about early device tree access is correct.  By
> > the time your code is reserving memory (from early_init_devtree(), I
> > think), RTAS has been instantiated and you are able to test for the
> > existence of /rtas/ibm,dump-kernel.
> 
> If I remember right, it was still too early to look up this token directly,
> so we wrote some code to crawl the flat device tree to find it.  But
> not only was that a lot of work, but I somehow decided that doing this
> to the flat tree was wrong, as otherwise someone would surely have
> written the access code.  If this can be made to work, that would be
> great, but we couldn't make it work at the time.
> 
> > > +To work around this problem, all but 256MB of RAM is reserved
> > > +during early boot. A short while later in boot, a check is made
> > > +to determine if there is dump data waiting. If there isn't,
> > > +then the reserved memory is released to general kernel use.
> >
> > So I think these gymnastics are unneeded -- unless I'm
> > misunderstanding something, you should be able to determine very early
> > whether to reserve that memory.
> 
> Only if you can get at rtas, but you can't get at rtas at that point.

Sorry, but I think you are mistaken (see Michael's earlier reply).

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-09 18:44       ` Nathan Lynch
@ 2008-01-09 19:28         ` Manish Ahuja
  2008-01-09 22:59         ` Michael Ellerman
  2008-01-10  2:33         ` Linas Vepstas
  2 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-01-09 19:28 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: mahuja, linuxppc-dev, Linas Vepstas, lkessler, strosake

> 
> I used the word "actually".  I already know that it is intended to be
> faster.  :)
> 
>> it should blow it away, as, after all,
>> it requires one less reboot!
> 
> There's more than rebooting going on during system dump processing.
> Depending on the system type, booting may not be where most time is
> spent.
> 
> 
>> As a side effect, the system is in
>> production *while* the dump is being taken;
> 
> A dubious feature IMO.  Seems that the design potentially trades
> reliability of first failure data capture for availability.
> E.g. system crashes, reboots, resumes processing while copying dump,
> crashes again before dump procedure is complete.  How is that handled,
> if at all?

This is a simple version. The intent was not to have a complex dump taking
mechanism in version 1. Subsequent versions will see planned improvement
on the way the pages are tracked and freed. 

Also it is very easily possible now, to register for another dump as soon as the
scratch area is copied to a user designated region. But for now this simple implementation
exists. 

It is also possible to extend this further to only preserve pages that are
kernel pages and free the non required pages like user/data pages etc. This
would reduce the space preserved and would prevent any issues that are
caused by reserving everything in memory except for the first 256 MB. 

Improvements and future versions are planned to make this efficient. But for
now the intent is to get this off the ground and handle simple cases.

> 
> 
>> with kdump,
>> you can't go into production until after the dump is finished,
>> and the system has been rebooted a second time.  On
>> systems with terabytes of RAM, the time difference can be
>> hours.
> 
> The difference in time it takes to resume the normal workload may be
> significant, yes.  But the time it takes to get a usable dump image
> would seem to be the basically the same.
> 
> Since you bring up large systems... a system with terabytes of RAM is
> practically guaranteed to be a NUMA configuration with dozens of cpus.
> When processing a dump on such a system, I wonder how well we fare:
> can we successfully boot with (say) 128 cpus and 256MB of usable
> memory?  Do we have to hot-online nodes as system memory is freed up
> (and does that even work)?  We need to be able to restore the system
> to its optimal topology when the dump is finished; if the best we can
> do is a degraded configuration, the workload will suffer and the
> system admin is likely to just reboot the machine again so the kernel
> will have the right NUMA topology.
> 
> 
>>>> +Implementation details:
>>>> +----------------------
>>>> +In order for this scheme to work, memory needs to be reserved
>>>> +quite early in the boot cycle. However, access to the device
>>>> +tree this early in the boot cycle is difficult, and device-tree
>>>> +access is needed to determine if there is a crash data waiting.
>>> I don't think this bit about early device tree access is correct.  By
>>> the time your code is reserving memory (from early_init_devtree(), I
>>> think), RTAS has been instantiated and you are able to test for the
>>> existence of /rtas/ibm,dump-kernel.
>> If I remember right, it was still too early to look up this token directly,
>> so we wrote some code to crawl the flat device tree to find it.  But
>> not only was that a lot of work, but I somehow decided that doing this
>> to the flat tree was wrong, as otherwise someone would surely have
>> written the access code.  If this can be made to work, that would be
>> great, but we couldn't make it work at the time.
>>
>>>> +To work around this problem, all but 256MB of RAM is reserved
>>>> +during early boot. A short while later in boot, a check is made
>>>> +to determine if there is dump data waiting. If there isn't,
>>>> +then the reserved memory is released to general kernel use.
>>> So I think these gymnastics are unneeded -- unless I'm
>>> misunderstanding something, you should be able to determine very early
>>> whether to reserve that memory.
>> Only if you can get at rtas, but you can't get at rtas at that point.
> 
> Sorry, but I think you are mistaken (see Michael's earlier reply).
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-09 18:44       ` Nathan Lynch
  2008-01-09 19:28         ` Manish Ahuja
@ 2008-01-09 22:59         ` Michael Ellerman
  2008-01-09 23:18           ` Manish Ahuja
  2008-01-10  2:47           ` Linas Vepstas
  2008-01-10  2:33         ` Linas Vepstas
  2 siblings, 2 replies; 59+ messages in thread
From: Michael Ellerman @ 2008-01-09 22:59 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: mahuja, linuxppc-dev, Linas Vepstas, lkessler, strosake

[-- Attachment #1: Type: text/plain, Size: 4146 bytes --]

On Wed, 2008-01-09 at 12:44 -0600, Nathan Lynch wrote:
> Hi Linas,
> 
> Linas Vepstas wrote:
> > 
> > On 08/01/2008, Nathan Lynch <ntl@pobox.com> wrote:
> > > Manish Ahuja wrote:
> > > > +
> > > > +The goal of hypervisor-assisted dump is to enable the dump of
> > > > +a crashed system, and to do so from a fully-reset system, and
> > > > +to minimize the total elapsed time until the system is back
> > > > +in production use.
> > >
> > > Is it actually faster than kdump?
> > 
> > This is a basic presumption;
> 
> > As a side effect, the system is in
> > production *while* the dump is being taken;

It's in "production" with 256MB of RAM? Err. Sure as the dump progresses
more RAM will be freed, but that's hardly production. I think Nathan's
right, any sysadmin who wants predictability will probably double reboot
anyway.

> > with kdump,
> > you can't go into production until after the dump is finished,
> > and the system has been rebooted a second time.  On
> > systems with terabytes of RAM, the time difference can be
> > hours.

> Since you bring up large systems... a system with terabytes of RAM is
> practically guaranteed to be a NUMA configuration with dozens of cpus.
> When processing a dump on such a system, I wonder how well we fare:
> can we successfully boot with (say) 128 cpus and 256MB of usable
> memory?  Do we have to hot-online nodes as system memory is freed up
> (and does that even work)?  We need to be able to restore the system
> to its optimal topology when the dump is finished; if the best we can
> do is a degraded configuration, the workload will suffer and the
> system admin is likely to just reboot the machine again so the kernel
> will have the right NUMA topology.

Yeah that's a good question. Even if the hot-onlining works, there's
still kernel data structures allocated at boot which want to be
node-local. So the end result will be != a "production" boot.

> > > > +Implementation details:
> > > > +----------------------
> > > > +In order for this scheme to work, memory needs to be reserved
> > > > +quite early in the boot cycle. However, access to the device
> > > > +tree this early in the boot cycle is difficult, and device-tree
> > > > +access is needed to determine if there is a crash data waiting.
> > >
> > > I don't think this bit about early device tree access is correct.  By
> > > the time your code is reserving memory (from early_init_devtree(), I
> > > think), RTAS has been instantiated and you are able to test for the
> > > existence of /rtas/ibm,dump-kernel.
> > 
> > If I remember right, it was still too early to look up this token directly,
> > so we wrote some code to crawl the flat device tree to find it.  But
> > not only was that a lot of work, but I somehow decided that doing this
> > to the flat tree was wrong, as otherwise someone would surely have
> > written the access code.  If this can be made to work, that would be
> > great, but we couldn't make it work at the time.
> > 
> > > > +To work around this problem, all but 256MB of RAM is reserved
> > > > +during early boot. A short while later in boot, a check is made
> > > > +to determine if there is dump data waiting. If there isn't,
> > > > +then the reserved memory is released to general kernel use.
> > >
> > > So I think these gymnastics are unneeded -- unless I'm
> > > misunderstanding something, you should be able to determine very early
> > > whether to reserve that memory.
> > 
> > Only if you can get at rtas, but you can't get at rtas at that point.

AFAICT you don't need to get at RTAS, you just need to look at the
device tree to see if the property is present, and that is trivial.

You probably just need to add a check in early_init_dt_scan_rtas() which
sets a flag for the PHYP dump stuff, or add your own scan routine if you
need.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-09 22:59         ` Michael Ellerman
@ 2008-01-09 23:18           ` Manish Ahuja
  2008-01-10  2:47           ` Linas Vepstas
  1 sibling, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-01-09 23:18 UTC (permalink / raw)
  To: michael
  Cc: lkessler, linuxppc-dev, Nathan Lynch, mahuja, Linas Vepstas,
	strosake

> It's in "production" with 256MB of RAM? Err. Sure as the dump progresses
> more RAM will be freed, but that's hardly production. I think Nathan's
> right, any sysadmin who wants predictability will probably double reboot
> anyway.

Thats a changeable parameter. Its something we chose for now. It by no means
is set in stone. Its not a design parameter. If you like to allocate 1GB we can.
But that is something we did for now. we expect this to be a variable value
dependent upon the size of the system. So if you have 128 GB system and you 
can spare 10 gb, you should be able to have 10 GB to boot with. 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-09 18:44       ` Nathan Lynch
  2008-01-09 19:28         ` Manish Ahuja
  2008-01-09 22:59         ` Michael Ellerman
@ 2008-01-10  2:33         ` Linas Vepstas
  2008-01-10  3:17           ` Olof Johansson
  2 siblings, 1 reply; 59+ messages in thread
From: Linas Vepstas @ 2008-01-10  2:33 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: mahuja, linuxppc-dev, lkessler, strosake

On 09/01/2008, Nathan Lynch <ntl@pobox.com> wrote:
> Hi Linas,
>
> Linas Vepstas wrote:
> >
> > As a side effect, the system is in
> > production *while* the dump is being taken;
>
> A dubious feature IMO.

Hmm.  Take it up with Ken Rozendal, this is supposed to be
one of the two main selling points of this thing.

> Seems that the design potentially trades
> reliability of first failure data capture for availability.
> E.g. system crashes, reboots, resumes processing while copying dump,
> crashes again before dump procedure is complete.  How is that handled,
> if at all?

Its handled by the hypervisor.  phyp maintains the copy of the
RMO of  first crash, until such time that the OS declares the
dump of the RMO to be complete. So you'll always have
the RMO of the first crash.

For the rest of RAM, it will come in two parts: some portion
will have been dumped already. The rest has not yet been dumped,
and it will still be there, preserved across the second crash.

So you get both RMO and all of RAM from the first crash.

> > with kdump,
> > you can't go into production until after the dump is finished,
> > and the system has been rebooted a second time.  On
> > systems with terabytes of RAM, the time difference can be
> > hours.
>
> The difference in time it takes to resume the normal workload may be
> significant, yes.  But the time it takes to get a usable dump image
> would seem to be the basically the same.

Yes.

> Since you bring up large systems... a system with terabytes of RAM is
> practically guaranteed to be a NUMA configuration with dozens of cpus.
> When processing a dump on such a system, I wonder how well we fare:
> can we successfully boot with (say) 128 cpus and 256MB of usable
> memory?  Do we have to hot-online nodes as system memory is freed up
> (and does that even work)?  We need to be able to restore the system
> to its optimal topology when the dump is finished; if the best we can
> do is a degraded configuration, the workload will suffer and the
> system admin is likely to just reboot the machine again so the kernel
> will have the right NUMA topology.

Heh. That's the elbow-grease of this thing.  The easy part is to get
the core function working. The hard part is to test these various configs,
and when they don't work, figure out what went wrong. That will take
perseverence and brains.

--linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-09 22:59         ` Michael Ellerman
  2008-01-09 23:18           ` Manish Ahuja
@ 2008-01-10  2:47           ` Linas Vepstas
  2008-01-10  3:55             ` Michael Ellerman
  1 sibling, 1 reply; 59+ messages in thread
From: Linas Vepstas @ 2008-01-10  2:47 UTC (permalink / raw)
  To: michael; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake

On 09/01/2008, Michael Ellerman <michael@ellerman.id.au> wrote:
>
> > > Only if you can get at rtas, but you can't get at rtas at that point.
>
> AFAICT you don't need to get at RTAS, you just need to look at the
> device tree to see if the property is present, and that is trivial.
>
> You probably just need to add a check in early_init_dt_scan_rtas() which
> sets a flag for the PHYP dump stuff, or add your own scan routine if you
> need.

I no longer remember the details. I do remember spending a lot of time
trying to figure out how to do this. I know I didn't want to write my own scan
routine; maybe that's what stopped me.  As it happens, we also did most
of the development on a broken phyp which simply did not even have
this property, no matter what, and so that may have brain-damaged me.

I went for the "most elegant" solution, where "most elegant" is defined
as "fewest lines of code", "least effort", etc.

Manish may need some hands-on help to extract this token during
early boot.  Hopefully, he'll let us know.

--linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-10  2:33         ` Linas Vepstas
@ 2008-01-10  3:17           ` Olof Johansson
  2008-01-10  4:12             ` Linas Vepstas
  0 siblings, 1 reply; 59+ messages in thread
From: Olof Johansson @ 2008-01-10  3:17 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake

On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:

> Heh. That's the elbow-grease of this thing.  The easy part is to get
> the core function working. The hard part is to test these various configs,
> and when they don't work, figure out what went wrong. That will take
> perseverence and brains.

This just sounds like a whole lot of extra work to get a feature that
already exists. Also, features like these seem to just get tested when the
next enterprise distro is released, so they're broken for long stretches
of time in mainline.

There's a bunch of problems like the NUMA ones, which would by far be
easiest to solve by just doing another reboot or kexec, wouldn't they?


-Olof

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-10  2:47           ` Linas Vepstas
@ 2008-01-10  3:55             ` Michael Ellerman
  0 siblings, 0 replies; 59+ messages in thread
From: Michael Ellerman @ 2008-01-10  3:55 UTC (permalink / raw)
  To: linasvepstas; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake

[-- Attachment #1: Type: text/plain, Size: 3344 bytes --]

On Wed, 2008-01-09 at 20:47 -0600, Linas Vepstas wrote:
> On 09/01/2008, Michael Ellerman <michael@ellerman.id.au> wrote:
> >
> > > > Only if you can get at rtas, but you can't get at rtas at that point.
> >
> > AFAICT you don't need to get at RTAS, you just need to look at the
> > device tree to see if the property is present, and that is trivial.
> >
> > You probably just need to add a check in early_init_dt_scan_rtas() which
> > sets a flag for the PHYP dump stuff, or add your own scan routine if you
> > need.
> 
> I no longer remember the details. I do remember spending a lot of time
> trying to figure out how to do this. I know I didn't want to write my own scan
> routine; maybe that's what stopped me.  As it happens, we also did most
> of the development on a broken phyp which simply did not even have
> this property, no matter what, and so that may have brain-damaged me.

Sure, the API docs for the kernel are a little lacking ;)

> I went for the "most elegant" solution, where "most elegant" is defined
> as "fewest lines of code", "least effort", etc.
> 
> Manish may need some hands-on help to extract this token during
> early boot.  Hopefully, he'll let us know.

It would just be something like:

--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -901,6 +901,11 @@ int __init early_init_dt_scan_rtas(unsigned long node,
                rtas.size = *sizep;
        }
 
+#ifdef CONFIG_PHYP_DUMP
+       if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL))
+               phyp_dump_is_active++;
+#endif
+
 #ifdef CONFIG_UDBG_RTAS_CONSOLE
        basep = of_get_flat_dt_prop(node, "put-term-char", NULL);
        if (basep)


Or to do your own scan routine:


diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index acc0d24..442134e 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -1022,6 +1022,7 @@ void __init early_init_devtree(void *params)
        /* Some machines might need RTAS info for debugging, grab it now. */
        of_scan_flat_dt(early_init_dt_scan_rtas, NULL);
 #endif
+       of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL);
 
        /* Retrieve various informations from the /chosen node of the
         * device-tree, including the platform type, initrd location and
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 52e95c2..af2b6e8 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -883,6 +883,19 @@ void __init rtas_initialize(void)
 #endif
 }
 
+int __init early_init_dt_scan_phyp_dump(unsigned long node,
+               const char *uname, int depth, void *data)
+{
+#ifdef CONFIG_PHYP_DUMP
+       if (depth != 1 || strcmp(uname, "rtas") != 0)
+               return 0;
+
+       if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL))
+               phyp_dump_is_active++;
+#endif
+       return 1;
+}
+
 int __init early_init_dt_scan_rtas(unsigned long node,
                const char *uname, int depth, void *data)
 {


cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-10  3:17           ` Olof Johansson
@ 2008-01-10  4:12             ` Linas Vepstas
  2008-01-10  4:52               ` Michael Ellerman
  2008-01-10 16:21               ` Olof Johansson
  0 siblings, 2 replies; 59+ messages in thread
From: Linas Vepstas @ 2008-01-10  4:12 UTC (permalink / raw)
  To: Olof Johansson; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake

On 09/01/2008, Olof Johansson <olof@lixom.net> wrote:
> On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:
>
> > Heh. That's the elbow-grease of this thing.  The easy part is to get
> > the core function working. The hard part is to test these various configs,
> > and when they don't work, figure out what went wrong. That will take
> > perseverence and brains.
>
> This just sounds like a whole lot of extra work to get a feature that
> already exists.

Well, no. kexec is horribly ill-behaved with respect to PCI. The
kexec kernel starts running with PCI devices in some random
state; maybe they're DMA'ing or who knows what. kexec tries
real hard to whack a few needed pci devices into submission
but it has been hit-n-miss, and the source of 90% of the kexec
headaches and debugging effort. Its not pretty.

If all pci-host bridges could shut-down or settle the bus, and
raise the #RST line high, and then if all BIOS'es supported
this, you'd be right. But they can't ....

--linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-10  4:12             ` Linas Vepstas
@ 2008-01-10  4:52               ` Michael Ellerman
  2008-01-10 16:21               ` Olof Johansson
  1 sibling, 0 replies; 59+ messages in thread
From: Michael Ellerman @ 2008-01-10  4:52 UTC (permalink / raw)
  To: linasvepstas
  Cc: linuxppc-dev, lkessler, mahuja, Nathan Lynch, Olof Johansson,
	strosake

[-- Attachment #1: Type: text/plain, Size: 1242 bytes --]

On Wed, 2008-01-09 at 22:12 -0600, Linas Vepstas wrote:
> On 09/01/2008, Olof Johansson <olof@lixom.net> wrote:
> > On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:
> >
> > > Heh. That's the elbow-grease of this thing.  The easy part is to get
> > > the core function working. The hard part is to test these various configs,
> > > and when they don't work, figure out what went wrong. That will take
> > > perseverence and brains.
> >
> > This just sounds like a whole lot of extra work to get a feature that
> > already exists.
> 
> Well, no. kexec is horribly ill-behaved with respect to PCI. The
> kexec kernel starts running with PCI devices in some random
> state; maybe they're DMA'ing or who knows what. kexec tries
> real hard to whack a few needed pci devices into submission
> but it has been hit-n-miss, and the source of 90% of the kexec
> headaches and debugging effort. Its not pretty.

Isn't that what EEH and the IOMMU are for? :)

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-10  4:12             ` Linas Vepstas
  2008-01-10  4:52               ` Michael Ellerman
@ 2008-01-10 16:21               ` Olof Johansson
  2008-01-10 16:34                 ` Linas Vepstas
  1 sibling, 1 reply; 59+ messages in thread
From: Olof Johansson @ 2008-01-10 16:21 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake

On Wed, Jan 09, 2008 at 10:12:13PM -0600, Linas Vepstas wrote:
> On 09/01/2008, Olof Johansson <olof@lixom.net> wrote:
> > On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:
> >
> > > Heh. That's the elbow-grease of this thing.  The easy part is to get
> > > the core function working. The hard part is to test these various configs,
> > > and when they don't work, figure out what went wrong. That will take
> > > perseverence and brains.
> >
> > This just sounds like a whole lot of extra work to get a feature that
> > already exists.
> 
> Well, no. kexec is horribly ill-behaved with respect to PCI. The
> kexec kernel starts running with PCI devices in some random
> state; maybe they're DMA'ing or who knows what. kexec tries
> real hard to whack a few needed pci devices into submission
> but it has been hit-n-miss, and the source of 90% of the kexec
> headaches and debugging effort. Its not pretty.

It surprises me that this hasn't been possible to resolve with less than
architecting a completely new interface, given that the platform has
all this fancy support for isolating and resetting adapters. After all,
the exact same thing has to be done by the hypervisor before rebooting
the partition.

> If all pci-host bridges could shut-down or settle the bus, and
> raise the #RST line high, and then if all BIOS'es supported
> this, you'd be right. But they can't ....

This argument doesn't hold. We're not talking about some generic PC with
a crappy BIOS here, we're specifically talking about POWER6 PHYP. It
certainly already has ways to reset adapters in it, or EEH wouldn't
work. Actually, the whole phyp dump feature wouldn't work either, since
it's exactly what the firmware has to do under the covers as well.


-Olof

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-10 16:21               ` Olof Johansson
@ 2008-01-10 16:34                 ` Linas Vepstas
  2008-01-10 21:46                   ` Mike Strosaker
  0 siblings, 1 reply; 59+ messages in thread
From: Linas Vepstas @ 2008-01-10 16:34 UTC (permalink / raw)
  To: Olof Johansson; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake

On 10/01/2008, Olof Johansson <olof@lixom.net> wrote:
> On Wed, Jan 09, 2008 at 10:12:13PM -0600, Linas Vepstas wrote:
> > On 09/01/2008, Olof Johansson <olof@lixom.net> wrote:
> > > On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:
> > >
> > > > Heh. That's the elbow-grease of this thing.  The easy part is to get
> > > > the core function working. The hard part is to test these various configs,
> > > > and when they don't work, figure out what went wrong. That will take
> > > > perseverence and brains.
> > >
> > > This just sounds like a whole lot of extra work to get a feature that
> > > already exists.
> >
> > Well, no. kexec is horribly ill-behaved with respect to PCI. The
> > kexec kernel starts running with PCI devices in some random
> > state; maybe they're DMA'ing or who knows what. kexec tries
> > real hard to whack a few needed pci devices into submission
> > but it has been hit-n-miss, and the source of 90% of the kexec
> > headaches and debugging effort. Its not pretty.
>
> It surprises me that this hasn't been possible to resolve with less than
> architecting a completely new interface, given that the platform has
> all this fancy support for isolating and resetting adapters. After all,
> the exact same thing has to be done by the hypervisor before rebooting
> the partition.

OK, point taken.

-- The phyp interfaces are there for AIX, which I guess must
   not have kexec-like ability. So this is a case of Linux leveraging
  a feature architected for AIX.

-- There's also this idea, somewhat weak, that the crash may
   have corrupted the ram where the  kexec kernel sits.
   For someone who is used to seeing crashes due to
   null pointer deref's, this seems fairly unlikely. But perhaps
   crashes in production systems are more mind-bending.
   (we did have a case where a USB stick used for boot
   continued to scribble on memory long after it was
   supposed to be quiet and unused. This resulted in
   a very hard to debug crash.)

   A solution to a corrupted
   kexec kernel would be to disable memory access to
   where kexec sits, e.g un-mapping or making r/o the
   pages where it lies. This begs the questions of "who
   unhides the kexec kernel", and "what if this 'who' gets
   corrupted"?

   In short, the kexec kernel does not boot
   exactly the same as a cold boot, and so this opens
   a can of worms about "well, what's different, how do
   we minimize these differences, etc." and I think that
   lead AIX to punt, and say "lets just use one single,
   well-known boot loader/ boot sequence instead of
   inventing a new one", thus leading to the phyp design.

   But that's just my guess.. :-)

--linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-10 16:34                 ` Linas Vepstas
@ 2008-01-10 21:46                   ` Mike Strosaker
  2008-01-11  1:26                     ` Nathan Lynch
  0 siblings, 1 reply; 59+ messages in thread
From: Mike Strosaker @ 2008-01-10 21:46 UTC (permalink / raw)
  To: linasvepstas
  Cc: linuxppc-dev, lkessler, mahuja, Nathan Lynch, Olof Johansson,
	strosake

Linas Vepstas wrote:
> On 10/01/2008, Olof Johansson <olof@lixom.net> wrote:
> 
>>On Wed, Jan 09, 2008 at 10:12:13PM -0600, Linas Vepstas wrote:
>>
>>>On 09/01/2008, Olof Johansson <olof@lixom.net> wrote:
>>>
>>>>On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:
>>>>
>>>>
>>>>>Heh. That's the elbow-grease of this thing.  The easy part is to get
>>>>>the core function working. The hard part is to test these various configs,
>>>>>and when they don't work, figure out what went wrong. That will take
>>>>>perseverence and brains.
>>>>
>>>>This just sounds like a whole lot of extra work to get a feature that
>>>>already exists.
>>>
>>>Well, no. kexec is horribly ill-behaved with respect to PCI. The
>>>kexec kernel starts running with PCI devices in some random
>>>state; maybe they're DMA'ing or who knows what. kexec tries
>>>real hard to whack a few needed pci devices into submission
>>>but it has been hit-n-miss, and the source of 90% of the kexec
>>>headaches and debugging effort. Its not pretty.
>>
>>It surprises me that this hasn't been possible to resolve with less than
>>architecting a completely new interface, given that the platform has
>>all this fancy support for isolating and resetting adapters. After all,
>>the exact same thing has to be done by the hypervisor before rebooting
>>the partition.
> 
> 
> OK, point taken.
> 
> -- The phyp interfaces are there for AIX, which I guess must
>    not have kexec-like ability. So this is a case of Linux leveraging
>   a feature architected for AIX.

Certainly AIX was in a more difficult position at the time, because they don't 
have a kexec equivalent, and thus were collecting dump data with a potentially 
faulty kernel.  It makes sense to have something outside the partition collect or 
maintain the data; ideally, some kind of service partition would extract dump 
data from a failed partition, but giving one partition total access to the memory 
of another is clearly risky.  Both the PHYP-assistance method and the kexec 
method are ways to simulate that without the risk.

At the risk of repeating what others have already said, the PHYP-assistance 
method provides some advantages that the kexec method cannot:
  - Availability of the system for production use before the dump data is 
collected.  As was mentioned before, some production systems may choose not to 
operate with the limited memory initially available after the reboot, but it sure 
is nice to provide the option.
  - Ensuring that the devices are in a good state.  PHYP doesn't expose a method 
to force adapters into a frozen state, (which I agree would be useful), and I 
don't know of any plans to do so.  What we are starting to see is that some 
drivers need modifications in order to work correctly with kdump [1].  Supporting 
PHYP-assisted dump would eliminate those issues.
  - The small possibility that the kexec area could have been munged by the 
failing kernel, preventing it from being able to collect a dump.

The NUMA issues are daunting, but not insurmountable.  Release early, release 
often, n'est-ce pas?

Mike

[1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/045663.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-10 21:46                   ` Mike Strosaker
@ 2008-01-11  1:26                     ` Nathan Lynch
  2008-01-11 16:57                       ` Linas Vepstas
  0 siblings, 1 reply; 59+ messages in thread
From: Nathan Lynch @ 2008-01-11  1:26 UTC (permalink / raw)
  To: Mike Strosaker
  Cc: linuxppc-dev, lkessler, mahuja, Olof Johansson, linasvepstas,
	strosake

Mike Strosaker wrote:
> 
> At the risk of repeating what others have already said, the PHYP-assistance 
> method provides some advantages that the kexec method cannot:
>  - Availability of the system for production use before the dump data is 
> collected.  As was mentioned before, some production systems may choose not 
> to operate with the limited memory initially available after the reboot, 
> but it sure is nice to provide the option.

I'm more concerned that this design encourages the user to resume a
workload *which is almost certainly known to result in a system crash*
before collection of crash data is complete.  Maybe the gamble will
pay off most of the time, but I wouldn't want to be working support
when it doesn't.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-11  1:26                     ` Nathan Lynch
@ 2008-01-11 16:57                       ` Linas Vepstas
  2008-01-14  5:24                         ` Olof Johansson
  0 siblings, 1 reply; 59+ messages in thread
From: Linas Vepstas @ 2008-01-11 16:57 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: linuxppc-dev, lkessler, mahuja, Olof Johansson, strosake

On 10/01/2008, Nathan Lynch <ntl@pobox.com> wrote:
> Mike Strosaker wrote:
> >
> > At the risk of repeating what others have already said, the PHYP-assistance
> > method provides some advantages that the kexec method cannot:
> >  - Availability of the system for production use before the dump data is
> > collected.  As was mentioned before, some production systems may choose not
> > to operate with the limited memory initially available after the reboot,
> > but it sure is nice to provide the option.
>
> I'm more concerned that this design encourages the user to resume a
> workload *which is almost certainly known to result in a system crash*
> before collection of crash data is complete.  Maybe the gamble will
> pay off most of the time, but I wouldn't want to be working support
> when it doesn't.

Workloads that cause crashes within hours of startup tend to be
weeded-out/discovered during pre-production test of the system
to be deployed. Since its pre-production test, dumps can be
taken in a leisurely manner. Heck, even a session at the
xmon prompt can be contemplated.

The problem is when the crash only reproduces after days or
weeks of uptime, on a production machine.  Since the machine
is in production, its got to be brought back up ASAP.  Since
its crashing only after days/weeks, the dump should have
plenty of time to complete.  (And if it crashes quickly after
that reboot ... well, support people always welcome ways
in which a bug can be reproduced more quickly/easily).

--linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-11 16:57                       ` Linas Vepstas
@ 2008-01-14  5:24                         ` Olof Johansson
  2008-01-14 15:21                           ` Linas Vepstas
  0 siblings, 1 reply; 59+ messages in thread
From: Olof Johansson @ 2008-01-14  5:24 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: lkessler, linuxppc-dev, Nathan Lynch, mahuja, strosake

On Fri, Jan 11, 2008 at 10:57:51AM -0600, Linas Vepstas wrote:
> On 10/01/2008, Nathan Lynch <ntl@pobox.com> wrote:
> > Mike Strosaker wrote:
> > >
> > > At the risk of repeating what others have already said, the PHYP-assistance
> > > method provides some advantages that the kexec method cannot:
> > >  - Availability of the system for production use before the dump data is
> > > collected.  As was mentioned before, some production systems may choose not
> > > to operate with the limited memory initially available after the reboot,
> > > but it sure is nice to provide the option.
> >
> > I'm more concerned that this design encourages the user to resume a
> > workload *which is almost certainly known to result in a system crash*
> > before collection of crash data is complete.  Maybe the gamble will
> > pay off most of the time, but I wouldn't want to be working support
> > when it doesn't.
> 
> Workloads that cause crashes within hours of startup tend to be
> weeded-out/discovered during pre-production test of the system
> to be deployed. Since its pre-production test, dumps can be
> taken in a leisurely manner. Heck, even a session at the
> xmon prompt can be contemplated.
> 
> The problem is when the crash only reproduces after days or
> weeks of uptime, on a production machine.  Since the machine
> is in production, its got to be brought back up ASAP.  Since
> its crashing only after days/weeks, the dump should have
> plenty of time to complete.  (And if it crashes quickly after
> that reboot ... well, support people always welcome ways
> in which a bug can be reproduced more quickly/easily).

How do you expect to have it in full production if you don't have all
resources available for it? It's not until the dump has finished that you
can return all memory to the production environment and use it.

This can very easily be argued in both direction, with no clear winner:
If the crash is stress-induced (say a slashdotted website), for those
cases it seems more rational to take the time, collect _good data_ even
if it takes a little longer, and then go back into production. Especially
if the alternative is to go back into production immediately, collect
about half of the data, and then crash again. Rinse and repeat.

Anyway -- I can agree that some of the arguments w.r.t robustness and
reliability of collecting dumps can be higher using this approach. It
really surprises me that there's no way to reset a device through PHYP
though. Seems like such a fundamental feature.

I think people are overly optimistic if they think it'll be possible
to do all of this reliably (as in with consistent performance) without
a second reboot though. At least without similar amounts of work being
done as it would have taken to fix kdump's reliability in the first place.

Speaking of reboots. PHYP isn't known for being quick at rebooting a
partition, it used to take in the order of minutes even on a small
machine. Has that been fixed? If not, the avoiding an extra reboot
argument hardly seems like a benefit versus kdump+kexec, which reboots
nearly instantly and without involvement from PHYP.


-Olof

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-14  5:24                         ` Olof Johansson
@ 2008-01-14 15:21                           ` Linas Vepstas
  0 siblings, 0 replies; 59+ messages in thread
From: Linas Vepstas @ 2008-01-14 15:21 UTC (permalink / raw)
  To: Olof Johansson; +Cc: lkessler, linuxppc-dev, Nathan Lynch, mahuja, strosake

On 13/01/2008, Olof Johansson <olof@lixom.net> wrote:
>
> How do you expect to have it in full production if you don't have all
> resources available for it? It's not until the dump has finished that you
> can return all memory to the production environment and use it.

With the PHYP dump, each chunk of RAM is returned for
general use immediately after being dumped; so its not
an all-or-nothing proposition.  Production systems don't
often hit 100% RAM use right out of the gate, they often
take hours or days to get there, so again, there should
be time to dump.

> This can very easily be argued in both direction, with no clear winner:
> If the crash is stress-induced (say a slashdotted website), for those
> cases it seems more rational to take the time, collect _good data_ even
> if it takes a little longer, and then go back into production. Especially
> if the alternative is to go back into production immediately, collect
> about half of the data, and then crash again. Rinse and repeat.

Again, the mode of operation for the phyp dump  is that you'll
always have all of the data from the *first* crash, even if there
are multiple crashes. That's because the the as-yet undumped
RAM is not put back into production.

> really surprises me that there's no way to reset a device through PHYP
> though. Seems like such a fundamental feature.

I don't know who said that; that's not right. The EEH function
certainly does allow you to halt/restart PCI traffic to a particular
device and also to reset the device.  So, yes, the pSeries
kexec code should call into the eeh subsystem to rationalize
the device state.

> I think people are overly optimistic if they think it'll be possible
> to do all of this reliably (as in with consistent performance) without
> a second reboot though.

The NUMA issues do concern me. But then, the whole virtualized,
fractional-cpu, tickless operation stuff sounds like a performance
tuning nightmare to begin with.

> At least without similar amounts of work being
> done as it would have taken to fix kdump's reliability in the first place.

:-)


> Speaking of reboots. PHYP isn't known for being quick at rebooting a
> partition, it used to take in the order of minutes even on a small
> machine. Has that been fixed?

Dunno.  Probably not.

>  If not, the avoiding an extra reboot
> argument hardly seems like a benefit versus kdump+kexec, which reboots
> nearly instantly and without involvement from PHYP.

OK, let me tell you what I'm up against right now.
I'm dealing with sporadic corruption on my home box.

About a month ago, I bought a whizzy ASUS M2NE
motherboard & an AMD64 2-core cpu, and two sticks
of RAM, 1GB per stick. I have one new hard drive,
SATA, and one old hard drive, from my old machine,
the PATA.  The two disks are mirrored in a RAID-1
config. Running Ubuntu.

During install/upgrade a month ago, I noticed some of
the install files seemed to have gotten corrupted, but
that downloading them again got me a working version.
This put a serious frown on my face: maybe a bad ethernet
card or connection !?

Two weeks ago, gcc stopped working one morning, although
it worked fine the night before. I'd done nothing in the interim
but sleep. Reinstalling it made it work again. Yesterday,
something else stopped working.  I found the offending
library, I compared file checksums against a known-good
version, and they were off. (!!!) Disk corruption?

Then apt-get stopped working. The /var/lib/dpkg/status file
had randomly corrupted single bytes. Its ascii, I hand
repaired it; it had maybe 10 bad bytes out of 2MB total size.

I installed tripwire. Between the first run of tripwire, and the
second, less than an hour later, it reported several dozen
files have changed checksums. Manual inspection of some
of these files against known-good versions show that, at least
this morning, that's no longer the case.

System hasn't crashed in a month, since first boot.  So
what's going on? Is it possible that one of the two disks
is serving up bad data, which explains the funny checksum
behaviour? Or maybe its bad RAM, so that a fresh disk
read shows good data?  If its bad ram, why doesn't the
system crash?  I forced fsck last night, fsck came back
spotless.

So ... moral of the story: If phyp is doing some sort of
hardware checks and validation, that's great. I wish I could
afford a pSeries system for my home computer, because
my impression is that they are very stable, and don't do
things like data corruption.  I'm such a friggin cheapskate
that I can't bear to spend many thousands instead of many
hundreds of dollars. However, I will trade a longer boot
for the dream of higher reliability.

--linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-01-22 19:12 Manish Ahuja
@ 2008-01-22 19:26 ` Manish Ahuja
  0 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-01-22 19:26 UTC (permalink / raw)
  To: ppc-dev, paulus, linasvepstas; +Cc: Larry Kessler, Michael Strosaker

Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

----
 Documentation/powerpc/phyp-assisted-dump.txt |  129 +++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)

Index: 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt	2008-01-07 18:05:46.000000000 -0600
@@ -0,0 +1,129 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in it's normal, production mode on it normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For Example: the values in /sys/kernel/release-region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is a crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved ram will be released for general kernel use. The
+highest 256 MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in the case a crash
+does occur. See, however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+a new file upon user intervention. The starting address
+to be read and the range for each data point in provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+------------
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead? There is a dump_subsys being created by the s390 code,
+   perhaps the pseries code should use a similar layout as well.
+
+ o Is reserving a 256MB region really required? The goal of
+   reserving a 256MB scratch area is to make sure that no
+   important crash data is clobbered when the hypervisor
+   save low mem to the scratch area. But, if one could assure
+   that nothing important is located in some 256MB area, then
+   it would not need to be reserved. Something that can be
+   improved in subsequent versions.
+
+ o Still working the kdump team to integrate this with kdump,
+   some work remains but this would not affect the current
+   patches.
+
+ o Still need to write a shell script, to copy the dump away.
+   Currently I am parsing it manually.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-02-12  6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
@ 2008-02-12  6:53   ` Manish Ahuja
  0 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-12  6:53 UTC (permalink / raw)
  To: linuxppc-dev, paulus; +Cc: Manish Ahuja

Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

----
 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt	2008-02-12 06:38:25.000000000 -0600
@@ -0,0 +1,127 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in it's normal, production mode on it normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For Example: the values in /sys/kernel/release-region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if a active dump is waiting for us. If yes
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once we collect a dump from user
+land scripts that are run. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of the ram is reserved as a scratch area. This area
+is *not* be released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+a new file upon user intervention. The starting address
+to be read and the range for each data point in provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+------------
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead? There is a dump_subsys being created by the s390 code,
+   perhaps the pseries code should use a similar layout as well.
+
+ o Is reserving a 256MB region really required? The goal of
+   reserving a 256MB scratch area is to make sure that no
+   important crash data is clobbered when the hypervisor
+   save low mem to the scratch area. But, if one could assure
+   that nothing important is located in some 256MB area, then
+   it would not need to be reserved. Something that can be
+   improved in subsequent versions.
+
+ o Still working the kdump team to integrate this with kdump,
+   some work remains but this would not affect the current
+   patches.
+
+ o Still need to write a shell script, to copy the dump away.
+   Currently I am parsing it manually.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
@ 2008-02-18  4:53 Manish Ahuja
  2008-02-18  5:34 ` [PATCH 1/8] pseries: phyp dump: Documentation Manish Ahuja
                   ` (8 more replies)
  0 siblings, 9 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-18  4:53 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas

The following series of patches implement a basic framework
for hypervisor-assisted dump. The very first patch provides 
documentation explaining what this is  :-) . Yes, its supposed
to be an improvement over kdump.

A list of open issues / todo list is included in the documentation.
It also appears that the not-yet-released firmware versions this was tested 
on are still, ahem, incomplete; this work is also pending.

I have included most of the changes requested. Although, I did find
one or two, fixed in a later patch file rather than the first location
they appeared at.

Also it now does not block any memory on machines other than power6 boxes
which have the requisite firmware. This is from a power5 box.

from jal-lp6 a power5 machine.
.........
Phyp-dump not supported on this hardware
Using pSeries machine description
console [udbg-1] enabled
.......

I think I incorporated everyones comments so far. 


-- Manish & Linas.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 1/8] pseries: phyp dump: Documentation
  2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
@ 2008-02-18  5:34 ` Manish Ahuja
  2008-02-18  5:36 ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-18  5:34 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja


Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

----
 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt	2008-02-18 03:22:33.000000000 -0600
@@ -0,0 +1,127 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in it's normal, production mode on it normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For Example: the values in /sys/kernel/release-region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if a active dump is waiting for us. If yes
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once we collect a dump from user
+land scripts that are run. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of the ram is reserved as a scratch area. This area
+is *not* be released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+a new file upon user intervention. The starting address
+to be read and the range for each data point in provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+------------
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead? There is a dump_subsys being created by the s390 code,
+   perhaps the pseries code should use a similar layout as well.
+
+ o Is reserving a 256MB region really required? The goal of
+   reserving a 256MB scratch area is to make sure that no
+   important crash data is clobbered when the hypervisor
+   save low mem to the scratch area. But, if one could assure
+   that nothing important is located in some 256MB area, then
+   it would not need to be reserved. Something that can be
+   improved in subsequent versions.
+
+ o Still working the kdump team to integrate this with kdump,
+   some work remains but this would not affect the current
+   patches.
+
+ o Still need to write a shell script, to copy the dump away.
+   Currently I am parsing it manually.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
  2008-02-18  5:34 ` [PATCH 1/8] pseries: phyp dump: Documentation Manish Ahuja
@ 2008-02-18  5:36 ` Manish Ahuja
  2008-02-18  5:38 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-18  5:36 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja


Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

----
 arch/powerpc/kernel/prom.c                 |   49 ++++++++++++++++++++
 arch/powerpc/kernel/rtas.c                 |   34 +++++++++++++
 arch/powerpc/platforms/pseries/Makefile    |    1 
 arch/powerpc/platforms/pseries/phyp_dump.c |   71 +++++++++++++++++++++++++++++
 include/asm-powerpc/phyp_dump.h            |   38 +++++++++++++++
 include/asm-powerpc/rtas.h                 |    3 +
 6 files changed, 196 insertions(+)

Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-02-18 04:30:28.000000000 -0600
@@ -0,0 +1,38 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+	/* Memory that is reserved during very early boot. */
+	unsigned long init_reserve_start;
+	unsigned long init_reserve_size;
+	/* Check status during boot if dump supported, active & present*/
+	unsigned long phyp_dump_configured;
+	unsigned long phyp_dump_is_active;
+	/* store cpu & hpte size */
+	unsigned long cpu_state_size;
+	unsigned long hpte_region_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:32:17.000000000 -0600
@@ -0,0 +1,71 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+#include <asm/machdep.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for genreal use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+	struct page *rpage;
+	unsigned long end_pfn;
+	long i;
+
+	end_pfn = start_pfn + nr_pages;
+
+	for (i = start_pfn; i <= end_pfn; i++) {
+		rpage = pfn_to_page(i);
+		if (PageReserved(rpage)) {
+			ClearPageReserved(rpage);
+			init_page_count(rpage);
+			__free_page(rpage);
+			totalram_pages++;
+		}
+	}
+}
+
+static int __init phyp_dump_setup(void)
+{
+	unsigned long start_pfn, nr_pages;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Release memory that was reserved in early boot */
+	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+	release_memory_range(start_pfn, nr_pages);
+
+	return 0;
+}
+machine_subsys_initcall(pseries, phyp_dump_setup);
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/Makefile	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/Makefile	2008-02-18 03:23:47.000000000 -0600
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
 obj-$(CONFIG_HVCS)		+= hvcserver.o
 obj-$(CONFIG_HCALL_STATS)	+= hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP)	+= phyp_dump.o
Index: 2.6.25-rc1/arch/powerpc/kernel/prom.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/kernel/prom.c	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/kernel/prom.c	2008-02-18 03:23:47.000000000 -0600
@@ -51,6 +51,7 @@
 #include <asm/machdep.h>
 #include <asm/pSeries_reconfig.h>
 #include <asm/pci-bridge.h>
+#include <asm/phyp_dump.h>
 #include <asm/kexec.h>
 
 #ifdef DEBUG
@@ -1039,6 +1040,51 @@ static void __init early_reserve_mem(voi
 #endif
 }
 
+#ifdef CONFIG_PHYP_DUMP
+/**
+ * reserve_crashed_mem() - reserve all not-yet-dumped mmemory
+ *
+ * This routine may reserve memory regions in the kernel only
+ * if the system is supported and a dump was taken in last
+ * boot instance or if the hardware is supported and the
+ * scratch area needs to be setup. In other instances it returns
+ * without reserving anything. The memory in case of dump being
+ * active is freed when the dump is collected (by userland tools).
+ */
+static void __init reserve_crashed_mem(void)
+{
+	unsigned long base, size;
+	if (!phyp_dump_info->phyp_dump_configured) {
+		printk(KERN_ERR "Phyp-dump not supported on this hardware\n");
+		return;
+	}
+
+	if (phyp_dump_info->phyp_dump_is_active) {
+		/* Reserve *everything* above RMR.Area freed by userland tools*/
+		base = PHYP_DUMP_RMR_END;
+		size = lmb_end_of_DRAM() - base;
+
+		/* XXX crashed_ram_end is wrong, since it may be beyond
+		 * the memory_limit, it will need to be adjusted. */
+		lmb_reserve(base, size);
+
+		phyp_dump_info->init_reserve_start = base;
+		phyp_dump_info->init_reserve_size = size;
+	} else {
+		size = phyp_dump_info->cpu_state_size +
+			phyp_dump_info->hpte_region_size +
+			PHYP_DUMP_RMR_END;
+		base = lmb_end_of_DRAM() - size;
+		lmb_reserve(base, size);
+		phyp_dump_info->init_reserve_start = base;
+		phyp_dump_info->init_reserve_size = size;
+	}
+}
+#else
+static inline void __init reserve_crashed_mem(void) {}
+#endif /* CONFIG_PHYP_DUMP */
+
+
 void __init early_init_devtree(void *params)
 {
 	DBG(" -> early_init_devtree(%p)\n", params);
@@ -1050,6 +1096,8 @@ void __init early_init_devtree(void *par
 	/* Some machines might need RTAS info for debugging, grab it now. */
 	of_scan_flat_dt(early_init_dt_scan_rtas, NULL);
 #endif
+	/* scan tree to see if dump occured during last boot */
+	of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL);
 
 	/* Retrieve various informations from the /chosen node of the
 	 * device-tree, including the platform type, initrd location and
@@ -1071,6 +1119,7 @@ void __init early_init_devtree(void *par
 	reserve_kdump_trampoline();
 	reserve_crashkernel();
 	early_reserve_mem();
+	reserve_crashed_mem();
 
 	lmb_enforce_memory_limit(memory_limit);
 	lmb_analyze();
Index: 2.6.25-rc1/arch/powerpc/kernel/rtas.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/kernel/rtas.c	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/kernel/rtas.c	2008-02-18 03:23:47.000000000 -0600
@@ -39,6 +39,7 @@
 #include <asm/syscalls.h>
 #include <asm/smp.h>
 #include <asm/atomic.h>
+#include <asm/phyp_dump.h>
 
 struct rtas_t rtas = {
 	.lock = SPIN_LOCK_UNLOCKED
@@ -883,6 +884,39 @@ void __init rtas_initialize(void)
 #endif
 }
 
+int __init early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+#ifdef CONFIG_PHYP_DUMP
+	const unsigned int *sizes;
+
+	phyp_dump_info->phyp_dump_configured = 0;
+	phyp_dump_info->phyp_dump_is_active = 0;
+
+	if (depth != 1 || strcmp(uname, "rtas") != 0)
+		return 0;
+
+	if (of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL))
+		phyp_dump_info->phyp_dump_configured++;
+
+	if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL))
+		phyp_dump_info->phyp_dump_is_active++;
+
+	sizes = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
+									NULL);
+	if (!sizes)
+		return 0;
+
+	if (sizes[0] == 1)
+		phyp_dump_info->cpu_state_size = *((unsigned long *)&sizes[1]);
+
+	if (sizes[3] == 2)
+		phyp_dump_info->hpte_region_size =
+						*((unsigned long *)&sizes[4]);
+#endif
+	return 1;
+}
+
 int __init early_init_dt_scan_rtas(unsigned long node,
 		const char *uname, int depth, void *data)
 {
Index: 2.6.25-rc1/include/asm-powerpc/rtas.h
===================================================================
--- 2.6.25-rc1.orig/include/asm-powerpc/rtas.h	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/include/asm-powerpc/rtas.h	2008-02-18 03:23:47.000000000 -0600
@@ -183,6 +183,9 @@ extern unsigned int rtas_busy_delay(int 
 
 extern int early_init_dt_scan_rtas(unsigned long node,
 		const char *uname, int depth, void *data);
+int early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data);
+
 
 extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
  2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
  2008-02-18  5:34 ` [PATCH 1/8] pseries: phyp dump: Documentation Manish Ahuja
  2008-02-18  5:36 ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja
@ 2008-02-18  5:38 ` Manish Ahuja
  2008-02-18  5:40 ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-18  5:38 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja



Check to see if there actually is data from a previously
crashed kernel waiting. If so, Allow user-sapce tools to
grab the data (by reading /proc/kcore). When user-space 
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo "0x40000000 0x10000000" > /sys/kernel/release_region

will release 256MB starting at the 1GB.  The released memory
becomes free for general use.

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

------
 arch/powerpc/platforms/pseries/phyp_dump.c |   81 +++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 5 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:23:47.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:32:13.000000000 -0600
@@ -12,18 +12,23 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
 #include <asm/machdep.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -53,18 +58,84 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   "echo <start addr> <length> > /sys/kernel/release_region"
+ *
+ * Example:
+ *   "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
+
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory assed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range(start_pfn, nr_pages);
+
+	return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+					 NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header = NULL;
+	int header_len = 0;
+	int rc;
 
 	/* If no memory was reserved in early boot, there is nothing to do */
 	if (phyp_dump_info->init_reserve_size == 0)
 		return 0;
 
-	/* Release memory that was reserved in early boot */
-	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-	release_memory_range(start_pfn, nr_pages);
+	/* Return if phyp dump not supported */
+	if (!phyp_dump_info->phyp_dump_configured)
+		return -ENOSYS;
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	if (rtas) {
+		dump_header = of_get_property(rtas, "ibm,kernel-dump",
+								&header_len);
+		of_node_put(rtas);
+	}
+
+	if (dump_header == NULL)
+		return 0;
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = sysfs_create_file(kernel_kobj, &rr.attr);
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+									rc);
+		return 0;
+	}
 
 	return 0;
 }

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 4/8] pseries: phyp dump: register dump area.
  2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
                   ` (2 preceding siblings ...)
  2008-02-18  5:38 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja
@ 2008-02-18  5:40 ` Manish Ahuja
  2008-02-18  5:41 ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-18  5:40 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja


Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

------
 arch/powerpc/platforms/pseries/phyp_dump.c |  137 +++++++++++++++++++++++++++--
 1 file changed, 131 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:26:56.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:30:28.000000000 -0600
@@ -28,6 +28,117 @@
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+/* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO  0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	unsigned long addr_offset = 0;
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32)offsetof(struct phyp_dump_header, cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+						"register\n", rc);
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -106,7 +217,9 @@ static struct kobj_attribute rr = __ATTR
 static int __init phyp_dump_setup(void)
 {
 	struct device_node *rtas;
-	const int *dump_header = NULL;
+ 	const struct phyp_dump_header *dump_header = NULL;
+ 	unsigned long dump_area_start;
+ 	unsigned long dump_area_length;
 	int header_len = 0;
 	int rc;
 
@@ -118,7 +231,13 @@ static int __init phyp_dump_setup(void)
 	if (!phyp_dump_info->phyp_dump_configured)
 		return -ENOSYS;
 
-	/* Is there dump data waiting for us? */
+	/* Is there dump data waiting for us? If there isn't,
+	 * then register a new dump area, and release all of
+	 * the rest of the reserved ram.
+	 *
+	 * The /rtas/ibm,kernel-dump rtas node is present only
+	 * if there is dump data waiting for us.
+	 */
 	rtas = of_find_node_by_path("/rtas");
 	if (rtas) {
 		dump_header = of_get_property(rtas, "ibm,kernel-dump",
@@ -126,17 +245,23 @@ static int __init phyp_dump_setup(void)
 		of_node_put(rtas);
 	}
 
-	if (dump_header == NULL)
+	dump_area_length = init_dump_header(&phdr);
+
+	/* align down */
+	dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
+
+	if (dump_header == NULL) {
+		register_dump_area(&phdr, dump_area_start);
 		return 0;
+	}
 
 	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
 	rc = sysfs_create_file(kernel_kobj, &rr.attr);
-	if (rc) {
+	if (rc)
 		printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
 									rc);
-		return 0;
-	}
 
+	/* ToDo: re-register the dump area, for next time. */
 	return 0;
 }
 machine_subsys_initcall(pseries, phyp_dump_setup);

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 5/8] pseries: phyp dump: debugging print routines.
  2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
                   ` (3 preceding siblings ...)
  2008-02-18  5:40 ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja
@ 2008-02-18  5:41 ` Manish Ahuja
  2008-02-18  5:42 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-18  5:41 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja


Provide some basic debugging support.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
-----

 arch/powerpc/platforms/pseries/phyp_dump.c |   61 ++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:30:53.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:25:19.000000000 -0600
@@ -122,6 +122,61 @@ static unsigned long init_dump_header(st
 	return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+	printk(KERN_INFO "dump header:\n");
+	/* setup some ph->sections required */
+	printk(KERN_INFO "version = %d\n", ph->version);
+	printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+	printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+	/* No ph->disk, so all should be set to 0 */
+	printk(KERN_INFO "Offset to first section 0x%x\n",
+						ph->first_offset_section);
+	printk(KERN_INFO "dump disk sections should be zero\n");
+	printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+	printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+	printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+	printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+	printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+	/*set cpu state and hpte states as well scratch pad area */
+	printk(KERN_INFO " CPU AREA \n");
+	printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+	printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+	printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+	printk(KERN_INFO "cpu source_address =%lx\n",
+						ph->cpu_data.source_address);
+	printk(KERN_INFO "cpu source_length =%lx\n",
+						ph->cpu_data.source_length);
+	printk(KERN_INFO "cpu length_copied =%lx\n",
+						ph->cpu_data.length_copied);
+
+	printk(KERN_INFO " HPTE AREA \n");
+	printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+	printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+	printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+	printk(KERN_INFO "HPTE source_address =%lx\n",
+						ph->hpte_data.source_address);
+	printk(KERN_INFO "HPTE source_length =%lx\n",
+						ph->hpte_data.source_length);
+	printk(KERN_INFO "HPTE length_copied =%lx\n",
+						ph->hpte_data.length_copied);
+
+	printk(KERN_INFO " SRSD AREA \n");
+	printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+	printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+	printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+	printk(KERN_INFO "SRSD source_address =%lx\n",
+						ph->kernel_data.source_address);
+	printk(KERN_INFO "SRSD source_length =%lx\n",
+						ph->kernel_data.source_length);
+	printk(KERN_INFO "SRSD length_copied =%lx\n",
+						ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -134,9 +189,11 @@ static void register_dump_area(struct ph
 				1, ph, sizeof(struct phyp_dump_header));
 	} while (rtas_busy_delay(rc));
 
-	if (rc)
+	if (rc) {
 		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
 						"register\n", rc);
+		print_dump_header(ph);
+	}
 }
 
 /* ------------------------------------------------- */
@@ -245,8 +302,8 @@ static int __init phyp_dump_setup(void)
 		of_node_put(rtas);
 	}
 
+	print_dump_header(dump_header);
 	dump_area_length = init_dump_header(&phdr);
-
 	/* align down */
 	dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
  2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
                   ` (4 preceding siblings ...)
  2008-02-18  5:41 ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja
@ 2008-02-18  5:42 ` Manish Ahuja
  2008-02-18  5:44 ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-18  5:42 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja


Routines to 
a. invalidate dump 
b. Calculate region that is reserved and needs to be freed. This is 
   exported through sysfs interface.

Unregister has been removed for now as it wasn't being used.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
-----

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   83 ++++++++++++++++++++++++++---
 include/asm-powerpc/phyp_dump.h            |    3 +
 2 files changed, 80 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:25:19.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:25:32.000000000 -0600
@@ -69,6 +69,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO  0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -180,9 +184,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
-	ph->cpu_data.destination_address += addr;
-	ph->hpte_data.destination_address += addr;
-	ph->kernel_data.destination_address += addr;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	/* ToDo Invalidate kdump and free memory range. */
 
 	do {
 		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -196,6 +206,30 @@ static void register_dump_area(struct ph
 	}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				2, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+						"on invalidate\n", rc);
+		print_dump_header(ph);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -206,8 +240,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct page *rpage;
 	unsigned long end_pfn;
@@ -268,8 +302,29 @@ static ssize_t store_release_region(stru
 	return count;
 }
 
+static ssize_t show_release_region(struct kobject *kobj,
+			struct kobj_attribute *attr, char *buf)
+{
+	u64 second_addr_range;
+
+	/* total reserved size - start of scratch area */
+	second_addr_range = phyp_dump_info->init_reserve_size -
+				phyp_dump_info->reserved_scratch_size;
+	return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+			    " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+		phdr.cpu_data.destination_address,
+		phdr.cpu_data.length_copied,
+		phdr.hpte_data.destination_address,
+		phdr.hpte_data.length_copied,
+		phdr.kernel_data.destination_address,
+		phdr.kernel_data.length_copied,
+		phyp_dump_info->init_reserve_start,
+		second_addr_range);
+}
+
 static struct kobj_attribute rr = __ATTR(release_region, 0600,
-					 NULL, store_release_region);
+					show_release_region,
+					store_release_region);
 
 static int __init phyp_dump_setup(void)
 {
@@ -312,6 +367,22 @@ static int __init phyp_dump_setup(void)
 		return 0;
 	}
 
+	/* re-register the dump area, if old dump was invalid */
+	if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
+		invalidate_last_dump(&phdr, dump_area_start);
+		register_dump_area(&phdr, dump_area_start);
+		return 0;
+	}
+
+	if (dump_header) {
+		phyp_dump_info->reserved_scratch_addr =
+				dump_header->cpu_data.destination_address;
+		phyp_dump_info->reserved_scratch_size =
+				dump_header->cpu_data.source_length +
+				dump_header->hpte_data.source_length +
+				dump_header->kernel_data.source_length;
+	}
+
 	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
 	rc = sysfs_create_file(kernel_kobj, &rr.attr);
 	if (rc)
Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- 2.6.25-rc1.orig/include/asm-powerpc/phyp_dump.h	2008-02-18 04:24:16.000000000 -0600
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-02-18 04:25:32.000000000 -0600
@@ -30,6 +30,9 @@ struct phyp_dump {
 	/* store cpu & hpte size */
 	unsigned long cpu_state_size;
 	unsigned long hpte_region_size;
+	/* previous scratch area values */
+	unsigned long reserved_scratch_addr;
+	unsigned long reserved_scratch_size;
 };
 
 extern struct phyp_dump *phyp_dump_info;

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 7/8] pseries: phyp dump: Tracking memory range freed.
  2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
                   ` (5 preceding siblings ...)
  2008-02-18  5:42 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja
@ 2008-02-18  5:44 ` Manish Ahuja
  2008-02-18  5:45 ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja
  2008-02-22  0:53 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Michael Ellerman
  8 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-18  5:44 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja


This patch tracks the size freed. For now it does a simple
rudimentary calculation of the ranges freed. The idea is
to keep it simple at the external shell script level and 
send in large chunks for now.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
-----

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:31:22.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:31:30.000000000 -0600
@@ -260,6 +260,39 @@ void release_memory_range(unsigned long 
 	}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+	static unsigned long scratch_area_size, reserved_area_size;
+
+	if (addr < phyp_dump_info->init_reserve_start)
+		return;
+
+	if ((addr >= phyp_dump_info->init_reserve_start) &&
+	    (addr <= phyp_dump_info->init_reserve_start +
+	     phyp_dump_info->init_reserve_size))
+		reserved_area_size += length;
+
+	if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+	    (addr <= phyp_dump_info->reserved_scratch_addr +
+	     phyp_dump_info->reserved_scratch_size))
+		scratch_area_size += length;
+
+	if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+	    (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+		invalidate_last_dump(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+		register_dump_area(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -284,6 +317,8 @@ static ssize_t store_release_region(stru
 	if (ret != 2)
 		return -EINVAL;
 
+	track_freed_range(start_addr, length);
+
 	/* Range-check - don't free any reserved memory that
 	 * wasn't reserved for phyp-dump */
 	if (start_addr < phyp_dump_info->init_reserve_start)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 8/8] pseries: phyp dump: config file
  2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
                   ` (6 preceding siblings ...)
  2008-02-18  5:44 ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja
@ 2008-02-18  5:45 ` Manish Ahuja
  2008-02-22  0:53 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Michael Ellerman
  8 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-18  5:45 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja



Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

-----
 arch/powerpc/Kconfig |   11 +++++++++++
 1 file changed, 11 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/Kconfig
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/Kconfig	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/Kconfig	2008-02-18 03:22:45.000000000 -0600
@@ -306,6 +306,17 @@ config CRASH_DUMP
 
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	default y
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistence.
+
+	  If unsure, say "Y"
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
  2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
                   ` (7 preceding siblings ...)
  2008-02-18  5:45 ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja
@ 2008-02-22  0:53 ` Michael Ellerman
  2008-02-28 23:57   ` Manish Ahuja
  8 siblings, 1 reply; 59+ messages in thread
From: Michael Ellerman @ 2008-02-22  0:53 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: ppc-dev, Linas Vepstas, paulus

[-- Attachment #1: Type: text/plain, Size: 1244 bytes --]

On Sun, 2008-02-17 at 22:53 -0600, Manish Ahuja wrote:
> The following series of patches implement a basic framework
> for hypervisor-assisted dump. The very first patch provides 
> documentation explaining what this is  :-) . Yes, its supposed
> to be an improvement over kdump.
> 
> A list of open issues / todo list is included in the documentation.
> It also appears that the not-yet-released firmware versions this was tested 
> on are still, ahem, incomplete; this work is also pending.
> 
> I have included most of the changes requested. Although, I did find
> one or two, fixed in a later patch file rather than the first location
> they appeared at.

This series still doesn't build on !CONFIG_RTAS configs:
http://kisskb.ellerman.id.au/kisskb/head/629/

This solution is to move early_init_dt_scan_phyp_dump() into
arch/powerpc/platforms/pseries/phyp_dump.c and provide a dummy
implementation in asm-powerpc/phyp_dump.c for the !CONFIG_PHYP_DUMP
case.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
  2008-02-22  0:53 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Michael Ellerman
@ 2008-02-28 23:57   ` Manish Ahuja
  2008-02-29  0:22     ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja
                       ` (9 more replies)
  0 siblings, 10 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-28 23:57 UTC (permalink / raw)
  To: michael; +Cc: ppc-dev, Linas Vepstas, paulus

Changes from previous version:

The only changes are in patch 2.
moved early_init_dt_scan_phyp_dump from rtas.c to phyp_dump.c
Added dummy function in phyp_dump.h

Patch 3 required repatching due to changes to patch 2.
Resubmitting all patches to avoid confusion.

Thanks,
Manish


Michael Ellerman wrote:
> On Sun, 2008-02-17 at 22:53 -0600, Manish Ahuja wrote:
>> The following series of patches implement a basic framework
>> for hypervisor-assisted dump. The very first patch provides 
>> documentation explaining what this is  :-) . Yes, its supposed
>> to be an improvement over kdump.
>>
>> A list of open issues / todo list is included in the documentation.
>> It also appears that the not-yet-released firmware versions this was tested 
>> on are still, ahem, incomplete; this work is also pending.
>>
>> I have included most of the changes requested. Although, I did find
>> one or two, fixed in a later patch file rather than the first location
>> they appeared at.
> 
> This series still doesn't build on !CONFIG_RTAS configs:
> http://kisskb.ellerman.id.au/kisskb/head/629/
> 
> This solution is to move early_init_dt_scan_phyp_dump() into
> arch/powerpc/platforms/pseries/phyp_dump.c and provide a dummy
> implementation in asm-powerpc/phyp_dump.c for the !CONFIG_PHYP_DUMP
> case.
> 
> cheers
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 1/8] pseries: phyp dump: Docmentation
  2008-02-28 23:57   ` Manish Ahuja
@ 2008-02-29  0:22     ` Manish Ahuja
  2008-02-29  0:24     ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-29  0:22 UTC (permalink / raw)
  To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas


Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

----
 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt	2008-02-18 03:22:33.000000000 -0600
@@ -0,0 +1,127 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in it's normal, production mode on it normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For Example: the values in /sys/kernel/release-region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if a active dump is waiting for us. If yes
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once we collect a dump from user
+land scripts that are run. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of the ram is reserved as a scratch area. This area
+is *not* be released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+a new file upon user intervention. The starting address
+to be read and the range for each data point in provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+------------
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead? There is a dump_subsys being created by the s390 code,
+   perhaps the pseries code should use a similar layout as well.
+
+ o Is reserving a 256MB region really required? The goal of
+   reserving a 256MB scratch area is to make sure that no
+   important crash data is clobbered when the hypervisor
+   save low mem to the scratch area. But, if one could assure
+   that nothing important is located in some 256MB area, then
+   it would not need to be reserved. Something that can be
+   improved in subsequent versions.
+
+ o Still working the kdump team to integrate this with kdump,
+   some work remains but this would not affect the current
+   patches.
+
+ o Still need to write a shell script, to copy the dump away.
+   Currently I am parsing it manually.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-02-28 23:57   ` Manish Ahuja
  2008-02-29  0:22     ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja
@ 2008-02-29  0:24     ` Manish Ahuja
  2008-03-11  1:02       ` Michael Ellerman
  2008-03-11  6:12       ` Paul Mackerras
  2008-02-29  0:27     ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja
                       ` (7 subsequent siblings)
  9 siblings, 2 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-29  0:24 UTC (permalink / raw)
  To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas


Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

----
 arch/powerpc/kernel/prom.c                 |   49 +++++++++++++
 arch/powerpc/platforms/pseries/Makefile    |    1 
 arch/powerpc/platforms/pseries/phyp_dump.c |  105 +++++++++++++++++++++++++++++
 include/asm-powerpc/phyp_dump.h            |   44 ++++++++++++
 4 files changed, 199 insertions(+)

Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-02-28 22:05:25.000000000 -0600
@@ -0,0 +1,44 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+	/* Memory that is reserved during very early boot. */
+	unsigned long init_reserve_start;
+	unsigned long init_reserve_size;
+	/* Check status during boot if dump supported, active & present*/
+	unsigned long phyp_dump_configured;
+	unsigned long phyp_dump_is_active;
+	/* store cpu & hpte size */
+	unsigned long cpu_state_size;
+	unsigned long hpte_region_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+int early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data);
+#else /* CONFIG_PHYP_DUMP */
+int early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data) { return 0; }
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 21:57:52.000000000 -0600
@@ -0,0 +1,105 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+#include <asm/machdep.h>
+#include <asm/prom.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for genreal use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+	struct page *rpage;
+	unsigned long end_pfn;
+	long i;
+
+	end_pfn = start_pfn + nr_pages;
+
+	for (i = start_pfn; i <= end_pfn; i++) {
+		rpage = pfn_to_page(i);
+		if (PageReserved(rpage)) {
+			ClearPageReserved(rpage);
+			init_page_count(rpage);
+			__free_page(rpage);
+			totalram_pages++;
+		}
+	}
+}
+
+static int __init phyp_dump_setup(void)
+{
+	unsigned long start_pfn, nr_pages;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Release memory that was reserved in early boot */
+	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+	release_memory_range(start_pfn, nr_pages);
+
+	return 0;
+}
+machine_subsys_initcall(pseries, phyp_dump_setup);
+
+int __init early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+#ifdef CONFIG_PHYP_DUMP
+	const unsigned int *sizes;
+
+	phyp_dump_info->phyp_dump_configured = 0;
+	phyp_dump_info->phyp_dump_is_active = 0;
+
+	if (depth != 1 || strcmp(uname, "rtas") != 0)
+		return 0;
+
+	if (of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL))
+		phyp_dump_info->phyp_dump_configured++;
+
+	if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL))
+		phyp_dump_info->phyp_dump_is_active++;
+
+	sizes = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
+									NULL);
+	if (!sizes)
+		return 0;
+
+	if (sizes[0] == 1)
+		phyp_dump_info->cpu_state_size = *((unsigned long *)&sizes[1]);
+
+	if (sizes[3] == 2)
+		phyp_dump_info->hpte_region_size =
+						*((unsigned long *)&sizes[4]);
+#endif
+	return 1;
+}
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/Makefile	2008-02-28 21:54:57.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/Makefile	2008-02-28 21:55:27.000000000 -0600
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
 obj-$(CONFIG_HVCS)		+= hvcserver.o
 obj-$(CONFIG_HCALL_STATS)	+= hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP)	+= phyp_dump.o
Index: 2.6.25-rc1/arch/powerpc/kernel/prom.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/kernel/prom.c	2008-02-28 21:54:57.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/kernel/prom.c	2008-02-28 21:55:27.000000000 -0600
@@ -51,6 +51,7 @@
 #include <asm/machdep.h>
 #include <asm/pSeries_reconfig.h>
 #include <asm/pci-bridge.h>
+#include <asm/phyp_dump.h>
 #include <asm/kexec.h>
 
 #ifdef DEBUG
@@ -1039,6 +1040,51 @@ static void __init early_reserve_mem(voi
 #endif
 }
 
+#ifdef CONFIG_PHYP_DUMP
+/**
+ * reserve_crashed_mem() - reserve all not-yet-dumped mmemory
+ *
+ * This routine may reserve memory regions in the kernel only
+ * if the system is supported and a dump was taken in last
+ * boot instance or if the hardware is supported and the
+ * scratch area needs to be setup. In other instances it returns
+ * without reserving anything. The memory in case of dump being
+ * active is freed when the dump is collected (by userland tools).
+ */
+static void __init reserve_crashed_mem(void)
+{
+	unsigned long base, size;
+	if (!phyp_dump_info->phyp_dump_configured) {
+		printk(KERN_ERR "Phyp-dump not supported on this hardware\n");
+		return;
+	}
+
+	if (phyp_dump_info->phyp_dump_is_active) {
+		/* Reserve *everything* above RMR.Area freed by userland tools*/
+		base = PHYP_DUMP_RMR_END;
+		size = lmb_end_of_DRAM() - base;
+
+		/* XXX crashed_ram_end is wrong, since it may be beyond
+		 * the memory_limit, it will need to be adjusted. */
+		lmb_reserve(base, size);
+
+		phyp_dump_info->init_reserve_start = base;
+		phyp_dump_info->init_reserve_size = size;
+	} else {
+		size = phyp_dump_info->cpu_state_size +
+			phyp_dump_info->hpte_region_size +
+			PHYP_DUMP_RMR_END;
+		base = lmb_end_of_DRAM() - size;
+		lmb_reserve(base, size);
+		phyp_dump_info->init_reserve_start = base;
+		phyp_dump_info->init_reserve_size = size;
+	}
+}
+#else
+static inline void __init reserve_crashed_mem(void) {}
+#endif /* CONFIG_PHYP_DUMP  && CONFIG_PPC_RTAS */
+
+
 void __init early_init_devtree(void *params)
 {
 	DBG(" -> early_init_devtree(%p)\n", params);
@@ -1050,6 +1096,8 @@ void __init early_init_devtree(void *par
 	/* Some machines might need RTAS info for debugging, grab it now. */
 	of_scan_flat_dt(early_init_dt_scan_rtas, NULL);
 #endif
+	/* scan tree to see if dump occured during last boot */
+	of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL);
 
 	/* Retrieve various informations from the /chosen node of the
 	 * device-tree, including the platform type, initrd location and
@@ -1071,6 +1119,7 @@ void __init early_init_devtree(void *par
 	reserve_kdump_trampoline();
 	reserve_crashkernel();
 	early_reserve_mem();
+	reserve_crashed_mem();
 
 	lmb_enforce_memory_limit(memory_limit);
 	lmb_analyze();

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
  2008-02-28 23:57   ` Manish Ahuja
  2008-02-29  0:22     ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja
  2008-02-29  0:24     ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja
@ 2008-02-29  0:27     ` Manish Ahuja
  2008-03-11  6:16       ` Paul Mackerras
  2008-02-29  0:29     ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja
                       ` (6 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Manish Ahuja @ 2008-02-29  0:27 UTC (permalink / raw)
  To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas


Check to see if there actually is data from a previously
crashed kernel waiting. If so, Allow user-sapce tools to
grab the data (by reading /proc/kcore). When user-space 
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo "0x40000000 0x10000000" > /sys/kernel/release_region

will release 256MB starting at the 1GB.  The released memory
becomes free for general use.

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

------
 arch/powerpc/platforms/pseries/phyp_dump.c |   82 +++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 5 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 21:57:52.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:01.000000000 -0600
@@ -12,19 +12,25 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
 #include <asm/machdep.h>
 #include <asm/prom.h>
+#include <asm/rtas.h>
+
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -54,18 +60,84 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   "echo <start addr> <length> > /sys/kernel/release_region"
+ *
+ * Example:
+ *   "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
+
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory assed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range(start_pfn, nr_pages);
+
+	return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+					 NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header = NULL;
+	int header_len = 0;
+	int rc;
 
 	/* If no memory was reserved in early boot, there is nothing to do */
 	if (phyp_dump_info->init_reserve_size == 0)
 		return 0;
 
-	/* Release memory that was reserved in early boot */
-	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-	release_memory_range(start_pfn, nr_pages);
+	/* Return if phyp dump not supported */
+	if (!phyp_dump_info->phyp_dump_configured)
+		return -ENOSYS;
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	if (rtas) {
+		dump_header = of_get_property(rtas, "ibm,kernel-dump",
+								&header_len);
+		of_node_put(rtas);
+	}
+
+	if (dump_header == NULL)
+		return 0;
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = sysfs_create_file(kernel_kobj, &rr.attr);
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+									rc);
+		return 0;
+	}
 
 	return 0;
 }

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 4/8] pseries: phyp dump: register dump area.
  2008-02-28 23:57   ` Manish Ahuja
                       ` (2 preceding siblings ...)
  2008-02-29  0:27     ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja
@ 2008-02-29  0:29     ` Manish Ahuja
  2008-03-11  6:17       ` Paul Mackerras
  2008-02-29  0:31     ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja
                       ` (5 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Manish Ahuja @ 2008-02-29  0:29 UTC (permalink / raw)
  To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas


Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

------
 arch/powerpc/platforms/pseries/phyp_dump.c |  137 +++++++++++++++++++++++++++--
 1 file changed, 131 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:01.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:42.000000000 -0600
@@ -30,6 +30,117 @@
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+/* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO  0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	unsigned long addr_offset = 0;
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32)offsetof(struct phyp_dump_header, cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+						"register\n", rc);
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -108,7 +219,9 @@ static struct kobj_attribute rr = __ATTR
 static int __init phyp_dump_setup(void)
 {
 	struct device_node *rtas;
-	const int *dump_header = NULL;
+ 	const struct phyp_dump_header *dump_header = NULL;
+ 	unsigned long dump_area_start;
+ 	unsigned long dump_area_length;
 	int header_len = 0;
 	int rc;
 
@@ -120,7 +233,13 @@ static int __init phyp_dump_setup(void)
 	if (!phyp_dump_info->phyp_dump_configured)
 		return -ENOSYS;
 
-	/* Is there dump data waiting for us? */
+	/* Is there dump data waiting for us? If there isn't,
+	 * then register a new dump area, and release all of
+	 * the rest of the reserved ram.
+	 *
+	 * The /rtas/ibm,kernel-dump rtas node is present only
+	 * if there is dump data waiting for us.
+	 */
 	rtas = of_find_node_by_path("/rtas");
 	if (rtas) {
 		dump_header = of_get_property(rtas, "ibm,kernel-dump",
@@ -128,17 +247,23 @@ static int __init phyp_dump_setup(void)
 		of_node_put(rtas);
 	}
 
-	if (dump_header == NULL)
+	dump_area_length = init_dump_header(&phdr);
+
+	/* align down */
+	dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
+
+	if (dump_header == NULL) {
+		register_dump_area(&phdr, dump_area_start);
 		return 0;
+	}
 
 	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
 	rc = sysfs_create_file(kernel_kobj, &rr.attr);
-	if (rc) {
+	if (rc)
 		printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
 									rc);
-		return 0;
-	}
 
+	/* ToDo: re-register the dump area, for next time. */
 	return 0;
 }
 machine_subsys_initcall(pseries, phyp_dump_setup);

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 5/8] pseries: phyp dump: debugging print routines.
  2008-02-28 23:57   ` Manish Ahuja
                       ` (3 preceding siblings ...)
  2008-02-29  0:29     ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja
@ 2008-02-29  0:31     ` Manish Ahuja
  2008-02-29  0:32     ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-29  0:31 UTC (permalink / raw)
  To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas


Provide some basic debugging support.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
-----

 arch/powerpc/platforms/pseries/phyp_dump.c |   61 ++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:42.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:45.000000000 -0600
@@ -124,6 +124,61 @@ static unsigned long init_dump_header(st
 	return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+	printk(KERN_INFO "dump header:\n");
+	/* setup some ph->sections required */
+	printk(KERN_INFO "version = %d\n", ph->version);
+	printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+	printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+	/* No ph->disk, so all should be set to 0 */
+	printk(KERN_INFO "Offset to first section 0x%x\n",
+						ph->first_offset_section);
+	printk(KERN_INFO "dump disk sections should be zero\n");
+	printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+	printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+	printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+	printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+	printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+	/*set cpu state and hpte states as well scratch pad area */
+	printk(KERN_INFO " CPU AREA \n");
+	printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+	printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+	printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+	printk(KERN_INFO "cpu source_address =%lx\n",
+						ph->cpu_data.source_address);
+	printk(KERN_INFO "cpu source_length =%lx\n",
+						ph->cpu_data.source_length);
+	printk(KERN_INFO "cpu length_copied =%lx\n",
+						ph->cpu_data.length_copied);
+
+	printk(KERN_INFO " HPTE AREA \n");
+	printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+	printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+	printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+	printk(KERN_INFO "HPTE source_address =%lx\n",
+						ph->hpte_data.source_address);
+	printk(KERN_INFO "HPTE source_length =%lx\n",
+						ph->hpte_data.source_length);
+	printk(KERN_INFO "HPTE length_copied =%lx\n",
+						ph->hpte_data.length_copied);
+
+	printk(KERN_INFO " SRSD AREA \n");
+	printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+	printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+	printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+	printk(KERN_INFO "SRSD source_address =%lx\n",
+						ph->kernel_data.source_address);
+	printk(KERN_INFO "SRSD source_length =%lx\n",
+						ph->kernel_data.source_length);
+	printk(KERN_INFO "SRSD length_copied =%lx\n",
+						ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -136,9 +191,11 @@ static void register_dump_area(struct ph
 				1, ph, sizeof(struct phyp_dump_header));
 	} while (rtas_busy_delay(rc));
 
-	if (rc)
+	if (rc) {
 		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
 						"register\n", rc);
+		print_dump_header(ph);
+	}
 }
 
 /* ------------------------------------------------- */
@@ -247,8 +304,8 @@ static int __init phyp_dump_setup(void)
 		of_node_put(rtas);
 	}
 
+	print_dump_header(dump_header);
 	dump_area_length = init_dump_header(&phdr);
-
 	/* align down */
 	dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
  2008-02-28 23:57   ` Manish Ahuja
                       ` (4 preceding siblings ...)
  2008-02-29  0:31     ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja
@ 2008-02-29  0:32     ` Manish Ahuja
  2008-03-11  6:19       ` Paul Mackerras
  2008-02-29  0:33     ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja
                       ` (3 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Manish Ahuja @ 2008-02-29  0:32 UTC (permalink / raw)
  To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas


Routines to 
a. invalidate dump 
b. Calculate region that is reserved and needs to be freed. This is 
   exported through sysfs interface.

Unregister has been removed for now as it wasn't being used.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
-----

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   83 ++++++++++++++++++++++++++---
 include/asm-powerpc/phyp_dump.h            |    3 +
 2 files changed, 80 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:45.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:47.000000000 -0600
@@ -71,6 +71,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO  0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -182,9 +186,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
-	ph->cpu_data.destination_address += addr;
-	ph->hpte_data.destination_address += addr;
-	ph->kernel_data.destination_address += addr;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	/* ToDo Invalidate kdump and free memory range. */
 
 	do {
 		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -198,6 +208,30 @@ static void register_dump_area(struct ph
 	}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				2, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+						"on invalidate\n", rc);
+		print_dump_header(ph);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -208,8 +242,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct page *rpage;
 	unsigned long end_pfn;
@@ -270,8 +304,29 @@ static ssize_t store_release_region(stru
 	return count;
 }
 
+static ssize_t show_release_region(struct kobject *kobj,
+			struct kobj_attribute *attr, char *buf)
+{
+	u64 second_addr_range;
+
+	/* total reserved size - start of scratch area */
+	second_addr_range = phyp_dump_info->init_reserve_size -
+				phyp_dump_info->reserved_scratch_size;
+	return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+			    " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+		phdr.cpu_data.destination_address,
+		phdr.cpu_data.length_copied,
+		phdr.hpte_data.destination_address,
+		phdr.hpte_data.length_copied,
+		phdr.kernel_data.destination_address,
+		phdr.kernel_data.length_copied,
+		phyp_dump_info->init_reserve_start,
+		second_addr_range);
+}
+
 static struct kobj_attribute rr = __ATTR(release_region, 0600,
-					 NULL, store_release_region);
+					show_release_region,
+					store_release_region);
 
 static int __init phyp_dump_setup(void)
 {
@@ -314,6 +369,22 @@ static int __init phyp_dump_setup(void)
 		return 0;
 	}
 
+	/* re-register the dump area, if old dump was invalid */
+	if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
+		invalidate_last_dump(&phdr, dump_area_start);
+		register_dump_area(&phdr, dump_area_start);
+		return 0;
+	}
+
+	if (dump_header) {
+		phyp_dump_info->reserved_scratch_addr =
+				dump_header->cpu_data.destination_address;
+		phyp_dump_info->reserved_scratch_size =
+				dump_header->cpu_data.source_length +
+				dump_header->hpte_data.source_length +
+				dump_header->kernel_data.source_length;
+	}
+
 	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
 	rc = sysfs_create_file(kernel_kobj, &rr.attr);
 	if (rc)
Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- 2.6.25-rc1.orig/include/asm-powerpc/phyp_dump.h	2008-02-28 22:05:25.000000000 -0600
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-02-28 23:36:47.000000000 -0600
@@ -30,6 +30,9 @@ struct phyp_dump {
 	/* store cpu & hpte size */
 	unsigned long cpu_state_size;
 	unsigned long hpte_region_size;
+	/* previous scratch area values */
+	unsigned long reserved_scratch_addr;
+	unsigned long reserved_scratch_size;
 };
 
 extern struct phyp_dump *phyp_dump_info;

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 7/8] pseries: phyp dump: Tracking memory range freed.
  2008-02-28 23:57   ` Manish Ahuja
                       ` (5 preceding siblings ...)
  2008-02-29  0:32     ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja
@ 2008-02-29  0:33     ` Manish Ahuja
  2008-02-29  0:35     ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-02-29  0:33 UTC (permalink / raw)
  To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas


This patch tracks the size freed. For now it does a simple
rudimentary calculation of the ranges freed. The idea is
to keep it simple at the external shell script level and 
send in large chunks for now.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
-----

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:47.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:49.000000000 -0600
@@ -262,6 +262,39 @@ void release_memory_range(unsigned long 
 	}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+	static unsigned long scratch_area_size, reserved_area_size;
+
+	if (addr < phyp_dump_info->init_reserve_start)
+		return;
+
+	if ((addr >= phyp_dump_info->init_reserve_start) &&
+	    (addr <= phyp_dump_info->init_reserve_start +
+	     phyp_dump_info->init_reserve_size))
+		reserved_area_size += length;
+
+	if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+	    (addr <= phyp_dump_info->reserved_scratch_addr +
+	     phyp_dump_info->reserved_scratch_size))
+		scratch_area_size += length;
+
+	if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+	    (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+		invalidate_last_dump(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+		register_dump_area(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -286,6 +319,8 @@ static ssize_t store_release_region(stru
 	if (ret != 2)
 		return -EINVAL;
 
+	track_freed_range(start_addr, length);
+
 	/* Range-check - don't free any reserved memory that
 	 * wasn't reserved for phyp-dump */
 	if (start_addr < phyp_dump_info->init_reserve_start)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 8/8] pseries: phyp dump: config file
  2008-02-28 23:57   ` Manish Ahuja
                       ` (6 preceding siblings ...)
  2008-02-29  0:33     ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja
@ 2008-02-29  0:35     ` Manish Ahuja
  2008-03-11  6:21       ` Paul Mackerras
  2008-02-29  2:20     ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Michael Ellerman
  2008-03-03 23:37     ` Joel Schopp
  9 siblings, 1 reply; 59+ messages in thread
From: Manish Ahuja @ 2008-02-29  0:35 UTC (permalink / raw)
  To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas




Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

-----
 arch/powerpc/Kconfig |   11 +++++++++++
 1 file changed, 11 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/Kconfig
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/Kconfig	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/Kconfig	2008-02-18 03:22:45.000000000 -0600
@@ -306,6 +306,17 @@ config CRASH_DUMP
 
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	default y
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistence.
+
+	  If unsure, say "Y"
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
  2008-02-28 23:57   ` Manish Ahuja
                       ` (7 preceding siblings ...)
  2008-02-29  0:35     ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja
@ 2008-02-29  2:20     ` Michael Ellerman
  2008-03-03 23:37     ` Joel Schopp
  9 siblings, 0 replies; 59+ messages in thread
From: Michael Ellerman @ 2008-02-29  2:20 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: ppc-dev, Linas Vepstas, paulus

[-- Attachment #1: Type: text/plain, Size: 589 bytes --]

On Thu, 2008-02-28 at 17:57 -0600, Manish Ahuja wrote:
> Changes from previous version:
> 
> The only changes are in patch 2.
> moved early_init_dt_scan_phyp_dump from rtas.c to phyp_dump.c
> Added dummy function in phyp_dump.h

This fixes the build failures I was seeing!
http://kisskb.ellerman.id.au/kisskb/head/664/

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
  2008-02-28 23:57   ` Manish Ahuja
                       ` (8 preceding siblings ...)
  2008-02-29  2:20     ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Michael Ellerman
@ 2008-03-03 23:37     ` Joel Schopp
  9 siblings, 0 replies; 59+ messages in thread
From: Joel Schopp @ 2008-03-03 23:37 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: ppc-dev, Linas Vepstas, paulus

This looks like it is to a stable usable point now.  In my opinion it is 
ready to be merged into the next tree for 2.6.26.

Reviewed-by: Joel Schopp <jschopp@austin.ibm.com>


Manish Ahuja wrote:
> Changes from previous version:
>
> The only changes are in patch 2.
> moved early_init_dt_scan_phyp_dump from rtas.c to phyp_dump.c
> Added dummy function in phyp_dump.h
>
> Patch 3 required repatching due to changes to patch 2.
> Resubmitting all patches to avoid confusion.
>
> Thanks,
> Manish
>
>
> Michael Ellerman wrote:
>   
>> On Sun, 2008-02-17 at 22:53 -0600, Manish Ahuja wrote:
>>     
>>> The following series of patches implement a basic framework
>>> for hypervisor-assisted dump. The very first patch provides 
>>> documentation explaining what this is  :-) . Yes, its supposed
>>> to be an improvement over kdump.
>>>
>>> A list of open issues / todo list is included in the documentation.
>>> It also appears that the not-yet-released firmware versions this was tested 
>>> on are still, ahem, incomplete; this work is also pending.
>>>
>>> I have included most of the changes requested. Although, I did find
>>> one or two, fixed in a later patch file rather than the first location
>>> they appeared at.
>>>       
>> This series still doesn't build on !CONFIG_RTAS configs:
>> http://kisskb.ellerman.id.au/kisskb/head/629/
>>
>> This solution is to move early_init_dt_scan_phyp_dump() into
>> arch/powerpc/platforms/pseries/phyp_dump.c and provide a dummy
>> implementation in asm-powerpc/phyp_dump.c for the !CONFIG_PHYP_DUMP
>> case.
>>
>> cheers
>>
>>     
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
>   

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-02-29  0:24     ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja
@ 2008-03-11  1:02       ` Michael Ellerman
  2008-03-12 17:52         ` Linas Vepstas
  2008-03-11  6:12       ` Paul Mackerras
  1 sibling, 1 reply; 59+ messages in thread
From: Michael Ellerman @ 2008-03-11  1:02 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus

[-- Attachment #1: Type: text/plain, Size: 6762 bytes --]

On Thu, 2008-02-28 at 18:24 -0600, Manish Ahuja wrote:
> Initial patch for reserving memory in early boot, and freeing it later.
> If the previous boot had ended with a crash, the reserved memory would contain
> a copy of the crashed kernel data.
> 
> Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
> Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

Hi Manish,

A few comments inline ..

> Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 21:57:52.000000000 -0600
> @@ -0,0 +1,105 @@
> +/*
> + * Hypervisor-assisted dump
> + *
> + * Linas Vepstas, Manish Ahuja 2008
> + * Copyright 2008 IBM Corp.
> + *
> + *      This program is free software; you can redistribute it and/or
> + *      modify it under the terms of the GNU General Public License
> + *      as published by the Free Software Foundation; either version
> + *      2 of the License, or (at your option) any later version.
> + *
> + */
> +
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/pfn.h>
> +#include <linux/swap.h>
> +
> +#include <asm/page.h>
> +#include <asm/phyp_dump.h>
> +#include <asm/machdep.h>
> +#include <asm/prom.h>
> +
> +/* Global, used to communicate data between early boot and late boot */
> +static struct phyp_dump phyp_dump_global;
> +struct phyp_dump *phyp_dump_info = &phyp_dump_global;

I don't see the point of this. You have a static (ie. non-global) struct
called phyp_dump_global, then you create a pointer to it and pass that
around. It could just be:

phyp_dump.h:
extern struct phyp_dump phyp_dump_info; 

phyp_dump.c:
struct phyp_dump phyp_dump_info;

phyp_dump_info.foo = bar;

I also think the struct should be called phyp_dump_info, not phyp_dump -
it contains info about phyp_dump, not the dump itself.

> +
> +/**
> + * release_memory_range -- release memory previously lmb_reserved
> + * @start_pfn: starting physical frame number
> + * @nr_pages: number of pages to free.
> + *
> + * This routine will release memory that had been previously
> + * lmb_reserved in early boot. The released memory becomes
> + * available for genreal use.
> + */
> +static void
> +release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
> +{
> +	struct page *rpage;
> +	unsigned long end_pfn;
> +	long i;
> +
> +	end_pfn = start_pfn + nr_pages;
> +
> +	for (i = start_pfn; i <= end_pfn; i++) {
> +		rpage = pfn_to_page(i);
> +		if (PageReserved(rpage)) {
> +			ClearPageReserved(rpage);
> +			init_page_count(rpage);
> +			__free_page(rpage);
> +			totalram_pages++;
> +		}
> +	}
> +}
> +
> +static int __init phyp_dump_setup(void)
> +{
> +	unsigned long start_pfn, nr_pages;
> +
> +	/* If no memory was reserved in early boot, there is nothing to do */
> +	if (phyp_dump_info->init_reserve_size == 0)
> +		return 0;
> +
> +	/* Release memory that was reserved in early boot */
> +	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
> +	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
> +	release_memory_range(start_pfn, nr_pages);
> +
> +	return 0;
> +}
> +machine_subsys_initcall(pseries, phyp_dump_setup);
> +
> +int __init early_init_dt_scan_phyp_dump(unsigned long node,
> +		const char *uname, int depth, void *data)
> +{
> +#ifdef CONFIG_PHYP_DUMP
> +	const unsigned int *sizes;
> +
> +	phyp_dump_info->phyp_dump_configured = 0;
> +	phyp_dump_info->phyp_dump_is_active = 0;
> +
> +	if (depth != 1 || strcmp(uname, "rtas") != 0)
> +		return 0;
> +
> +	if (of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL))
> +		phyp_dump_info->phyp_dump_configured++;
> +
> +	if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL))
> +		phyp_dump_info->phyp_dump_is_active++;
> +
> +	sizes = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
> +									NULL);
> +	if (!sizes)
> +		return 0;
> +
> +	if (sizes[0] == 1)
> +		phyp_dump_info->cpu_state_size = *((unsigned long *)&sizes[1]);
> +
> +	if (sizes[3] == 2)
> +		phyp_dump_info->hpte_region_size =
> +						*((unsigned long *)&sizes[4]);
> +#endif

This doesn't need to be inside #ifdef, you have a dummy version already
defined in the header file.

> Index: 2.6.25-rc1/arch/powerpc/kernel/prom.c
> ===================================================================
> --- 2.6.25-rc1.orig/arch/powerpc/kernel/prom.c	2008-02-28 21:54:57.000000000 -0600
> +++ 2.6.25-rc1/arch/powerpc/kernel/prom.c	2008-02-28 21:55:27.000000000 -0600
> @@ -1039,6 +1040,51 @@ static void __init early_reserve_mem(voi
>  #endif
>  }
>  
> +#ifdef CONFIG_PHYP_DUMP
> +/**
> + * reserve_crashed_mem() - reserve all not-yet-dumped mmemory
> + *
> + * This routine may reserve memory regions in the kernel only
> + * if the system is supported and a dump was taken in last
> + * boot instance or if the hardware is supported and the
> + * scratch area needs to be setup. In other instances it returns
> + * without reserving anything. The memory in case of dump being
> + * active is freed when the dump is collected (by userland tools).
> + */
> +static void __init reserve_crashed_mem(void)

This could do with a name change IMO, eg. phyp_dump_reserve_mem() or
something.

> +{
> +	unsigned long base, size;
> +	if (!phyp_dump_info->phyp_dump_configured) {
> +		printk(KERN_ERR "Phyp-dump not supported on this hardware\n");
> +		return;
> +	}
> +
> +	if (phyp_dump_info->phyp_dump_is_active) {
> +		/* Reserve *everything* above RMR.Area freed by userland tools*/
> +		base = PHYP_DUMP_RMR_END;
> +		size = lmb_end_of_DRAM() - base;
> +
> +		/* XXX crashed_ram_end is wrong, since it may be beyond
> +		 * the memory_limit, it will need to be adjusted. */
> +		lmb_reserve(base, size);
> +
> +		phyp_dump_info->init_reserve_start = base;
> +		phyp_dump_info->init_reserve_size = size;
> +	} else {
> +		size = phyp_dump_info->cpu_state_size +
> +			phyp_dump_info->hpte_region_size +
> +			PHYP_DUMP_RMR_END;
> +		base = lmb_end_of_DRAM() - size;
> +		lmb_reserve(base, size);
> +		phyp_dump_info->init_reserve_start = base;
> +		phyp_dump_info->init_reserve_size = size;
> +	}
> +}
> +#else
> +static inline void __init reserve_crashed_mem(void) {}
> +#endif /* CONFIG_PHYP_DUMP  && CONFIG_PPC_RTAS */


cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-02-29  0:24     ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja
  2008-03-11  1:02       ` Michael Ellerman
@ 2008-03-11  6:12       ` Paul Mackerras
  2008-03-12  0:13         ` Michael Ellerman
  1 sibling, 1 reply; 59+ messages in thread
From: Paul Mackerras @ 2008-03-11  6:12 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: linuxppc-dev, linasvepstas, mahuja

Manish Ahuja writes:

> +#else /* CONFIG_PHYP_DUMP */
> +int early_init_dt_scan_phyp_dump(unsigned long node,
> +		const char *uname, int depth, void *data) { return 0; }

This shouldn't be in the header file.  Either put it in prom.c (and
make it return 1 so the of_scan_flat_dt call doesn't have to go
through the entire device tree), or put #ifdef CONFIG_PHYP_DUMP around
the of_scan_flat_dt call itself.

> +/* Global, used to communicate data between early boot and late boot */
> +static struct phyp_dump phyp_dump_global;
> +struct phyp_dump *phyp_dump_info = &phyp_dump_global;

It's a little weird to have a static variable with global in its name.

> +int __init early_init_dt_scan_phyp_dump(unsigned long node,
> +		const char *uname, int depth, void *data)
> +{
> +#ifdef CONFIG_PHYP_DUMP

This is in phyp_dump.c, which only gets compiled if CONFIG_PHYP_DUMP
is set, so you don't need this ifdef.

Paul.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
  2008-02-29  0:27     ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja
@ 2008-03-11  6:16       ` Paul Mackerras
  2008-03-11 16:44         ` Dale Farnsworth
  2008-03-12 17:38         ` Linas Vepstas
  0 siblings, 2 replies; 59+ messages in thread
From: Paul Mackerras @ 2008-03-11  6:16 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: linuxppc-dev, linasvepstas, mahuja

Manish Ahuja writes:

> Check to see if there actually is data from a previously
> crashed kernel waiting. If so, Allow user-sapce tools to
> grab the data (by reading /proc/kcore). When user-space 
> finishes dumping a section, it must release that memory
> by writing to sysfs. For example,
> 
>   echo "0x40000000 0x10000000" > /sys/kernel/release_region
> 
> will release 256MB starting at the 1GB.  The released memory
> becomes free for general use.
> 
> Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
> Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
> 
> ------

This line needs to be exactly 3 dashes, because otherwise the tools
include the diffstat into the commit message.  Putting 4 or more
dashes was an annoying habit Linas had, and it means I have to fix it
manually (usually after I have committed the patches, and then notice
that the commit message has the extra stuff in it, so I have to go
back and fix the separators, reset my tree and re-commit the patches.)

> +		dump_header = of_get_property(rtas, "ibm,kernel-dump",
> +								&header_len);

This is a somewhat weird-looking way of coping with too-long lines.
Please indent the second line either one more tab than the first line,
or else so that it starts just after the '(' in the first line (which
is what emacs will do by default).  The same comment applies in
several other places.

Paul.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] pseries: phyp dump: register dump area.
  2008-02-29  0:29     ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja
@ 2008-03-11  6:17       ` Paul Mackerras
  0 siblings, 0 replies; 59+ messages in thread
From: Paul Mackerras @ 2008-03-11  6:17 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: linuxppc-dev, linasvepstas, mahuja

Manish Ahuja writes:

> +#define NUM_DUMP_SECTIONS 3
> +#define DUMP_HEADER_VERSION 0x1
> +#define DUMP_REQUEST_FLAG 0x1
> +#define DUMP_SOURCE_CPU 0x0001
> +#define DUMP_SOURCE_HPTE 0x0002
> +#define DUMP_SOURCE_RMO  0x0011

I think it would be clearer if you use a tab to line up the values,
like this:

#define NUM_DUMP_SECTIONS	3
#define DUMP_HEADER_VERSION	0x1
#define DUMP_REQUEST_FLAG	0x1
#define DUMP_SOURCE_CPU		0x0001
#define DUMP_SOURCE_HPTE	0x0002
#define DUMP_SOURCE_RMO		0x0011

Paul.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
  2008-02-29  0:32     ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja
@ 2008-03-11  6:19       ` Paul Mackerras
  0 siblings, 0 replies; 59+ messages in thread
From: Paul Mackerras @ 2008-03-11  6:19 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: linuxppc-dev, linasvepstas, mahuja

Manish Ahuja writes:

> -static void
> -release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
> +static
> +void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)

This change looks rather pointless.  If you have to change it, I'd
prefer:

static void release_memory_range(unsigned long start_pfn,
			unsigned long nr_pages)

Paul.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 8/8] pseries: phyp dump: config file
  2008-02-29  0:35     ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja
@ 2008-03-11  6:21       ` Paul Mackerras
  2008-03-12 16:36         ` Manish Ahuja
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Mackerras @ 2008-03-11  6:21 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: linuxppc-dev, linasvepstas, mahuja

Manish Ahuja writes:

> +config PHYP_DUMP
> +	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
> +	depends on PPC_PSERIES && EXPERIMENTAL
> +	default y

I think this should default to n for now (i.e. leave out the default
line entirely).  We can make it default to y later.

Paul.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
  2008-03-11  6:16       ` Paul Mackerras
@ 2008-03-11 16:44         ` Dale Farnsworth
  2008-03-12 17:38         ` Linas Vepstas
  1 sibling, 0 replies; 59+ messages in thread
From: Dale Farnsworth @ 2008-03-11 16:44 UTC (permalink / raw)
  To: Linuxppc-dev

Paul wrote:
> Manish Ahuja writes:
> > +		dump_header = of_get_property(rtas, "ibm,kernel-dump",
> > +								&header_len);
> 
> This is a somewhat weird-looking way of coping with too-long lines.

Yes, but not too surprising, since it precisely follows the recommendation
(and the example) in Chapter 2 of Documentation/CodingStyle.  :)

-Dale

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-03-11  6:12       ` Paul Mackerras
@ 2008-03-12  0:13         ` Michael Ellerman
  2008-03-12  0:53           ` Michael Ellerman
  0 siblings, 1 reply; 59+ messages in thread
From: Michael Ellerman @ 2008-03-12  0:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: mahuja, linuxppc-dev, linasvepstas

[-- Attachment #1: Type: text/plain, Size: 903 bytes --]

On Tue, 2008-03-11 at 17:12 +1100, Paul Mackerras wrote:
> Manish Ahuja writes:
> 
> > +#else /* CONFIG_PHYP_DUMP */
> > +int early_init_dt_scan_phyp_dump(unsigned long node,
> > +		const char *uname, int depth, void *data) { return 0; }
> 
> This shouldn't be in the header file.  Either put it in prom.c (and
> make it return 1 so the of_scan_flat_dt call doesn't have to go
> through the entire device tree), or put #ifdef CONFIG_PHYP_DUMP around
> the of_scan_flat_dt call itself.

It should be in the header file, otherwise we need an #ifdef around the
call site - which is uglier.

It should definitely return 1 though.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-03-12  0:13         ` Michael Ellerman
@ 2008-03-12  0:53           ` Michael Ellerman
  0 siblings, 0 replies; 59+ messages in thread
From: Michael Ellerman @ 2008-03-12  0:53 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: mahuja, linuxppc-dev, linasvepstas

[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]


On Wed, 2008-03-12 at 11:13 +1100, Michael Ellerman wrote:
> On Tue, 2008-03-11 at 17:12 +1100, Paul Mackerras wrote:
> > Manish Ahuja writes:
> > 
> > > +#else /* CONFIG_PHYP_DUMP */
> > > +int early_init_dt_scan_phyp_dump(unsigned long node,
> > > +		const char *uname, int depth, void *data) { return 0; }
> > 
> > This shouldn't be in the header file.  Either put it in prom.c (and
> > make it return 1 so the of_scan_flat_dt call doesn't have to go
> > through the entire device tree), or put #ifdef CONFIG_PHYP_DUMP around
> > the of_scan_flat_dt call itself.
> 
> It should be in the header file, otherwise we need an #ifdef around the
> call site - which is uglier.

Right I'm an idiot. It is called via a function pointer, so a static
inline (which this should be, but isn't) is no good. An #ifdef around
the call site is probably the least ugly option given that otherwise we
have to have an empty version in the binary.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 8/8] pseries: phyp dump: config file
  2008-03-11  6:21       ` Paul Mackerras
@ 2008-03-12 16:36         ` Manish Ahuja
  0 siblings, 0 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-03-12 16:36 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, linasvepstas, mahuja

Thanks for the review. I will try and make the recommended changes and repost it soon.

Manish


Paul Mackerras wrote:
> Manish Ahuja writes:
> 
>> +config PHYP_DUMP
>> +	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
>> +	depends on PPC_PSERIES && EXPERIMENTAL
>> +	default y
> 
> I think this should default to n for now (i.e. leave out the default
> line entirely).  We can make it default to y later.
> 
> Paul.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
  2008-03-11  6:16       ` Paul Mackerras
  2008-03-11 16:44         ` Dale Farnsworth
@ 2008-03-12 17:38         ` Linas Vepstas
  1 sibling, 0 replies; 59+ messages in thread
From: Linas Vepstas @ 2008-03-12 17:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, mahuja

On 11/03/2008, Paul Mackerras <paulus@samba.org> wrote:
>
>  > ------
>
> This line needs to be exactly 3 dashes, because otherwise the tools
>  include the diffstat into the commit message.  Putting 4 or more
>  dashes was an annoying habit Linas had, and it means I have to fix it
>  manually (usually after I have committed the patches, and then notice
>  that the commit message has the extra stuff in it, so I have to go
>  back and fix the separators, reset my tree and re-commit the patches.)

Sorry, I had no idea!  If I didn't have enough dashes, then quilt would
sometimes wipe out the comment at the top, so paranoia made me
add lots of dashes.

--linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-03-11  1:02       ` Michael Ellerman
@ 2008-03-12 17:52         ` Linas Vepstas
  2008-03-13  4:29           ` Manish Ahuja
  0 siblings, 1 reply; 59+ messages in thread
From: Linas Vepstas @ 2008-03-12 17:52 UTC (permalink / raw)
  To: michael; +Cc: mahuja, linuxppc-dev, paulus

On 10/03/2008, Michael Ellerman <michael@ellerman.id.au> wrote:
> On Thu, 2008-02-28 at 18:24 -0600, Manish Ahuja wrote:

>  > +
>  > +/* Global, used to communicate data between early boot and late boot */
>  > +static struct phyp_dump phyp_dump_global;
>  > +struct phyp_dump *phyp_dump_info = &phyp_dump_global;
>
> I don't see the point of this. You have a static (ie. non-global) struct
>  called phyp_dump_global, then you create a pointer to it and pass that
>  around.

I did this. This is a style used to minimize disruption due to future
design changes. Basically, the idea is that, at some later time, for
some unknown reason, we decide that this structure shouldn't
be global, or maybe shouldn't be statically allocated, or maybe
should be per-cpu, or who knows.  By creating a pointer, and
just passing that around, you isolate other code from this change.

I learned this trick after spending too many months of my life hunting
down globals and replacing them by dynamically allocated structs.
Its a long and painful process, on many levels, often requiring major
code restructuring.  Code that touches globals directly is often
poorly thought out, designed.  But going in the opposite direction
is easy: if your code always passes everything it needs as args
to subroutines,  then you are free & clear ... if one of those args
just happens to be a pointer to a global, there's no loss (not even
a performance loss -- the arg passing overhead is about the same
as a global TOC lookup!)

So it may look weird if you're not used to seeing it; but the alternative
is almost always worse.

--linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-03-12 17:52         ` Linas Vepstas
@ 2008-03-13  4:29           ` Manish Ahuja
  2008-03-14  4:20             ` Michael Ellerman
  2008-03-14  5:19             ` Paul Mackerras
  0 siblings, 2 replies; 59+ messages in thread
From: Manish Ahuja @ 2008-03-13  4:29 UTC (permalink / raw)
  To: linasvepstas; +Cc: linuxppc-dev, paulus, mahuja

If Mike and Paul are okay, then I will leave this bit as is and fix all 
other issues and comments.

Thanks,
Manish



Linas Vepstas wrote:
> On 10/03/2008, Michael Ellerman <michael@ellerman.id.au> wrote:
>> On Thu, 2008-02-28 at 18:24 -0600, Manish Ahuja wrote:
> 
>>  > +
>>  > +/* Global, used to communicate data between early boot and late boot */
>>  > +static struct phyp_dump phyp_dump_global;
>>  > +struct phyp_dump *phyp_dump_info = &phyp_dump_global;
>>
>> I don't see the point of this. You have a static (ie. non-global) struct
>>  called phyp_dump_global, then you create a pointer to it and pass that
>>  around.
> 
> I did this. This is a style used to minimize disruption due to future
> design changes. Basically, the idea is that, at some later time, for
> some unknown reason, we decide that this structure shouldn't
> be global, or maybe shouldn't be statically allocated, or maybe
> should be per-cpu, or who knows.  By creating a pointer, and
> just passing that around, you isolate other code from this change.
> 
> I learned this trick after spending too many months of my life hunting
> down globals and replacing them by dynamically allocated structs.
> Its a long and painful process, on many levels, often requiring major
> code restructuring.  Code that touches globals directly is often
> poorly thought out, designed.  But going in the opposite direction
> is easy: if your code always passes everything it needs as args
> to subroutines,  then you are free & clear ... if one of those args
> just happens to be a pointer to a global, there's no loss (not even
> a performance loss -- the arg passing overhead is about the same
> as a global TOC lookup!)
> 
> So it may look weird if you're not used to seeing it; but the alternative
> is almost always worse.
> 
> --linas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-03-13  4:29           ` Manish Ahuja
@ 2008-03-14  4:20             ` Michael Ellerman
  2008-03-14  5:19             ` Paul Mackerras
  1 sibling, 0 replies; 59+ messages in thread
From: Michael Ellerman @ 2008-03-14  4:20 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus

[-- Attachment #1: Type: text/plain, Size: 2381 bytes --]

On Wed, 2008-03-12 at 23:29 -0500, Manish Ahuja wrote:
> If Mike and Paul are okay, then I will leave this bit as is and fix all 
> other issues and comments.

Well I still don't like it - it uglifies the code _now_, for a potential
future benefit that may never come. But I don't care that much, if
Paul's happy with it let it go in.

cheers

> Linas Vepstas wrote:
> > On 10/03/2008, Michael Ellerman <michael@ellerman.id.au> wrote:
> >> On Thu, 2008-02-28 at 18:24 -0600, Manish Ahuja wrote:
> > 
> >>  > +
> >>  > +/* Global, used to communicate data between early boot and late boot */
> >>  > +static struct phyp_dump phyp_dump_global;
> >>  > +struct phyp_dump *phyp_dump_info = &phyp_dump_global;
> >>
> >> I don't see the point of this. You have a static (ie. non-global) struct
> >>  called phyp_dump_global, then you create a pointer to it and pass that
> >>  around.
> > 
> > I did this. This is a style used to minimize disruption due to future
> > design changes. Basically, the idea is that, at some later time, for
> > some unknown reason, we decide that this structure shouldn't
> > be global, or maybe shouldn't be statically allocated, or maybe
> > should be per-cpu, or who knows.  By creating a pointer, and
> > just passing that around, you isolate other code from this change.
> > 
> > I learned this trick after spending too many months of my life hunting
> > down globals and replacing them by dynamically allocated structs.
> > Its a long and painful process, on many levels, often requiring major
> > code restructuring.  Code that touches globals directly is often
> > poorly thought out, designed.  But going in the opposite direction
> > is easy: if your code always passes everything it needs as args
> > to subroutines,  then you are free & clear ... if one of those args
> > just happens to be a pointer to a global, there's no loss (not even
> > a performance loss -- the arg passing overhead is about the same
> > as a global TOC lookup!)
> > 
> > So it may look weird if you're not used to seeing it; but the alternative
> > is almost always worse.


-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
  2008-03-13  4:29           ` Manish Ahuja
  2008-03-14  4:20             ` Michael Ellerman
@ 2008-03-14  5:19             ` Paul Mackerras
  1 sibling, 0 replies; 59+ messages in thread
From: Paul Mackerras @ 2008-03-14  5:19 UTC (permalink / raw)
  To: Manish Ahuja; +Cc: linuxppc-dev, linasvepstas, mahuja

Manish Ahuja writes:

> If Mike and Paul are okay, then I will leave this bit as is and fix all 
> other issues and comments.

Well, part of the problem is the semantic dissonance caused by having
a static variable called "global".  Please change the name
"phyp_dump_global" to "phyp_dump_vars" or something similar - that
will only affect two lines of code and will reduce the ugliness a bit.

Paul.

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2008-03-14  5:19 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-18  4:53 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
2008-02-18  5:34 ` [PATCH 1/8] pseries: phyp dump: Documentation Manish Ahuja
2008-02-18  5:36 ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja
2008-02-18  5:38 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja
2008-02-18  5:40 ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja
2008-02-18  5:41 ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja
2008-02-18  5:42 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja
2008-02-18  5:44 ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja
2008-02-18  5:45 ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja
2008-02-22  0:53 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Michael Ellerman
2008-02-28 23:57   ` Manish Ahuja
2008-02-29  0:22     ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja
2008-02-29  0:24     ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja
2008-03-11  1:02       ` Michael Ellerman
2008-03-12 17:52         ` Linas Vepstas
2008-03-13  4:29           ` Manish Ahuja
2008-03-14  4:20             ` Michael Ellerman
2008-03-14  5:19             ` Paul Mackerras
2008-03-11  6:12       ` Paul Mackerras
2008-03-12  0:13         ` Michael Ellerman
2008-03-12  0:53           ` Michael Ellerman
2008-02-29  0:27     ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja
2008-03-11  6:16       ` Paul Mackerras
2008-03-11 16:44         ` Dale Farnsworth
2008-03-12 17:38         ` Linas Vepstas
2008-02-29  0:29     ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja
2008-03-11  6:17       ` Paul Mackerras
2008-02-29  0:31     ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja
2008-02-29  0:32     ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja
2008-03-11  6:19       ` Paul Mackerras
2008-02-29  0:33     ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja
2008-02-29  0:35     ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja
2008-03-11  6:21       ` Paul Mackerras
2008-03-12 16:36         ` Manish Ahuja
2008-02-29  2:20     ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Michael Ellerman
2008-03-03 23:37     ` Joel Schopp
  -- strict thread matches above, loose matches on Subject: below --
2008-01-22 19:12 Manish Ahuja
2008-01-22 19:26 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja
2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
2008-01-08  0:13 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja
2008-01-09  4:29   ` Nathan Lynch
2008-01-09  4:58     ` Michael Ellerman
2008-01-09 15:31     ` Linas Vepstas
2008-01-09 18:44       ` Nathan Lynch
2008-01-09 19:28         ` Manish Ahuja
2008-01-09 22:59         ` Michael Ellerman
2008-01-09 23:18           ` Manish Ahuja
2008-01-10  2:47           ` Linas Vepstas
2008-01-10  3:55             ` Michael Ellerman
2008-01-10  2:33         ` Linas Vepstas
2008-01-10  3:17           ` Olof Johansson
2008-01-10  4:12             ` Linas Vepstas
2008-01-10  4:52               ` Michael Ellerman
2008-01-10 16:21               ` Olof Johansson
2008-01-10 16:34                 ` Linas Vepstas
2008-01-10 21:46                   ` Mike Strosaker
2008-01-11  1:26                     ` Nathan Lynch
2008-01-11 16:57                       ` Linas Vepstas
2008-01-14  5:24                         ` Olof Johansson
2008-01-14 15:21                           ` Linas Vepstas
2008-02-12  6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
2008-02-12  6:53   ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.