* [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump @ 2008-01-07 23:45 Manish Ahuja 2008-01-08 0:13 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja ` (8 more replies) 0 siblings, 9 replies; 69+ messages in thread From: Manish Ahuja @ 2008-01-07 23:45 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake The following series of patches implement a basic framework for hypervisor-assisted dump. The very first patch provides documentation explaining what this is :-) . Yes, its supposed to be an improvement over kdump. The patches mostly work; a list of open issues / todo list is included in the documentation. It also appears that the not-yet-released firmware versions this was tested on are still, ahem, incomplete; this work is also pending. -- Linas & Manish ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja @ 2008-01-08 0:13 ` Manish Ahuja 2008-01-09 4:29 ` Nathan Lynch 2008-01-08 0:16 ` [PATCH 2/8] pseries: phyp dump: config file Manish Ahuja ` (7 subsequent siblings) 8 siblings, 1 reply; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:13 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake Basic documentation for hypervisor-assisted dump. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> ---- Documentation/powerpc/phyp-assisted-dump.txt | 129 +++++++++++++++++++++++++++ 1 file changed, 129 insertions(+) Index: 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt 2008-01-07 18:05:46.000000000 -0600 @@ -0,0 +1,129 @@ + + Hypervisor-Assisted Dump + ------------------------ + November 2007 + +The goal of hypervisor-assisted dump is to enable the dump of +a crashed system, and to do so from a fully-reset system, and +to minimize the total elapsed time until the system is back +in production use. + +As compared to kdump or other strategies, hypervisor-assisted +dump offers several strong, practical advantages: + +-- Unlike kdump, the system has been reset, and loaded + with a fresh copy of the kernel. In particular, + PCI and I/O devices have been reinitialized and are + in a clean, consistent state. +-- As the dump is performed, the dumped memory becomes + immediately available to the system for normal use. +-- After the dump is completed, no further reboots are + required; the system will be fully usable, and running + in it's normal, production mode on it normal kernel. + +The above can only be accomplished by coordination with, +and assistance from the hypervisor. The procedure is +as follows: + +-- When a system crashes, the hypervisor will save + the low 256MB of RAM to a previously registered + save region. It will also save system state, system + registers, and hardware PTE's. + +-- After the low 256MB area has been saved, the + hypervisor will reset PCI and other hardware state. + It will *not* clear RAM. It will then launch the + bootloader, as normal. + +-- The freshly booted kernel will notice that there + is a new node (ibm,dump-kernel) in the device tree, + indicating that there is crash data available from + a previous boot. It will boot into only 256MB of RAM, + reserving the rest of system memory. + +-- Userspace tools will parse /sys/kernel/release_region + and read /proc/vmcore to obtain the contents of memory, + which holds the previous crashed kernel. The userspace + tools may copy this info to disk, or network, nas, san, + iscsi, etc. as desired. + + For Example: the values in /sys/kernel/release-region + would look something like this (address-range pairs). + CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: / + DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A + +-- As the userspace tools complete saving a portion of + dump, they echo an offset and size to + /sys/kernel/release_region to release the reserved + memory back to general use. + + An example of this is: + "echo 0x40000000 0x10000000 > /sys/kernel/release_region" + which will release 256MB at the 1GB boundary. + +Please note that the hypervisor-assisted dump feature +is only available on Power6-based systems with recent +firmware versions. + +Implementation details: +---------------------- +In order for this scheme to work, memory needs to be reserved +quite early in the boot cycle. However, access to the device +tree this early in the boot cycle is difficult, and device-tree +access is needed to determine if there is a crash data waiting. +To work around this problem, all but 256MB of RAM is reserved +during early boot. A short while later in boot, a check is made +to determine if there is dump data waiting. If there isn't, +then the reserved memory is released to general kernel use. +If there is dump data, then the /sys/kernel/release_region +file is created, and the reserved memory is held. + +If there is no waiting dump data, then all but 256MB of the +reserved ram will be released for general kernel use. The +highest 256 MB of RAM will *not* be released: this region +will be kept permanently reserved, so that it can act as +a receptacle for a copy of the low 256MB in the case a crash +does occur. See, however, "open issues" below, as to whether +such a reserved region is really needed. + +Currently the dump will be copied from /proc/vmcore to a +a new file upon user intervention. The starting address +to be read and the range for each data point in provided +in /sys/kernel/release_region. + +The tools to examine the dump will be same as the ones +used for kdump. + + +General notes: +-------------- +Security: please note that there are potential security issues +with any sort of dump mechanism. In particular, plaintext +(unencrypted) data, and possibly passwords, may be present in +the dump data. Userspace tools must take adequate precautions to +preserve security. + +Open issues/ToDo: +------------ + o The various code paths that tell the hypervisor that a crash + occurred, vs. it simply being a normal reboot, should be + reviewed, and possibly clarified/fixed. + + o Instead of using /sys/kernel, should there be a /sys/dump + instead? There is a dump_subsys being created by the s390 code, + perhaps the pseries code should use a similar layout as well. + + o Is reserving a 256MB region really required? The goal of + reserving a 256MB scratch area is to make sure that no + important crash data is clobbered when the hypervisor + save low mem to the scratch area. But, if one could assure + that nothing important is located in some 256MB area, then + it would not need to be reserved. Something that can be + improved in subsequent versions. + + o Still working the kdump team to integrate this with kdump, + some work remains but this would not affect the current + patches. + + o Still need to write a shell script, to copy the dump away. + Currently I am parsing it manually. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-08 0:13 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja @ 2008-01-09 4:29 ` Nathan Lynch 2008-01-09 4:58 ` Michael Ellerman 2008-01-09 15:31 ` Linas Vepstas 0 siblings, 2 replies; 69+ messages in thread From: Nathan Lynch @ 2008-01-09 4:29 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake Manish Ahuja wrote: > + > + Hypervisor-Assisted Dump > + ------------------------ > + November 2007 Date is unneeded (and, uhm, dated :) > +The goal of hypervisor-assisted dump is to enable the dump of > +a crashed system, and to do so from a fully-reset system, and > +to minimize the total elapsed time until the system is back > +in production use. Is it actually faster than kdump? > +As compared to kdump or other strategies, hypervisor-assisted > +dump offers several strong, practical advantages: > + > +-- Unlike kdump, the system has been reset, and loaded > + with a fresh copy of the kernel. In particular, > + PCI and I/O devices have been reinitialized and are > + in a clean, consistent state. > +-- As the dump is performed, the dumped memory becomes > + immediately available to the system for normal use. > +-- After the dump is completed, no further reboots are > + required; the system will be fully usable, and running > + in it's normal, production mode on it normal kernel. > + > +The above can only be accomplished by coordination with, > +and assistance from the hypervisor. The procedure is > +as follows: > + > +-- When a system crashes, the hypervisor will save > + the low 256MB of RAM to a previously registered > + save region. It will also save system state, system > + registers, and hardware PTE's. > + > +-- After the low 256MB area has been saved, the > + hypervisor will reset PCI and other hardware state. > + It will *not* clear RAM. It will then launch the > + bootloader, as normal. > + > +-- The freshly booted kernel will notice that there > + is a new node (ibm,dump-kernel) in the device tree, > + indicating that there is crash data available from > + a previous boot. It will boot into only 256MB of RAM, > + reserving the rest of system memory. > + > +-- Userspace tools will parse /sys/kernel/release_region > + and read /proc/vmcore to obtain the contents of memory, > + which holds the previous crashed kernel. The userspace > + tools may copy this info to disk, or network, nas, san, > + iscsi, etc. as desired. > + > + For Example: the values in /sys/kernel/release-region > + would look something like this (address-range pairs). > + CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: / > + DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A > + > +-- As the userspace tools complete saving a portion of > + dump, they echo an offset and size to > + /sys/kernel/release_region to release the reserved > + memory back to general use. > + > + An example of this is: > + "echo 0x40000000 0x10000000 > /sys/kernel/release_region" > + which will release 256MB at the 1GB boundary. This violates the "one file, one value" rule of sysfs, but nobody really takes that seriously, I guess. In any case, consider documenting this in Documentation/ABI. > + > +Please note that the hypervisor-assisted dump feature > +is only available on Power6-based systems with recent > +firmware versions. This statement will of course become dated/incorrect so I recommend removing it. > + > +Implementation details: > +---------------------- > +In order for this scheme to work, memory needs to be reserved > +quite early in the boot cycle. However, access to the device > +tree this early in the boot cycle is difficult, and device-tree > +access is needed to determine if there is a crash data waiting. I don't think this bit about early device tree access is correct. By the time your code is reserving memory (from early_init_devtree(), I think), RTAS has been instantiated and you are able to test for the existence of /rtas/ibm,dump-kernel. > +To work around this problem, all but 256MB of RAM is reserved > +during early boot. A short while later in boot, a check is made > +to determine if there is dump data waiting. If there isn't, > +then the reserved memory is released to general kernel use. So I think these gymnastics are unneeded -- unless I'm misunderstanding something, you should be able to determine very early whether to reserve that memory. > +If there is dump data, then the /sys/kernel/release_region > +file is created, and the reserved memory is held. > + > +If there is no waiting dump data, then all but 256MB of the > +reserved ram will be released for general kernel use. The > +highest 256 MB of RAM will *not* be released: this region > +will be kept permanently reserved, so that it can act as > +a receptacle for a copy of the low 256MB in the case a crash > +does occur. See, however, "open issues" below, as to whether > +such a reserved region is really needed. > + > +Currently the dump will be copied from /proc/vmcore to a > +a new file upon user intervention. The starting address > +to be read and the range for each data point in provided ^is > +in /sys/kernel/release_region. > + > +The tools to examine the dump will be same as the ones > +used for kdump. > + > + > +General notes: > +-------------- > +Security: please note that there are potential security issues > +with any sort of dump mechanism. In particular, plaintext > +(unencrypted) data, and possibly passwords, may be present in > +the dump data. Userspace tools must take adequate precautions to > +preserve security. > + > +Open issues/ToDo: > +------------ > + o The various code paths that tell the hypervisor that a crash > + occurred, vs. it simply being a normal reboot, should be > + reviewed, and possibly clarified/fixed. > + > + o Instead of using /sys/kernel, should there be a /sys/dump > + instead? There is a dump_subsys being created by the s390 code, > + perhaps the pseries code should use a similar layout as well. Well, it seems to me that there's little reason to duplicate the s390 layout unless we can actually share code. FWIW, I've been thinking about making a /sys/firmware/phyp hierarchy which could contain much of the System P-specific functions (DLPAR, lparcfg, other crud in /proc/ppc64)... seems suited to this platform-specific dump mechanism. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-09 4:29 ` Nathan Lynch @ 2008-01-09 4:58 ` Michael Ellerman 2008-01-09 15:31 ` Linas Vepstas 1 sibling, 0 replies; 69+ messages in thread From: Michael Ellerman @ 2008-01-09 4:58 UTC (permalink / raw) To: Nathan Lynch; +Cc: lkessler, linuxppc-dev, mahuja, linasvepstas, strosake [-- Attachment #1: Type: text/plain, Size: 4886 bytes --] On Tue, 2008-01-08 at 22:29 -0600, Nathan Lynch wrote: > Manish Ahuja wrote: > > + > > + Hypervisor-Assisted Dump > > + ------------------------ > > + November 2007 > > Date is unneeded (and, uhm, dated :) > > > > +The goal of hypervisor-assisted dump is to enable the dump of > > +a crashed system, and to do so from a fully-reset system, and > > +to minimize the total elapsed time until the system is back > > +in production use. > > Is it actually faster than kdump? > > > > +As compared to kdump or other strategies, hypervisor-assisted > > +dump offers several strong, practical advantages: > > + > > +-- Unlike kdump, the system has been reset, and loaded > > + with a fresh copy of the kernel. In particular, > > + PCI and I/O devices have been reinitialized and are > > + in a clean, consistent state. > > +-- As the dump is performed, the dumped memory becomes > > + immediately available to the system for normal use. > > +-- After the dump is completed, no further reboots are > > + required; the system will be fully usable, and running > > + in it's normal, production mode on it normal kernel. > > + > > +The above can only be accomplished by coordination with, > > +and assistance from the hypervisor. The procedure is > > +as follows: > > + > > +-- When a system crashes, the hypervisor will save > > + the low 256MB of RAM to a previously registered > > + save region. It will also save system state, system > > + registers, and hardware PTE's. > > + > > +-- After the low 256MB area has been saved, the > > + hypervisor will reset PCI and other hardware state. > > + It will *not* clear RAM. It will then launch the > > + bootloader, as normal. > > + > > +-- The freshly booted kernel will notice that there > > + is a new node (ibm,dump-kernel) in the device tree, > > + indicating that there is crash data available from > > + a previous boot. It will boot into only 256MB of RAM, > > + reserving the rest of system memory. > > + > > +-- Userspace tools will parse /sys/kernel/release_region > > + and read /proc/vmcore to obtain the contents of memory, > > + which holds the previous crashed kernel. The userspace > > + tools may copy this info to disk, or network, nas, san, > > + iscsi, etc. as desired. > > + > > + For Example: the values in /sys/kernel/release-region > > + would look something like this (address-range pairs). > > + CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: / > > + DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A > > + > > +-- As the userspace tools complete saving a portion of > > + dump, they echo an offset and size to > > + /sys/kernel/release_region to release the reserved > > + memory back to general use. > > + > > + An example of this is: > > + "echo 0x40000000 0x10000000 > /sys/kernel/release_region" > > + which will release 256MB at the 1GB boundary. > > This violates the "one file, one value" rule of sysfs, but nobody > really takes that seriously, I guess. In any case, consider > documenting this in Documentation/ABI. > > > > + > > +Please note that the hypervisor-assisted dump feature > > +is only available on Power6-based systems with recent > > +firmware versions. > > This statement will of course become dated/incorrect so I recommend > removing it. > > > > + > > +Implementation details: > > +---------------------- > > +In order for this scheme to work, memory needs to be reserved > > +quite early in the boot cycle. However, access to the device > > +tree this early in the boot cycle is difficult, and device-tree > > +access is needed to determine if there is a crash data waiting. > > I don't think this bit about early device tree access is correct. By > the time your code is reserving memory (from early_init_devtree(), I > think), RTAS has been instantiated and you are able to test for the > existence of /rtas/ibm,dump-kernel. Yep it's early_init_devtree(), and yes it's fairly easy to access the (flattened) device tree at that point. > > +To work around this problem, all but 256MB of RAM is reserved > > +during early boot. A short while later in boot, a check is made > > +to determine if there is dump data waiting. If there isn't, > > +then the reserved memory is released to general kernel use. > > So I think these gymnastics are unneeded -- unless I'm > misunderstanding something, you should be able to determine very early > whether to reserve that memory. I agree. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-09 4:29 ` Nathan Lynch 2008-01-09 4:58 ` Michael Ellerman @ 2008-01-09 15:31 ` Linas Vepstas 2008-01-09 18:44 ` Nathan Lynch 1 sibling, 1 reply; 69+ messages in thread From: Linas Vepstas @ 2008-01-09 15:31 UTC (permalink / raw) To: Nathan Lynch; +Cc: mahuja, linuxppc-dev, lkessler, strosake Hi Nathan, Thank you for looking at this. On 08/01/2008, Nathan Lynch <ntl@pobox.com> wrote: > Manish Ahuja wrote: > > + > > + Hypervisor-Assisted Dump > > + ------------------------ > > + November 2007 > > Date is unneeded (and, uhm, dated :) I'll argue very very strongly against this. One of the worst things I deal with is undated documents. Google pulls up multiple versions of a document, and you start wondering: "which one of these is the *right one*?" Its only after trying some suggested sequence of commands that aren't working, and more time googling and effort wasted, that you finallly realize that you are reading the version 0.1 of documentation from 1999 that some idiot did not have the decency to put a date on. (A version number is acceptable). Documentation is often in mediocre state, don't compound the problem be leaving off critical info. > > +The goal of hypervisor-assisted dump is to enable the dump of > > +a crashed system, and to do so from a fully-reset system, and > > +to minimize the total elapsed time until the system is back > > +in production use. > > Is it actually faster than kdump? This is a basic presumption; it should blow it away, as, after all, it requires one less reboot! As a side effect, the system is in production *while* the dump is being taken; with kdump, you can't go into production until after the dump is finished, and the system has been rebooted a second time. On systems with terabytes of RAM, the time difference can be hours. > > +-- As the userspace tools complete saving a portion of > > + dump, they echo an offset and size to > > + /sys/kernel/release_region to release the reserved > > + memory back to general use. > > + > > + An example of this is: > > + "echo 0x40000000 0x10000000 > /sys/kernel/release_region" > > + which will release 256MB at the 1GB boundary. > > This violates the "one file, one value" rule of sysfs, but nobody > really takes that seriously, I guess. In any case, consider > documenting this in Documentation/ABI. This interface will almost certainly be changed, in order to allow phyp dump to work with the kdump user-space tools. Its provisional right now, until the details of user-space is hammered out. > > +Please note that the hypervisor-assisted dump feature > > +is only available on Power6-based systems with recent > > +firmware versions. > > This statement will of course become dated/incorrect so I recommend > removing it. A provisional statement. I figure it should get taken out after a few years. > > +Implementation details: > > +---------------------- > > +In order for this scheme to work, memory needs to be reserved > > +quite early in the boot cycle. However, access to the device > > +tree this early in the boot cycle is difficult, and device-tree > > +access is needed to determine if there is a crash data waiting. > > I don't think this bit about early device tree access is correct. By > the time your code is reserving memory (from early_init_devtree(), I > think), RTAS has been instantiated and you are able to test for the > existence of /rtas/ibm,dump-kernel. If I remember right, it was still too early to look up this token directly, so we wrote some code to crawl the flat device tree to find it. But not only was that a lot of work, but I somehow decided that doing this to the flat tree was wrong, as otherwise someone would surely have written the access code. If this can be made to work, that would be great, but we couldn't make it work at the time. > > +To work around this problem, all but 256MB of RAM is reserved > > +during early boot. A short while later in boot, a check is made > > +to determine if there is dump data waiting. If there isn't, > > +then the reserved memory is released to general kernel use. > > So I think these gymnastics are unneeded -- unless I'm > misunderstanding something, you should be able to determine very early > whether to reserve that memory. Only if you can get at rtas, but you can't get at rtas at that point. I think we would have been the earliest users of rtas at that point. Possibly the order of how things are initialized could be changed. > > +Open issues/ToDo: > > +------------ > > + o The various code paths that tell the hypervisor that a crash > > + occurred, vs. it simply being a normal reboot, should be > > + reviewed, and possibly clarified/fixed. > > + > > + o Instead of using /sys/kernel, should there be a /sys/dump > > + instead? There is a dump_subsys being created by the s390 code, > > + perhaps the pseries code should use a similar layout as well. > > Well, it seems to me that there's little reason to duplicate the s390 > layout unless we can actually share code. I'm thinking "user-space tools". This could probably be decided very quickly after a short chat with the s390 architects about tools and standardization and dump formats and the support process. In the end, its some third-level support people who will be dissecting the dump using god-knows-what tool (a hacked kdgb ???), so things should get standardized as much as possible so that we don't duplicate work. > FWIW, I've been thinking about making a /sys/firmware/phyp hierarchy > which could contain much of the System P-specific functions (DLPAR, > lparcfg, other crud in /proc/ppc64)... seems suited to this > platform-specific dump mechanism. Works for me. I was always unclear what /sys/firmware was supposed to contain. --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-09 15:31 ` Linas Vepstas @ 2008-01-09 18:44 ` Nathan Lynch 2008-01-09 19:28 ` Manish Ahuja ` (2 more replies) 0 siblings, 3 replies; 69+ messages in thread From: Nathan Lynch @ 2008-01-09 18:44 UTC (permalink / raw) To: Linas Vepstas; +Cc: mahuja, linuxppc-dev, lkessler, strosake Hi Linas, Linas Vepstas wrote: > > On 08/01/2008, Nathan Lynch <ntl@pobox.com> wrote: > > Manish Ahuja wrote: > > > + > > > +The goal of hypervisor-assisted dump is to enable the dump of > > > +a crashed system, and to do so from a fully-reset system, and > > > +to minimize the total elapsed time until the system is back > > > +in production use. > > > > Is it actually faster than kdump? > > This is a basic presumption; I used the word "actually". I already know that it is intended to be faster. :) > it should blow it away, as, after all, > it requires one less reboot! There's more than rebooting going on during system dump processing. Depending on the system type, booting may not be where most time is spent. > As a side effect, the system is in > production *while* the dump is being taken; A dubious feature IMO. Seems that the design potentially trades reliability of first failure data capture for availability. E.g. system crashes, reboots, resumes processing while copying dump, crashes again before dump procedure is complete. How is that handled, if at all? > with kdump, > you can't go into production until after the dump is finished, > and the system has been rebooted a second time. On > systems with terabytes of RAM, the time difference can be > hours. The difference in time it takes to resume the normal workload may be significant, yes. But the time it takes to get a usable dump image would seem to be the basically the same. Since you bring up large systems... a system with terabytes of RAM is practically guaranteed to be a NUMA configuration with dozens of cpus. When processing a dump on such a system, I wonder how well we fare: can we successfully boot with (say) 128 cpus and 256MB of usable memory? Do we have to hot-online nodes as system memory is freed up (and does that even work)? We need to be able to restore the system to its optimal topology when the dump is finished; if the best we can do is a degraded configuration, the workload will suffer and the system admin is likely to just reboot the machine again so the kernel will have the right NUMA topology. > > > +Implementation details: > > > +---------------------- > > > +In order for this scheme to work, memory needs to be reserved > > > +quite early in the boot cycle. However, access to the device > > > +tree this early in the boot cycle is difficult, and device-tree > > > +access is needed to determine if there is a crash data waiting. > > > > I don't think this bit about early device tree access is correct. By > > the time your code is reserving memory (from early_init_devtree(), I > > think), RTAS has been instantiated and you are able to test for the > > existence of /rtas/ibm,dump-kernel. > > If I remember right, it was still too early to look up this token directly, > so we wrote some code to crawl the flat device tree to find it. But > not only was that a lot of work, but I somehow decided that doing this > to the flat tree was wrong, as otherwise someone would surely have > written the access code. If this can be made to work, that would be > great, but we couldn't make it work at the time. > > > > +To work around this problem, all but 256MB of RAM is reserved > > > +during early boot. A short while later in boot, a check is made > > > +to determine if there is dump data waiting. If there isn't, > > > +then the reserved memory is released to general kernel use. > > > > So I think these gymnastics are unneeded -- unless I'm > > misunderstanding something, you should be able to determine very early > > whether to reserve that memory. > > Only if you can get at rtas, but you can't get at rtas at that point. Sorry, but I think you are mistaken (see Michael's earlier reply). ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-09 18:44 ` Nathan Lynch @ 2008-01-09 19:28 ` Manish Ahuja 2008-01-09 22:59 ` Michael Ellerman 2008-01-10 2:33 ` Linas Vepstas 2 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-01-09 19:28 UTC (permalink / raw) To: Nathan Lynch; +Cc: mahuja, linuxppc-dev, Linas Vepstas, lkessler, strosake > > I used the word "actually". I already know that it is intended to be > faster. :) > >> it should blow it away, as, after all, >> it requires one less reboot! > > There's more than rebooting going on during system dump processing. > Depending on the system type, booting may not be where most time is > spent. > > >> As a side effect, the system is in >> production *while* the dump is being taken; > > A dubious feature IMO. Seems that the design potentially trades > reliability of first failure data capture for availability. > E.g. system crashes, reboots, resumes processing while copying dump, > crashes again before dump procedure is complete. How is that handled, > if at all? This is a simple version. The intent was not to have a complex dump taking mechanism in version 1. Subsequent versions will see planned improvement on the way the pages are tracked and freed. Also it is very easily possible now, to register for another dump as soon as the scratch area is copied to a user designated region. But for now this simple implementation exists. It is also possible to extend this further to only preserve pages that are kernel pages and free the non required pages like user/data pages etc. This would reduce the space preserved and would prevent any issues that are caused by reserving everything in memory except for the first 256 MB. Improvements and future versions are planned to make this efficient. But for now the intent is to get this off the ground and handle simple cases. > > >> with kdump, >> you can't go into production until after the dump is finished, >> and the system has been rebooted a second time. On >> systems with terabytes of RAM, the time difference can be >> hours. > > The difference in time it takes to resume the normal workload may be > significant, yes. But the time it takes to get a usable dump image > would seem to be the basically the same. > > Since you bring up large systems... a system with terabytes of RAM is > practically guaranteed to be a NUMA configuration with dozens of cpus. > When processing a dump on such a system, I wonder how well we fare: > can we successfully boot with (say) 128 cpus and 256MB of usable > memory? Do we have to hot-online nodes as system memory is freed up > (and does that even work)? We need to be able to restore the system > to its optimal topology when the dump is finished; if the best we can > do is a degraded configuration, the workload will suffer and the > system admin is likely to just reboot the machine again so the kernel > will have the right NUMA topology. > > >>>> +Implementation details: >>>> +---------------------- >>>> +In order for this scheme to work, memory needs to be reserved >>>> +quite early in the boot cycle. However, access to the device >>>> +tree this early in the boot cycle is difficult, and device-tree >>>> +access is needed to determine if there is a crash data waiting. >>> I don't think this bit about early device tree access is correct. By >>> the time your code is reserving memory (from early_init_devtree(), I >>> think), RTAS has been instantiated and you are able to test for the >>> existence of /rtas/ibm,dump-kernel. >> If I remember right, it was still too early to look up this token directly, >> so we wrote some code to crawl the flat device tree to find it. But >> not only was that a lot of work, but I somehow decided that doing this >> to the flat tree was wrong, as otherwise someone would surely have >> written the access code. If this can be made to work, that would be >> great, but we couldn't make it work at the time. >> >>>> +To work around this problem, all but 256MB of RAM is reserved >>>> +during early boot. A short while later in boot, a check is made >>>> +to determine if there is dump data waiting. If there isn't, >>>> +then the reserved memory is released to general kernel use. >>> So I think these gymnastics are unneeded -- unless I'm >>> misunderstanding something, you should be able to determine very early >>> whether to reserve that memory. >> Only if you can get at rtas, but you can't get at rtas at that point. > > Sorry, but I think you are mistaken (see Michael's earlier reply). > > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@ozlabs.org > https://ozlabs.org/mailman/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-09 18:44 ` Nathan Lynch 2008-01-09 19:28 ` Manish Ahuja @ 2008-01-09 22:59 ` Michael Ellerman 2008-01-09 23:18 ` Manish Ahuja 2008-01-10 2:47 ` Linas Vepstas 2008-01-10 2:33 ` Linas Vepstas 2 siblings, 2 replies; 69+ messages in thread From: Michael Ellerman @ 2008-01-09 22:59 UTC (permalink / raw) To: Nathan Lynch; +Cc: mahuja, linuxppc-dev, Linas Vepstas, lkessler, strosake [-- Attachment #1: Type: text/plain, Size: 4146 bytes --] On Wed, 2008-01-09 at 12:44 -0600, Nathan Lynch wrote: > Hi Linas, > > Linas Vepstas wrote: > > > > On 08/01/2008, Nathan Lynch <ntl@pobox.com> wrote: > > > Manish Ahuja wrote: > > > > + > > > > +The goal of hypervisor-assisted dump is to enable the dump of > > > > +a crashed system, and to do so from a fully-reset system, and > > > > +to minimize the total elapsed time until the system is back > > > > +in production use. > > > > > > Is it actually faster than kdump? > > > > This is a basic presumption; > > > As a side effect, the system is in > > production *while* the dump is being taken; It's in "production" with 256MB of RAM? Err. Sure as the dump progresses more RAM will be freed, but that's hardly production. I think Nathan's right, any sysadmin who wants predictability will probably double reboot anyway. > > with kdump, > > you can't go into production until after the dump is finished, > > and the system has been rebooted a second time. On > > systems with terabytes of RAM, the time difference can be > > hours. > Since you bring up large systems... a system with terabytes of RAM is > practically guaranteed to be a NUMA configuration with dozens of cpus. > When processing a dump on such a system, I wonder how well we fare: > can we successfully boot with (say) 128 cpus and 256MB of usable > memory? Do we have to hot-online nodes as system memory is freed up > (and does that even work)? We need to be able to restore the system > to its optimal topology when the dump is finished; if the best we can > do is a degraded configuration, the workload will suffer and the > system admin is likely to just reboot the machine again so the kernel > will have the right NUMA topology. Yeah that's a good question. Even if the hot-onlining works, there's still kernel data structures allocated at boot which want to be node-local. So the end result will be != a "production" boot. > > > > +Implementation details: > > > > +---------------------- > > > > +In order for this scheme to work, memory needs to be reserved > > > > +quite early in the boot cycle. However, access to the device > > > > +tree this early in the boot cycle is difficult, and device-tree > > > > +access is needed to determine if there is a crash data waiting. > > > > > > I don't think this bit about early device tree access is correct. By > > > the time your code is reserving memory (from early_init_devtree(), I > > > think), RTAS has been instantiated and you are able to test for the > > > existence of /rtas/ibm,dump-kernel. > > > > If I remember right, it was still too early to look up this token directly, > > so we wrote some code to crawl the flat device tree to find it. But > > not only was that a lot of work, but I somehow decided that doing this > > to the flat tree was wrong, as otherwise someone would surely have > > written the access code. If this can be made to work, that would be > > great, but we couldn't make it work at the time. > > > > > > +To work around this problem, all but 256MB of RAM is reserved > > > > +during early boot. A short while later in boot, a check is made > > > > +to determine if there is dump data waiting. If there isn't, > > > > +then the reserved memory is released to general kernel use. > > > > > > So I think these gymnastics are unneeded -- unless I'm > > > misunderstanding something, you should be able to determine very early > > > whether to reserve that memory. > > > > Only if you can get at rtas, but you can't get at rtas at that point. AFAICT you don't need to get at RTAS, you just need to look at the device tree to see if the property is present, and that is trivial. You probably just need to add a check in early_init_dt_scan_rtas() which sets a flag for the PHYP dump stuff, or add your own scan routine if you need. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-09 22:59 ` Michael Ellerman @ 2008-01-09 23:18 ` Manish Ahuja 2008-01-10 2:47 ` Linas Vepstas 1 sibling, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-01-09 23:18 UTC (permalink / raw) To: michael Cc: lkessler, linuxppc-dev, Nathan Lynch, mahuja, Linas Vepstas, strosake > It's in "production" with 256MB of RAM? Err. Sure as the dump progresses > more RAM will be freed, but that's hardly production. I think Nathan's > right, any sysadmin who wants predictability will probably double reboot > anyway. Thats a changeable parameter. Its something we chose for now. It by no means is set in stone. Its not a design parameter. If you like to allocate 1GB we can. But that is something we did for now. we expect this to be a variable value dependent upon the size of the system. So if you have 128 GB system and you can spare 10 gb, you should be able to have 10 GB to boot with. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-09 22:59 ` Michael Ellerman 2008-01-09 23:18 ` Manish Ahuja @ 2008-01-10 2:47 ` Linas Vepstas 2008-01-10 3:55 ` Michael Ellerman 1 sibling, 1 reply; 69+ messages in thread From: Linas Vepstas @ 2008-01-10 2:47 UTC (permalink / raw) To: michael; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake On 09/01/2008, Michael Ellerman <michael@ellerman.id.au> wrote: > > > > Only if you can get at rtas, but you can't get at rtas at that point. > > AFAICT you don't need to get at RTAS, you just need to look at the > device tree to see if the property is present, and that is trivial. > > You probably just need to add a check in early_init_dt_scan_rtas() which > sets a flag for the PHYP dump stuff, or add your own scan routine if you > need. I no longer remember the details. I do remember spending a lot of time trying to figure out how to do this. I know I didn't want to write my own scan routine; maybe that's what stopped me. As it happens, we also did most of the development on a broken phyp which simply did not even have this property, no matter what, and so that may have brain-damaged me. I went for the "most elegant" solution, where "most elegant" is defined as "fewest lines of code", "least effort", etc. Manish may need some hands-on help to extract this token during early boot. Hopefully, he'll let us know. --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-10 2:47 ` Linas Vepstas @ 2008-01-10 3:55 ` Michael Ellerman 0 siblings, 0 replies; 69+ messages in thread From: Michael Ellerman @ 2008-01-10 3:55 UTC (permalink / raw) To: linasvepstas; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake [-- Attachment #1: Type: text/plain, Size: 3344 bytes --] On Wed, 2008-01-09 at 20:47 -0600, Linas Vepstas wrote: > On 09/01/2008, Michael Ellerman <michael@ellerman.id.au> wrote: > > > > > > Only if you can get at rtas, but you can't get at rtas at that point. > > > > AFAICT you don't need to get at RTAS, you just need to look at the > > device tree to see if the property is present, and that is trivial. > > > > You probably just need to add a check in early_init_dt_scan_rtas() which > > sets a flag for the PHYP dump stuff, or add your own scan routine if you > > need. > > I no longer remember the details. I do remember spending a lot of time > trying to figure out how to do this. I know I didn't want to write my own scan > routine; maybe that's what stopped me. As it happens, we also did most > of the development on a broken phyp which simply did not even have > this property, no matter what, and so that may have brain-damaged me. Sure, the API docs for the kernel are a little lacking ;) > I went for the "most elegant" solution, where "most elegant" is defined > as "fewest lines of code", "least effort", etc. > > Manish may need some hands-on help to extract this token during > early boot. Hopefully, he'll let us know. It would just be something like: --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -901,6 +901,11 @@ int __init early_init_dt_scan_rtas(unsigned long node, rtas.size = *sizep; } +#ifdef CONFIG_PHYP_DUMP + if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL)) + phyp_dump_is_active++; +#endif + #ifdef CONFIG_UDBG_RTAS_CONSOLE basep = of_get_flat_dt_prop(node, "put-term-char", NULL); if (basep) Or to do your own scan routine: diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index acc0d24..442134e 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -1022,6 +1022,7 @@ void __init early_init_devtree(void *params) /* Some machines might need RTAS info for debugging, grab it now. */ of_scan_flat_dt(early_init_dt_scan_rtas, NULL); #endif + of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL); /* Retrieve various informations from the /chosen node of the * device-tree, including the platform type, initrd location and diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 52e95c2..af2b6e8 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -883,6 +883,19 @@ void __init rtas_initialize(void) #endif } +int __init early_init_dt_scan_phyp_dump(unsigned long node, + const char *uname, int depth, void *data) +{ +#ifdef CONFIG_PHYP_DUMP + if (depth != 1 || strcmp(uname, "rtas") != 0) + return 0; + + if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL)) + phyp_dump_is_active++; +#endif + return 1; +} + int __init early_init_dt_scan_rtas(unsigned long node, const char *uname, int depth, void *data) { cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-09 18:44 ` Nathan Lynch 2008-01-09 19:28 ` Manish Ahuja 2008-01-09 22:59 ` Michael Ellerman @ 2008-01-10 2:33 ` Linas Vepstas 2008-01-10 3:17 ` Olof Johansson 2 siblings, 1 reply; 69+ messages in thread From: Linas Vepstas @ 2008-01-10 2:33 UTC (permalink / raw) To: Nathan Lynch; +Cc: mahuja, linuxppc-dev, lkessler, strosake On 09/01/2008, Nathan Lynch <ntl@pobox.com> wrote: > Hi Linas, > > Linas Vepstas wrote: > > > > As a side effect, the system is in > > production *while* the dump is being taken; > > A dubious feature IMO. Hmm. Take it up with Ken Rozendal, this is supposed to be one of the two main selling points of this thing. > Seems that the design potentially trades > reliability of first failure data capture for availability. > E.g. system crashes, reboots, resumes processing while copying dump, > crashes again before dump procedure is complete. How is that handled, > if at all? Its handled by the hypervisor. phyp maintains the copy of the RMO of first crash, until such time that the OS declares the dump of the RMO to be complete. So you'll always have the RMO of the first crash. For the rest of RAM, it will come in two parts: some portion will have been dumped already. The rest has not yet been dumped, and it will still be there, preserved across the second crash. So you get both RMO and all of RAM from the first crash. > > with kdump, > > you can't go into production until after the dump is finished, > > and the system has been rebooted a second time. On > > systems with terabytes of RAM, the time difference can be > > hours. > > The difference in time it takes to resume the normal workload may be > significant, yes. But the time it takes to get a usable dump image > would seem to be the basically the same. Yes. > Since you bring up large systems... a system with terabytes of RAM is > practically guaranteed to be a NUMA configuration with dozens of cpus. > When processing a dump on such a system, I wonder how well we fare: > can we successfully boot with (say) 128 cpus and 256MB of usable > memory? Do we have to hot-online nodes as system memory is freed up > (and does that even work)? We need to be able to restore the system > to its optimal topology when the dump is finished; if the best we can > do is a degraded configuration, the workload will suffer and the > system admin is likely to just reboot the machine again so the kernel > will have the right NUMA topology. Heh. That's the elbow-grease of this thing. The easy part is to get the core function working. The hard part is to test these various configs, and when they don't work, figure out what went wrong. That will take perseverence and brains. --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-10 2:33 ` Linas Vepstas @ 2008-01-10 3:17 ` Olof Johansson 2008-01-10 4:12 ` Linas Vepstas 0 siblings, 1 reply; 69+ messages in thread From: Olof Johansson @ 2008-01-10 3:17 UTC (permalink / raw) To: Linas Vepstas; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote: > Heh. That's the elbow-grease of this thing. The easy part is to get > the core function working. The hard part is to test these various configs, > and when they don't work, figure out what went wrong. That will take > perseverence and brains. This just sounds like a whole lot of extra work to get a feature that already exists. Also, features like these seem to just get tested when the next enterprise distro is released, so they're broken for long stretches of time in mainline. There's a bunch of problems like the NUMA ones, which would by far be easiest to solve by just doing another reboot or kexec, wouldn't they? -Olof ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-10 3:17 ` Olof Johansson @ 2008-01-10 4:12 ` Linas Vepstas 2008-01-10 4:52 ` Michael Ellerman 2008-01-10 16:21 ` Olof Johansson 0 siblings, 2 replies; 69+ messages in thread From: Linas Vepstas @ 2008-01-10 4:12 UTC (permalink / raw) To: Olof Johansson; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake On 09/01/2008, Olof Johansson <olof@lixom.net> wrote: > On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote: > > > Heh. That's the elbow-grease of this thing. The easy part is to get > > the core function working. The hard part is to test these various configs, > > and when they don't work, figure out what went wrong. That will take > > perseverence and brains. > > This just sounds like a whole lot of extra work to get a feature that > already exists. Well, no. kexec is horribly ill-behaved with respect to PCI. The kexec kernel starts running with PCI devices in some random state; maybe they're DMA'ing or who knows what. kexec tries real hard to whack a few needed pci devices into submission but it has been hit-n-miss, and the source of 90% of the kexec headaches and debugging effort. Its not pretty. If all pci-host bridges could shut-down or settle the bus, and raise the #RST line high, and then if all BIOS'es supported this, you'd be right. But they can't .... --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-10 4:12 ` Linas Vepstas @ 2008-01-10 4:52 ` Michael Ellerman 2008-01-10 16:21 ` Olof Johansson 1 sibling, 0 replies; 69+ messages in thread From: Michael Ellerman @ 2008-01-10 4:52 UTC (permalink / raw) To: linasvepstas Cc: linuxppc-dev, lkessler, mahuja, Nathan Lynch, Olof Johansson, strosake [-- Attachment #1: Type: text/plain, Size: 1242 bytes --] On Wed, 2008-01-09 at 22:12 -0600, Linas Vepstas wrote: > On 09/01/2008, Olof Johansson <olof@lixom.net> wrote: > > On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote: > > > > > Heh. That's the elbow-grease of this thing. The easy part is to get > > > the core function working. The hard part is to test these various configs, > > > and when they don't work, figure out what went wrong. That will take > > > perseverence and brains. > > > > This just sounds like a whole lot of extra work to get a feature that > > already exists. > > Well, no. kexec is horribly ill-behaved with respect to PCI. The > kexec kernel starts running with PCI devices in some random > state; maybe they're DMA'ing or who knows what. kexec tries > real hard to whack a few needed pci devices into submission > but it has been hit-n-miss, and the source of 90% of the kexec > headaches and debugging effort. Its not pretty. Isn't that what EEH and the IOMMU are for? :) cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-10 4:12 ` Linas Vepstas 2008-01-10 4:52 ` Michael Ellerman @ 2008-01-10 16:21 ` Olof Johansson 2008-01-10 16:34 ` Linas Vepstas 1 sibling, 1 reply; 69+ messages in thread From: Olof Johansson @ 2008-01-10 16:21 UTC (permalink / raw) To: Linas Vepstas; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake On Wed, Jan 09, 2008 at 10:12:13PM -0600, Linas Vepstas wrote: > On 09/01/2008, Olof Johansson <olof@lixom.net> wrote: > > On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote: > > > > > Heh. That's the elbow-grease of this thing. The easy part is to get > > > the core function working. The hard part is to test these various configs, > > > and when they don't work, figure out what went wrong. That will take > > > perseverence and brains. > > > > This just sounds like a whole lot of extra work to get a feature that > > already exists. > > Well, no. kexec is horribly ill-behaved with respect to PCI. The > kexec kernel starts running with PCI devices in some random > state; maybe they're DMA'ing or who knows what. kexec tries > real hard to whack a few needed pci devices into submission > but it has been hit-n-miss, and the source of 90% of the kexec > headaches and debugging effort. Its not pretty. It surprises me that this hasn't been possible to resolve with less than architecting a completely new interface, given that the platform has all this fancy support for isolating and resetting adapters. After all, the exact same thing has to be done by the hypervisor before rebooting the partition. > If all pci-host bridges could shut-down or settle the bus, and > raise the #RST line high, and then if all BIOS'es supported > this, you'd be right. But they can't .... This argument doesn't hold. We're not talking about some generic PC with a crappy BIOS here, we're specifically talking about POWER6 PHYP. It certainly already has ways to reset adapters in it, or EEH wouldn't work. Actually, the whole phyp dump feature wouldn't work either, since it's exactly what the firmware has to do under the covers as well. -Olof ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-10 16:21 ` Olof Johansson @ 2008-01-10 16:34 ` Linas Vepstas 2008-01-10 21:46 ` Mike Strosaker 0 siblings, 1 reply; 69+ messages in thread From: Linas Vepstas @ 2008-01-10 16:34 UTC (permalink / raw) To: Olof Johansson; +Cc: mahuja, linuxppc-dev, lkessler, Nathan Lynch, strosake On 10/01/2008, Olof Johansson <olof@lixom.net> wrote: > On Wed, Jan 09, 2008 at 10:12:13PM -0600, Linas Vepstas wrote: > > On 09/01/2008, Olof Johansson <olof@lixom.net> wrote: > > > On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote: > > > > > > > Heh. That's the elbow-grease of this thing. The easy part is to get > > > > the core function working. The hard part is to test these various configs, > > > > and when they don't work, figure out what went wrong. That will take > > > > perseverence and brains. > > > > > > This just sounds like a whole lot of extra work to get a feature that > > > already exists. > > > > Well, no. kexec is horribly ill-behaved with respect to PCI. The > > kexec kernel starts running with PCI devices in some random > > state; maybe they're DMA'ing or who knows what. kexec tries > > real hard to whack a few needed pci devices into submission > > but it has been hit-n-miss, and the source of 90% of the kexec > > headaches and debugging effort. Its not pretty. > > It surprises me that this hasn't been possible to resolve with less than > architecting a completely new interface, given that the platform has > all this fancy support for isolating and resetting adapters. After all, > the exact same thing has to be done by the hypervisor before rebooting > the partition. OK, point taken. -- The phyp interfaces are there for AIX, which I guess must not have kexec-like ability. So this is a case of Linux leveraging a feature architected for AIX. -- There's also this idea, somewhat weak, that the crash may have corrupted the ram where the kexec kernel sits. For someone who is used to seeing crashes due to null pointer deref's, this seems fairly unlikely. But perhaps crashes in production systems are more mind-bending. (we did have a case where a USB stick used for boot continued to scribble on memory long after it was supposed to be quiet and unused. This resulted in a very hard to debug crash.) A solution to a corrupted kexec kernel would be to disable memory access to where kexec sits, e.g un-mapping or making r/o the pages where it lies. This begs the questions of "who unhides the kexec kernel", and "what if this 'who' gets corrupted"? In short, the kexec kernel does not boot exactly the same as a cold boot, and so this opens a can of worms about "well, what's different, how do we minimize these differences, etc." and I think that lead AIX to punt, and say "lets just use one single, well-known boot loader/ boot sequence instead of inventing a new one", thus leading to the phyp design. But that's just my guess.. :-) --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-10 16:34 ` Linas Vepstas @ 2008-01-10 21:46 ` Mike Strosaker 2008-01-11 1:26 ` Nathan Lynch 0 siblings, 1 reply; 69+ messages in thread From: Mike Strosaker @ 2008-01-10 21:46 UTC (permalink / raw) To: linasvepstas Cc: linuxppc-dev, lkessler, mahuja, Nathan Lynch, Olof Johansson, strosake Linas Vepstas wrote: > On 10/01/2008, Olof Johansson <olof@lixom.net> wrote: > >>On Wed, Jan 09, 2008 at 10:12:13PM -0600, Linas Vepstas wrote: >> >>>On 09/01/2008, Olof Johansson <olof@lixom.net> wrote: >>> >>>>On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote: >>>> >>>> >>>>>Heh. That's the elbow-grease of this thing. The easy part is to get >>>>>the core function working. The hard part is to test these various configs, >>>>>and when they don't work, figure out what went wrong. That will take >>>>>perseverence and brains. >>>> >>>>This just sounds like a whole lot of extra work to get a feature that >>>>already exists. >>> >>>Well, no. kexec is horribly ill-behaved with respect to PCI. The >>>kexec kernel starts running with PCI devices in some random >>>state; maybe they're DMA'ing or who knows what. kexec tries >>>real hard to whack a few needed pci devices into submission >>>but it has been hit-n-miss, and the source of 90% of the kexec >>>headaches and debugging effort. Its not pretty. >> >>It surprises me that this hasn't been possible to resolve with less than >>architecting a completely new interface, given that the platform has >>all this fancy support for isolating and resetting adapters. After all, >>the exact same thing has to be done by the hypervisor before rebooting >>the partition. > > > OK, point taken. > > -- The phyp interfaces are there for AIX, which I guess must > not have kexec-like ability. So this is a case of Linux leveraging > a feature architected for AIX. Certainly AIX was in a more difficult position at the time, because they don't have a kexec equivalent, and thus were collecting dump data with a potentially faulty kernel. It makes sense to have something outside the partition collect or maintain the data; ideally, some kind of service partition would extract dump data from a failed partition, but giving one partition total access to the memory of another is clearly risky. Both the PHYP-assistance method and the kexec method are ways to simulate that without the risk. At the risk of repeating what others have already said, the PHYP-assistance method provides some advantages that the kexec method cannot: - Availability of the system for production use before the dump data is collected. As was mentioned before, some production systems may choose not to operate with the limited memory initially available after the reboot, but it sure is nice to provide the option. - Ensuring that the devices are in a good state. PHYP doesn't expose a method to force adapters into a frozen state, (which I agree would be useful), and I don't know of any plans to do so. What we are starting to see is that some drivers need modifications in order to work correctly with kdump [1]. Supporting PHYP-assisted dump would eliminate those issues. - The small possibility that the kexec area could have been munged by the failing kernel, preventing it from being able to collect a dump. The NUMA issues are daunting, but not insurmountable. Release early, release often, n'est-ce pas? Mike [1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/045663.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-10 21:46 ` Mike Strosaker @ 2008-01-11 1:26 ` Nathan Lynch 2008-01-11 16:57 ` Linas Vepstas 0 siblings, 1 reply; 69+ messages in thread From: Nathan Lynch @ 2008-01-11 1:26 UTC (permalink / raw) To: Mike Strosaker Cc: linuxppc-dev, lkessler, mahuja, Olof Johansson, linasvepstas, strosake Mike Strosaker wrote: > > At the risk of repeating what others have already said, the PHYP-assistance > method provides some advantages that the kexec method cannot: > - Availability of the system for production use before the dump data is > collected. As was mentioned before, some production systems may choose not > to operate with the limited memory initially available after the reboot, > but it sure is nice to provide the option. I'm more concerned that this design encourages the user to resume a workload *which is almost certainly known to result in a system crash* before collection of crash data is complete. Maybe the gamble will pay off most of the time, but I wouldn't want to be working support when it doesn't. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-11 1:26 ` Nathan Lynch @ 2008-01-11 16:57 ` Linas Vepstas 2008-01-14 5:24 ` Olof Johansson 0 siblings, 1 reply; 69+ messages in thread From: Linas Vepstas @ 2008-01-11 16:57 UTC (permalink / raw) To: Nathan Lynch; +Cc: linuxppc-dev, lkessler, mahuja, Olof Johansson, strosake On 10/01/2008, Nathan Lynch <ntl@pobox.com> wrote: > Mike Strosaker wrote: > > > > At the risk of repeating what others have already said, the PHYP-assistance > > method provides some advantages that the kexec method cannot: > > - Availability of the system for production use before the dump data is > > collected. As was mentioned before, some production systems may choose not > > to operate with the limited memory initially available after the reboot, > > but it sure is nice to provide the option. > > I'm more concerned that this design encourages the user to resume a > workload *which is almost certainly known to result in a system crash* > before collection of crash data is complete. Maybe the gamble will > pay off most of the time, but I wouldn't want to be working support > when it doesn't. Workloads that cause crashes within hours of startup tend to be weeded-out/discovered during pre-production test of the system to be deployed. Since its pre-production test, dumps can be taken in a leisurely manner. Heck, even a session at the xmon prompt can be contemplated. The problem is when the crash only reproduces after days or weeks of uptime, on a production machine. Since the machine is in production, its got to be brought back up ASAP. Since its crashing only after days/weeks, the dump should have plenty of time to complete. (And if it crashes quickly after that reboot ... well, support people always welcome ways in which a bug can be reproduced more quickly/easily). --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-11 16:57 ` Linas Vepstas @ 2008-01-14 5:24 ` Olof Johansson 2008-01-14 15:21 ` Linas Vepstas 0 siblings, 1 reply; 69+ messages in thread From: Olof Johansson @ 2008-01-14 5:24 UTC (permalink / raw) To: Linas Vepstas; +Cc: lkessler, linuxppc-dev, Nathan Lynch, mahuja, strosake On Fri, Jan 11, 2008 at 10:57:51AM -0600, Linas Vepstas wrote: > On 10/01/2008, Nathan Lynch <ntl@pobox.com> wrote: > > Mike Strosaker wrote: > > > > > > At the risk of repeating what others have already said, the PHYP-assistance > > > method provides some advantages that the kexec method cannot: > > > - Availability of the system for production use before the dump data is > > > collected. As was mentioned before, some production systems may choose not > > > to operate with the limited memory initially available after the reboot, > > > but it sure is nice to provide the option. > > > > I'm more concerned that this design encourages the user to resume a > > workload *which is almost certainly known to result in a system crash* > > before collection of crash data is complete. Maybe the gamble will > > pay off most of the time, but I wouldn't want to be working support > > when it doesn't. > > Workloads that cause crashes within hours of startup tend to be > weeded-out/discovered during pre-production test of the system > to be deployed. Since its pre-production test, dumps can be > taken in a leisurely manner. Heck, even a session at the > xmon prompt can be contemplated. > > The problem is when the crash only reproduces after days or > weeks of uptime, on a production machine. Since the machine > is in production, its got to be brought back up ASAP. Since > its crashing only after days/weeks, the dump should have > plenty of time to complete. (And if it crashes quickly after > that reboot ... well, support people always welcome ways > in which a bug can be reproduced more quickly/easily). How do you expect to have it in full production if you don't have all resources available for it? It's not until the dump has finished that you can return all memory to the production environment and use it. This can very easily be argued in both direction, with no clear winner: If the crash is stress-induced (say a slashdotted website), for those cases it seems more rational to take the time, collect _good data_ even if it takes a little longer, and then go back into production. Especially if the alternative is to go back into production immediately, collect about half of the data, and then crash again. Rinse and repeat. Anyway -- I can agree that some of the arguments w.r.t robustness and reliability of collecting dumps can be higher using this approach. It really surprises me that there's no way to reset a device through PHYP though. Seems like such a fundamental feature. I think people are overly optimistic if they think it'll be possible to do all of this reliably (as in with consistent performance) without a second reboot though. At least without similar amounts of work being done as it would have taken to fix kdump's reliability in the first place. Speaking of reboots. PHYP isn't known for being quick at rebooting a partition, it used to take in the order of minutes even on a small machine. Has that been fixed? If not, the avoiding an extra reboot argument hardly seems like a benefit versus kdump+kexec, which reboots nearly instantly and without involvement from PHYP. -Olof ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/8] pseries: phyp dump: Docmentation 2008-01-14 5:24 ` Olof Johansson @ 2008-01-14 15:21 ` Linas Vepstas 0 siblings, 0 replies; 69+ messages in thread From: Linas Vepstas @ 2008-01-14 15:21 UTC (permalink / raw) To: Olof Johansson; +Cc: lkessler, linuxppc-dev, Nathan Lynch, mahuja, strosake On 13/01/2008, Olof Johansson <olof@lixom.net> wrote: > > How do you expect to have it in full production if you don't have all > resources available for it? It's not until the dump has finished that you > can return all memory to the production environment and use it. With the PHYP dump, each chunk of RAM is returned for general use immediately after being dumped; so its not an all-or-nothing proposition. Production systems don't often hit 100% RAM use right out of the gate, they often take hours or days to get there, so again, there should be time to dump. > This can very easily be argued in both direction, with no clear winner: > If the crash is stress-induced (say a slashdotted website), for those > cases it seems more rational to take the time, collect _good data_ even > if it takes a little longer, and then go back into production. Especially > if the alternative is to go back into production immediately, collect > about half of the data, and then crash again. Rinse and repeat. Again, the mode of operation for the phyp dump is that you'll always have all of the data from the *first* crash, even if there are multiple crashes. That's because the the as-yet undumped RAM is not put back into production. > really surprises me that there's no way to reset a device through PHYP > though. Seems like such a fundamental feature. I don't know who said that; that's not right. The EEH function certainly does allow you to halt/restart PCI traffic to a particular device and also to reset the device. So, yes, the pSeries kexec code should call into the eeh subsystem to rationalize the device state. > I think people are overly optimistic if they think it'll be possible > to do all of this reliably (as in with consistent performance) without > a second reboot though. The NUMA issues do concern me. But then, the whole virtualized, fractional-cpu, tickless operation stuff sounds like a performance tuning nightmare to begin with. > At least without similar amounts of work being > done as it would have taken to fix kdump's reliability in the first place. :-) > Speaking of reboots. PHYP isn't known for being quick at rebooting a > partition, it used to take in the order of minutes even on a small > machine. Has that been fixed? Dunno. Probably not. > If not, the avoiding an extra reboot > argument hardly seems like a benefit versus kdump+kexec, which reboots > nearly instantly and without involvement from PHYP. OK, let me tell you what I'm up against right now. I'm dealing with sporadic corruption on my home box. About a month ago, I bought a whizzy ASUS M2NE motherboard & an AMD64 2-core cpu, and two sticks of RAM, 1GB per stick. I have one new hard drive, SATA, and one old hard drive, from my old machine, the PATA. The two disks are mirrored in a RAID-1 config. Running Ubuntu. During install/upgrade a month ago, I noticed some of the install files seemed to have gotten corrupted, but that downloading them again got me a working version. This put a serious frown on my face: maybe a bad ethernet card or connection !? Two weeks ago, gcc stopped working one morning, although it worked fine the night before. I'd done nothing in the interim but sleep. Reinstalling it made it work again. Yesterday, something else stopped working. I found the offending library, I compared file checksums against a known-good version, and they were off. (!!!) Disk corruption? Then apt-get stopped working. The /var/lib/dpkg/status file had randomly corrupted single bytes. Its ascii, I hand repaired it; it had maybe 10 bad bytes out of 2MB total size. I installed tripwire. Between the first run of tripwire, and the second, less than an hour later, it reported several dozen files have changed checksums. Manual inspection of some of these files against known-good versions show that, at least this morning, that's no longer the case. System hasn't crashed in a month, since first boot. So what's going on? Is it possible that one of the two disks is serving up bad data, which explains the funny checksum behaviour? Or maybe its bad RAM, so that a fresh disk read shows good data? If its bad ram, why doesn't the system crash? I forced fsck last night, fsck came back spotless. So ... moral of the story: If phyp is doing some sort of hardware checks and validation, that's great. I wish I could afford a pSeries system for my home computer, because my impression is that they are very stable, and don't do things like data corruption. I'm such a friggin cheapskate that I can't bear to spend many thousands instead of many hundreds of dollars. However, I will trade a longer boot for the dream of higher reliability. --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 2/8] pseries: phyp dump: config file 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja 2008-01-08 0:13 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja @ 2008-01-08 0:16 ` Manish Ahuja 2008-01-08 3:18 ` Stephen Rothwell 2008-01-08 0:21 ` [PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja ` (6 subsequent siblings) 8 siblings, 1 reply; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:16 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake Add hypervisor-assisted dump to kernel config Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ----- arch/powerpc/Kconfig | 11 +++++++++++ 1 file changed, 11 insertions(+) Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig =================================================================== --- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig 2007-11-14 16:39:20.000000000 -0600 +++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig 2007-11-15 14:27:33.000000000 -0600 @@ -261,6 +261,17 @@ config CRASH_DUMP Don't change this unless you know what you are doing. +config PHYP_DUMP + bool "Hypervisor-assisted dump (EXPERIMENTAL)" + depends on PPC_PSERIES && EXPERIMENTAL + default y + help + Hypervisor-assisted dump is meant to be a kdump replacement + offering robustness and speed not possible without system + hypervisor assistence. + + If unsure, say "Y" + config PPCBUG_NVRAM bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC default y if PPC_PREP ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2/8] pseries: phyp dump: config file 2008-01-08 0:16 ` [PATCH 2/8] pseries: phyp dump: config file Manish Ahuja @ 2008-01-08 3:18 ` Stephen Rothwell 0 siblings, 0 replies; 69+ messages in thread From: Stephen Rothwell @ 2008-01-08 3:18 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake [-- Attachment #1: Type: text/plain, Size: 445 bytes --] On Mon, 07 Jan 2008 18:16:08 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: > > > Add hypervisor-assisted dump to kernel config > > Signed-off-by: Linas Vepstas <linas@austin.ibm.com> This patch should probably come last in the series so that all the code is there before some bisecting autobuilder tries to configure it. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja 2008-01-08 0:13 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja 2008-01-08 0:16 ` [PATCH 2/8] pseries: phyp dump: config file Manish Ahuja @ 2008-01-08 0:21 ` Manish Ahuja 2008-01-08 3:45 ` Stephen Rothwell 2008-01-08 0:25 ` [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja ` (5 subsequent siblings) 8 siblings, 1 reply; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:21 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake Check to see if there actually is data from a previously crashed kernel waiting. If so, Allow user-sapce tools to grab the data (by reading /proc/kcore). When user-space finishes dumping a section, it must release that memory by writing to sysfs. For example, echo "0x40000000 0x10000000" > /sys/kernel/release_region will release 256MB starting at the 1GB. The released memory becomes free for general use. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ------ arch/powerpc/platforms/pseries/phyp_dump.c | 101 +++++++++++++++++++++++++++-- 1 file changed, 96 insertions(+), 5 deletions(-) Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:15:05.000000000 -0600 +++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:24:30.000000000 -0600 @@ -12,17 +12,24 @@ */ #include <linux/init.h> +#include <linux/kobject.h> #include <linux/mm.h> +#include <linux/of.h> #include <linux/pfn.h> #include <linux/swap.h> +#include <linux/sysfs.h> #include <asm/page.h> #include <asm/phyp_dump.h> +#include <asm/rtas.h> /* Global, used to communicate data between early boot and late boot */ static struct phyp_dump phyp_dump_global; struct phyp_dump *phyp_dump_info = &phyp_dump_global; +static int ibm_configure_kernel_dump; + +/* ------------------------------------------------- */ /** * release_memory_range -- release memory previously lmb_reserved * @start_pfn: starting physical frame number @@ -52,18 +59,102 @@ release_memory_range(unsigned long start } } -static int __init phyp_dump_setup(void) +/* ------------------------------------------------- */ +/** + * sysfs_release_region -- sysfs interface to release memory range. + * + * Usage: + * "echo <start addr> <length> > /sys/kernel/release_region" + * + * Example: + * "echo 0x40000000 0x10000000 > /sys/kernel/release_region" + * + * will release 256MB starting at 1GB. + */ +static ssize_t +store_release_region(struct kset *kset, const char *buf, size_t count) { + unsigned long start_addr, length, end_addr; unsigned long start_pfn, nr_pages; + ssize_t ret; - /* If no memory was reserved in early boot, there is nothing to do */ - if (phyp_dump_info->init_reserve_size == 0) - return 0; + ret = sscanf(buf, "%lx %lx", &start_addr, &length); + if (ret != 2) + return -EINVAL; + + /* Range-check - don't free any reserved memory that + * wasn't reserved for phyp-dump */ + if (start_addr < phyp_dump_info->init_reserve_start) + start_addr = phyp_dump_info->init_reserve_start; + + end_addr = phyp_dump_info->init_reserve_start + + phyp_dump_info->init_reserve_size; + if (start_addr+length > end_addr) + length = end_addr - start_addr; + + /* Release the region of memory assed in by user */ + start_pfn = PFN_DOWN(start_addr); + nr_pages = PFN_DOWN(length); + release_memory_range (start_pfn, nr_pages); + + return count; +} + +static ssize_t +show_release_region(struct kset * kset, char *buf) +{ + return sprintf(buf, "ola\n"); +} + +static struct subsys_attribute rr = __ATTR(release_region, 0600, + show_release_region, + store_release_region); + +/* ------------------------------------------------- */ + +static void release_all (void) +{ + unsigned long start_pfn, nr_pages; - /* Release memory that was reserved in early boot */ + /* Release all memory that was reserved in early boot */ start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start); nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size); release_memory_range(start_pfn, nr_pages); +} + +static int __init phyp_dump_setup(void) +{ + struct device_node *rtas; + const int *dump_header; + int header_len = 0; + int rc; + + /* If no memory was reserved in early boot, there is nothing to do */ + if (phyp_dump_info->init_reserve_size == 0) + return 0; + + /* Return if phyp dump not supported */ + ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump"); + if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) { + release_all(); + return -ENOSYS; + } + + /* Is there dump data waiting for us? */ + rtas = of_find_node_by_path("/rtas"); + dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); + if (dump_header == NULL) { + release_all(); + return 0; + } + + /* Should we create a dump_subsys, analogous to s390/ipl.c ? */ + rc = subsys_create_file(&kernel_subsys, &rr); + if (rc) { + printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc); + release_all(); + return 0; + } return 0; } ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem 2008-01-08 0:21 ` [PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja @ 2008-01-08 3:45 ` Stephen Rothwell 2008-01-08 18:34 ` Linas Vepstas 0 siblings, 1 reply; 69+ messages in thread From: Stephen Rothwell @ 2008-01-08 3:45 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake [-- Attachment #1: Type: text/plain, Size: 697 bytes --] On Mon, 07 Jan 2008 18:21:57 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: > > +static int __init phyp_dump_setup(void) > +{ > > + /* Is there dump data waiting for us? */ > + rtas = of_find_node_by_path("/rtas"); > + dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); ^^^^^^^^^^^ You could pass NULL here as header_len appears to be unused. Also you need "of_node_put(rtas)" somewhere (probably just here would do). > + if (dump_header == NULL) { > + release_all(); > + return 0; > + } -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem 2008-01-08 3:45 ` Stephen Rothwell @ 2008-01-08 18:34 ` Linas Vepstas 0 siblings, 0 replies; 69+ messages in thread From: Linas Vepstas @ 2008-01-08 18:34 UTC (permalink / raw) To: Stephen Rothwell; +Cc: mahuja, linuxppc-dev, lkessler, strosake On 07/01/2008, Stephen Rothwell <sfr@canb.auug.org.au> wrote: > On Mon, 07 Jan 2008 18:21:57 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: > > > > +static int __init phyp_dump_setup(void) > > +{ > > > > + /* Is there dump data waiting for us? */ > > + rtas = of_find_node_by_path("/rtas"); > > + dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); > ^^^^^^^^^^^ > You could pass NULL here as header_len appears to be unused. Also you > need "of_node_put(rtas)" somewhere (probably just here would do). Perhaps the routine should have been "of_get_node_by_path()" ? In ye olden days, "finds" didn't require put, but "gets" did. I'm guessing that this has now all been fixed up for the of_xxx routines, but I think that pci_find_xxx still does not require a pci_put. Why did I bother to write this email? I dunno... --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (2 preceding siblings ...) 2008-01-08 0:21 ` [PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja @ 2008-01-08 0:25 ` Manish Ahuja 2008-01-08 3:16 ` Stephen Rothwell 2008-01-16 4:21 ` Paul Mackerras 2008-01-08 0:28 ` [PATCH 5/8] pseries: phyp dump: register dump area Manish Ahuja ` (4 subsequent siblings) 8 siblings, 2 replies; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:25 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake Initial patch for reserving memory in early boot, and freeing it later. If the previous boot had ended with a crash, the reserved memory would contain a copy of the crashed kernel data. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/kernel/prom.c | 33 +++++++++++++ arch/powerpc/platforms/pseries/Makefile | 1 arch/powerpc/platforms/pseries/phyp_dump.c | 71 +++++++++++++++++++++++++++++ include/asm-powerpc/phyp_dump.h | 32 +++++++++++++ 4 files changed, 137 insertions(+) Index: linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h 2007-11-19 17:44:21.000000000 -0600 @@ -0,0 +1,32 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyright (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef _PPC64_PHYP_DUMP_H +#define _PPC64_PHYP_DUMP_H + +#ifdef CONFIG_PHYP_DUMP + +/* The RMR region will be saved for later dumping + * whenever the kernel crashes. Set this to 256MB. */ +#define PHYP_DUMP_RMR_START 0x0 +#define PHYP_DUMP_RMR_END (1UL<<28) + +struct phyp_dump { + /* Memory that is reserved during very early boot. */ + unsigned long init_reserve_start; + unsigned long init_reserve_size; +}; + +extern struct phyp_dump *phyp_dump_info; + +#endif /* CONFIG_PHYP_DUMP */ +#endif /* _PPC64_PHYP_DUMP_H */ Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-19 19:07:49.000000000 -0600 @@ -0,0 +1,71 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyrhgit (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + */ + +#include <linux/init.h> +#include <linux/mm.h> +#include <linux/pfn.h> +#include <linux/swap.h> + +#include <asm/page.h> +#include <asm/phyp_dump.h> + +/* Global, used to communicate data between early boot and late boot */ +static struct phyp_dump phyp_dump_global; +struct phyp_dump *phyp_dump_info = &phyp_dump_global; + +/** + * release_memory_range -- release memory previously lmb_reserved + * @start_pfn: starting physical frame number + * @nr_pages: number of pages to free. + * + * This routine will release memory that had been previously + * lmb_reserved in early boot. The released memory becomes + * available for genreal use. + */ +static void +release_memory_range(unsigned long start_pfn, unsigned long nr_pages) +{ + struct page *rpage; + unsigned long end_pfn; + long i; + + end_pfn = start_pfn + nr_pages; + + for (i=start_pfn; i <= end_pfn; i++) { + rpage = pfn_to_page(i); + if (PageReserved(rpage)) { + ClearPageReserved(rpage); + init_page_count(rpage); + __free_page(rpage); + totalram_pages++; + } + } +} + +static int __init phyp_dump_setup(void) +{ + unsigned long start_pfn, nr_pages; + + /* If no memory was reserved in early boot, there is nothing to do */ + if (phyp_dump_info->init_reserve_size == 0) + return 0; + + /* Release memory that was reserved in early boot */ + start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start); + nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size); + release_memory_range(start_pfn, nr_pages); + + return 0; +} + +subsys_initcall(phyp_dump_setup); Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:43:52.000000000 -0600 +++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:44:21.000000000 -0600 @@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o obj-$(CONFIG_HVCS) += hvcserver.o obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o +obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c =================================================================== --- linux-2.6.24-rc2-git4.orig/arch/powerpc/kernel/prom.c 2007-11-19 17:43:52.000000000 -0600 +++ linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c 2007-11-19 17:44:21.000000000 -0600 @@ -51,6 +51,7 @@ #include <asm/machdep.h> #include <asm/pSeries_reconfig.h> #include <asm/pci-bridge.h> +#include <asm/phyp_dump.h> #include <asm/kexec.h> #ifdef DEBUG @@ -1011,6 +1012,37 @@ static void __init early_reserve_mem(voi #endif } +#ifdef CONFIG_PHYP_DUMP + +/** + * reserve_crashed_mem() - reserve all not-yet-dumped mmemory + * + * This routine will reserve almost all of the memory in the + * system, except for a few hundred megabytes used to boot the + * new kernel. As the reserved memory is dumped to the dump + * device (by userland tools), it will be freed and made available. + */ +static void __init reserve_crashed_mem(void) +{ + unsigned long crashed_base, crashed_size; + + /* Reserve *everything* above the RMR. We'll free this real soon. */ + crashed_base = PHYP_DUMP_RMR_END; + crashed_size = lmb_end_of_DRAM() - crashed_base; + + /* XXX crashed_ram_end is wrong, since it may be beyond + * the memory_limit, it will need to be adjusted. */ + lmb_reserve(crashed_base, crashed_size); + + phyp_dump_info->init_reserve_start = crashed_base; + phyp_dump_info->init_reserve_size = crashed_size; +} + +#else +static inline void __init reserve_crashed_mem(void) {} +#endif /* CONFIG_PHYP_DUMP */ + + void __init early_init_devtree(void *params) { DBG(" -> early_init_devtree(%p)\n", params); @@ -1043,6 +1075,7 @@ void __init early_init_devtree(void *par reserve_kdump_trampoline(); reserve_crashkernel(); early_reserve_mem(); + reserve_crashed_mem(); lmb_enforce_memory_limit(memory_limit); lmb_analyze(); ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept 2008-01-08 0:25 ` [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja @ 2008-01-08 3:16 ` Stephen Rothwell 2008-01-16 4:21 ` Paul Mackerras 1 sibling, 0 replies; 69+ messages in thread From: Stephen Rothwell @ 2008-01-08 3:16 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake [-- Attachment #1: Type: text/plain, Size: 1771 bytes --] Hi Manish, Just some trivial things ... On Mon, 07 Jan 2008 18:25:31 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: > > > +++ linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h 2007-11-19 17:44:21.000000000 -0600 > +#ifndef _PPC64_PHYP_DUMP_H We more usually use _ASM_POWERPC_PHYP_DUMP_H > +#define _PPC64_PHYP_DUMP_H > + > +#ifdef CONFIG_PHYP_DUMP Do these things really need protecting by this CONFIG variable? i.e. does anything change depending on the visibility of these various symbols? > +++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-19 19:07:49.000000000 -0600 > @@ -0,0 +1,71 @@ > +/* > + * Hypervisor-assisted dump > + * > + * Linas Vepstas, Manish Ahuja 2007 > + * Copyrhgit (c) 2007 IBM Corp. ^^^^^^^^^ typo. and you should really be using '©' in new copyright notices (or nothing). > +/** > + * release_memory_range -- release memory previously lmb_reserved > + * @start_pfn: starting physical frame number > + * @nr_pages: number of pages to free. > + * > + * This routine will release memory that had been previously > + * lmb_reserved in early boot. The released memory becomes > + * available for genreal use. ^^^^^^^ typo. > +release_memory_range(unsigned long start_pfn, unsigned long nr_pages) > +{ > + struct page *rpage; > + unsigned long end_pfn; > + long i; > + > + end_pfn = start_pfn + nr_pages; > + > + for (i=start_pfn; i <= end_pfn; i++) { spaces around '=' > +static int __init phyp_dump_setup(void) > +{ > +} > + > +subsys_initcall(phyp_dump_setup); Normally we don't leave a blank line here. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept 2008-01-08 0:25 ` [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja 2008-01-08 3:16 ` Stephen Rothwell @ 2008-01-16 4:21 ` Paul Mackerras 1 sibling, 0 replies; 69+ messages in thread From: Paul Mackerras @ 2008-01-16 4:21 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake Manish Ahuja writes: > Initial patch for reserving memory in early boot, and freeing it later. > If the previous boot had ended with a crash, the reserved memory would contain > a copy of the crashed kernel data. The main problem I see here is that if this option is turned on, the kernel now has only 256MB of memory from early boot until subsys_initcalls are done -- on any machine, and whether or not there is actually a dump. That means, for instance, that a machine running bare-metal (such as a G5) might not be able to allocate the hash table for the MMU. Also, any allocations made during that time won't be able to be node-local. So it will be necessary to read the flattened device tree early on to see whether or not there is a dump, so that we don't reserve most of memory in the cases where there isn't a dump. Paul. ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 5/8] pseries: phyp dump: register dump area. 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (3 preceding siblings ...) 2008-01-08 0:25 ` [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja @ 2008-01-08 0:28 ` Manish Ahuja 2008-01-08 3:59 ` Stephen Rothwell 2008-01-08 0:35 ` [PATCH 6/8] pseries: phyp dump: debugging print routines Manish Ahuja ` (3 subsequent siblings) 8 siblings, 1 reply; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:28 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake Set up the actual dump header, register it with the hypervisor. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ------ arch/powerpc/platforms/pseries/phyp_dump.c | 169 +++++++++++++++++++++++++++-- 1 file changed, 163 insertions(+), 6 deletions(-) Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 15:55:37.000000000 -0600 +++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:06:52.000000000 -0600 @@ -30,6 +30,134 @@ struct phyp_dump *phyp_dump_info = &phyp static int ibm_configure_kernel_dump; /* ------------------------------------------------- */ +/* RTAS interfaces to declare the dump regions */ + +struct dump_section { + u32 dump_flags; + u16 source_type; + u16 error_flags; + u64 source_address; + u64 source_length; + u64 length_copied; + u64 destination_address; +}; + +struct phyp_dump_header { + u32 version; + u16 num_of_sections; + u16 status; + + u32 first_offset_section; + u32 dump_disk_section; + u64 block_num_dd; + u64 num_of_blocks_dd; + u32 offset_dd; + u32 maxtime_to_auto; + /* No dump disk path string used */ + + struct dump_section cpu_data; + struct dump_section hpte_data; + struct dump_section kernel_data; +}; + +/* The dump header *must be* in low memory, so .bss it */ +static struct phyp_dump_header phdr; + +#define NUM_DUMP_SECTIONS 3 +#define DUMP_HEADER_VERSION 0x1 +#define DUMP_REQUEST_FLAG 0x1 +#define DUMP_SOURCE_CPU 0x0001 +#define DUMP_SOURCE_HPTE 0x0002 +#define DUMP_SOURCE_RMO 0x0011 + +/** + * init_dump_header() - initialize the header declaring a dump + * Returns: length of dump save area. + * + * When the hypervisor saves crashed state, it needs to put + * it somewhere. The dump header tells the hypervisor where + * the data can be saved. + */ +static unsigned long init_dump_header(struct phyp_dump_header *ph) +{ + struct device_node *rtas; + const unsigned int *sizes; + int len; + unsigned long cpu_state_size = 0; + unsigned long hpte_region_size = 0; + unsigned long addr_offset = 0; + + /* Get the required dump region sizes */ + rtas = of_find_node_by_path("/rtas"); + sizes = of_get_property(rtas, "ibm,configure-kernel-dump-sizes", &len); + if (!sizes || len < 20) + return 0; + + if (sizes[0] == 1) + cpu_state_size = *((unsigned long *) &sizes[1]); + + if (sizes[3] == 2) + hpte_region_size = *((unsigned long *) &sizes[4]); + + /* Set up the dump header */ + ph->version = DUMP_HEADER_VERSION; + ph->num_of_sections = NUM_DUMP_SECTIONS; + ph->status = 0; + + ph->first_offset_section = + (u32) &(((struct phyp_dump_header *) 0)->cpu_data); + ph->dump_disk_section = 0; + ph->block_num_dd = 0; + ph->num_of_blocks_dd = 0; + ph->offset_dd = 0; + + ph->maxtime_to_auto = 0; /* disabled */ + + /* The first two sections are mandatory */ + ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG; + ph->cpu_data.source_type = DUMP_SOURCE_CPU; + ph->cpu_data.source_address = 0; + ph->cpu_data.source_length = cpu_state_size; + ph->cpu_data.destination_address = addr_offset; + addr_offset += cpu_state_size; + + ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG; + ph->hpte_data.source_type = DUMP_SOURCE_HPTE; + ph->hpte_data.source_address = 0; + ph->hpte_data.source_length = hpte_region_size; + ph->hpte_data.destination_address = addr_offset; + addr_offset += hpte_region_size; + + /* This section describes the low kernel region */ + ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG; + ph->kernel_data.source_type = DUMP_SOURCE_RMO; + ph->kernel_data.source_address = PHYP_DUMP_RMR_START; + ph->kernel_data.source_length = PHYP_DUMP_RMR_END; + ph->kernel_data.destination_address = addr_offset; + addr_offset += ph->kernel_data.source_length; + + return addr_offset; +} + +static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) +{ + int rc; + ph->cpu_data.destination_address += addr; + ph->hpte_data.destination_address += addr; + ph->kernel_data.destination_address += addr; + + do { + rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, + 1, ph, sizeof(struct phyp_dump_header)); + } while (rtas_busy_delay(rc)); + + if (rc) + { + printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc); + } +} + +/* ------------------------------------------------- */ /** * release_memory_range -- release memory previously lmb_reserved * @start_pfn: starting physical frame number @@ -125,7 +253,11 @@ static void release_all (void) static int __init phyp_dump_setup(void) { struct device_node *rtas; - const int *dump_header; + const struct phyp_dump_header *dump_header; + unsigned long dump_area_start; + unsigned long dump_area_length; + unsigned long free_area_length; + unsigned long start_pfn, nr_pages; int header_len = 0; int rc; @@ -140,22 +272,47 @@ static int __init phyp_dump_setup(void) return -ENOSYS; } - /* Is there dump data waiting for us? */ + /* Is there dump data waiting for us? If there isn't, + * then register a new dump area, and release all of + * the rest of the reserved ram. + * + * The /rtas/ibm,kernel-dump rtas node is present only + * if there is dump data waiting for us. + */ rtas = of_find_node_by_path("/rtas"); dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); + + dump_area_length = init_dump_header (&phdr); + free_area_length = phyp_dump_info->init_reserve_size - dump_area_length; + dump_area_start = phyp_dump_info->init_reserve_start + free_area_length; + dump_area_start = dump_area_start & PAGE_MASK; /* align down */ + free_area_length = dump_area_start - phyp_dump_info->init_reserve_start; + if (dump_header == NULL) { - release_all(); - return 0; + register_dump_area (&phdr, dump_area_start); + goto release_mem; } + /* Don't allow user to release the 256MB scratch area */ + phyp_dump_info->init_reserve_size = free_area_length; + /* Should we create a dump_subsys, analogous to s390/ipl.c ? */ rc = subsys_create_file(&kernel_subsys, &rr); if (rc) { printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc); - release_all(); - return 0; + goto release_mem; } + /* ToDo: re-register the dump area, for next time. */ + + return 0; + +release_mem: + /* release everything except the top 256 MB scratch area */ + start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start); + nr_pages = PFN_DOWN(free_area_length); + release_memory_range(start_pfn, nr_pages); + return 0; } > documentation explaining what this is :-) . Yes, its supposed > to be an improvement over kdump. > > The patches mostly work; a list of open issues / todo list > is included in the documentation. It also appears that > the not-yet-released firmware versions this was tested > on are still, ahem, incomplete; this work is also pending. > > -- Linas & Manish > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@ozlabs.org > https://ozlabs.org/mailman/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 5/8] pseries: phyp dump: register dump area. 2008-01-08 0:28 ` [PATCH 5/8] pseries: phyp dump: register dump area Manish Ahuja @ 2008-01-08 3:59 ` Stephen Rothwell 0 siblings, 0 replies; 69+ messages in thread From: Stephen Rothwell @ 2008-01-08 3:59 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake [-- Attachment #1: Type: text/plain, Size: 1569 bytes --] Hi Manish, On Mon, 07 Jan 2008 18:28:30 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: > > +++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:06:52.000000000 -0600 > +static unsigned long init_dump_header(struct phyp_dump_header *ph) > +{ > + /* Get the required dump region sizes */ > + rtas = of_find_node_by_path("/rtas"); You need to of_node_put(rtas) somewhere. > + if (sizes[0] == 1) > + cpu_state_size = *((unsigned long *) &sizes[1]); We normally don't put spaces after casts. > + ph->first_offset_section = > + (u32) &(((struct phyp_dump_header *) 0)->cpu_data); (u32)offsetof(struct phyp_dump_header, cpu_data); > +static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) > +{ > + if (rc) > + { > + printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc); > + } The braces are not needed. > + > + dump_area_length = init_dump_header (&phdr); We don't put spaces after function names. > + free_area_length = phyp_dump_info->init_reserve_size - dump_area_length; > + dump_area_start = phyp_dump_info->init_reserve_start + free_area_length; > + dump_area_start = dump_area_start & PAGE_MASK; /* align down */ > + free_area_length = dump_area_start - phyp_dump_info->init_reserve_start; > + > if (dump_header == NULL) { > - release_all(); > - return 0; > + register_dump_area (&phdr, dump_area_start); Ditto. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 6/8] pseries: phyp dump: debugging print routines. 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (4 preceding siblings ...) 2008-01-08 0:28 ` [PATCH 5/8] pseries: phyp dump: register dump area Manish Ahuja @ 2008-01-08 0:35 ` Manish Ahuja 2008-01-08 0:49 ` Arnd Bergmann 2008-01-08 4:03 ` Stephen Rothwell 2008-01-08 0:37 ` [PATCH 7/8] pseries: phyp dump: Unregister and print dump areas Manish Ahuja ` (2 subsequent siblings) 8 siblings, 2 replies; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:35 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake Provide some basic debugging support. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Signed-off-by: Linas Vepsts <linas@austin.ibm.com> ----- arch/powerpc/platforms/pseries/phyp_dump.c | 53 ++++++++++++++++++++++++++++- 1 file changed, 52 insertions(+), 1 deletion(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-01-01 23:24:10.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-01-01 23:24:27.000000000 -0600 @@ -2,7 +2,7 @@ * Hypervisor-assisted dump * * Linas Vepstas, Manish Ahuja 2007 - * Copyrhgit (c) 2007 IBM Corp. + * Copyright (c) 2007 IBM Corp. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -139,6 +139,51 @@ static unsigned long init_dump_header(st return addr_offset; } +#ifdef DEBUG +static void print_dump_header(const struct phyp_dump_header *ph) +{ + printk(KERN_INFO "dump header:\n"); + /* setup some ph->sections required */ + printk(KERN_INFO "version = %d\n", ph->version); + printk(KERN_INFO "Sections = %d\n", ph->num_of_sections); + printk(KERN_INFO "Status = 0x%x\n", ph->status); + + /* No ph->disk, so all should be set to 0 */ + printk(KERN_INFO "Offset to first section 0x%x\n", ph->first_offset_section); + printk(KERN_INFO "dump disk sections should be zero\n"); + printk(KERN_INFO "dump disk section = %d\n",ph->dump_disk_section); + printk(KERN_INFO "block num = %ld\n",ph->block_num_dd); + printk(KERN_INFO "number of blocks = %ld\n",ph->num_of_blocks_dd); + printk(KERN_INFO "dump disk offset = %d\n",ph->offset_dd); + printk(KERN_INFO "Max auto time= %d\n",ph->maxtime_to_auto); + + /*set cpu state and hpte states as well scratch pad area */ + printk(KERN_INFO " CPU AREA \n"); + printk(KERN_INFO "cpu dump_flags =%d\n",ph->cpu_data.dump_flags); + printk(KERN_INFO "cpu source_type =%d\n",ph->cpu_data.source_type); + printk(KERN_INFO "cpu error_flags =%d\n",ph->cpu_data.error_flags); + printk(KERN_INFO "cpu source_address =%lx\n",ph->cpu_data.source_address); + printk(KERN_INFO "cpu source_length =%lx\n",ph->cpu_data.source_length); + printk(KERN_INFO "cpu length_copied =%lx\n",ph->cpu_data.length_copied); + + printk(KERN_INFO " HPTE AREA \n"); + printk(KERN_INFO "HPTE dump_flags =%d\n",ph->hpte_data.dump_flags); + printk(KERN_INFO "HPTE source_type =%d\n",ph->hpte_data.source_type); + printk(KERN_INFO "HPTE error_flags =%d\n",ph->hpte_data.error_flags); + printk(KERN_INFO "HPTE source_address =%lx\n",ph->hpte_data.source_address); + printk(KERN_INFO "HPTE source_length =%lx\n",ph->hpte_data.source_length); + printk(KERN_INFO "HPTE length_copied =%lx\n",ph->hpte_data.length_copied); + + printk(KERN_INFO " SRSD AREA \n"); + printk(KERN_INFO "SRSD dump_flags =%d\n",ph->kernel_data.dump_flags); + printk(KERN_INFO "SRSD source_type =%d\n",ph->kernel_data.source_type); + printk(KERN_INFO "SRSD error_flags =%d\n",ph->kernel_data.error_flags); + printk(KERN_INFO "SRSD source_address =%lx\n",ph->kernel_data.source_address); + printk(KERN_INFO "SRSD source_length =%lx\n",ph->kernel_data.source_length); + printk(KERN_INFO "SRSD length_copied =%lx\n",ph->kernel_data.length_copied); +} +#endif + static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) { int rc; @@ -154,6 +199,9 @@ static void register_dump_area(struct ph if (rc) { printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc); +#ifdef DEBUG + print_dump_header (ph); +#endif } } @@ -271,6 +319,9 @@ static int __init phyp_dump_setup(void) release_all(); return -ENOSYS; } +#ifdef DEBUG + print_dump_header (dump_header); +#endif /* Is there dump data waiting for us? If there isn't, * then register a new dump area, and release all of ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 6/8] pseries: phyp dump: debugging print routines. 2008-01-08 0:35 ` [PATCH 6/8] pseries: phyp dump: debugging print routines Manish Ahuja @ 2008-01-08 0:49 ` Arnd Bergmann 2008-01-08 4:03 ` Stephen Rothwell 1 sibling, 0 replies; 69+ messages in thread From: Arnd Bergmann @ 2008-01-08 0:49 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, lkessler, linasvepstas, strosake On Tuesday 08 January 2008, Manish Ahuja wrote: > +#ifdef DEBUG > +static void print_dump_header(const struct phyp_dump_header *ph) > +{ > +=A0=A0=A0=A0=A0=A0=A0printk(KERN_INFO "dump header:\n"); > ... > +=A0=A0=A0=A0=A0=A0=A0printk(KERN_INFO "SRSD length_copied =3D%lx\n",ph->= kernel_data.length_copied); > +} > +#endif > + If you move the #if to inside of the function, > =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0printk (KERN_ERR "phyp-du= mp: unexpected error (%d) on register\n", rc); > +#ifdef DEBUG > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0print_dump_header (ph); > +#endif > =A0=A0=A0=A0=A0=A0=A0=A0} > =A0=A0=A0=A0=A0=A0=A0=A0} > +#ifdef DEBUG > +=A0=A0=A0=A0=A0=A0=A0print_dump_header (dump_header); > +#endif you don't need the #ifdefs here. Arnd <>< ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 6/8] pseries: phyp dump: debugging print routines. 2008-01-08 0:35 ` [PATCH 6/8] pseries: phyp dump: debugging print routines Manish Ahuja 2008-01-08 0:49 ` Arnd Bergmann @ 2008-01-08 4:03 ` Stephen Rothwell 1 sibling, 0 replies; 69+ messages in thread From: Stephen Rothwell @ 2008-01-08 4:03 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake [-- Attachment #1: Type: text/plain, Size: 359 bytes --] Hi Manish, Just a trivial comment. On Mon, 07 Jan 2008 18:35:09 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: > > + printk(KERN_INFO "dump disk section = %d\n",ph->dump_disk_section); Please put a space after the ','. Here and later. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 7/8] pseries: phyp dump: Unregister and print dump areas. 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (5 preceding siblings ...) 2008-01-08 0:35 ` [PATCH 6/8] pseries: phyp dump: debugging print routines Manish Ahuja @ 2008-01-08 0:37 ` Manish Ahuja 2008-01-08 4:25 ` Stephen Rothwell 2008-01-08 0:39 ` [PATCH 8/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja 8 siblings, 1 reply; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:37 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake Routines to invalidate and unregister dump routines. Unregister has not been used yet, I will release another patch for that at a later stage with the kdump integration patches. There is also a routine which calculates the regions to be freed and exports that through sysfs. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> --- arch/powerpc/platforms/pseries/phyp_dump.c | 79 ++++++++++++++++++++++++++--- 1 file changed, 73 insertions(+), 6 deletions(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-01-07 18:06:11.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-01-07 22:57:27.000000000 -0600 @@ -69,6 +69,10 @@ static struct phyp_dump_header phdr; #define DUMP_SOURCE_CPU 0x0001 #define DUMP_SOURCE_HPTE 0x0002 #define DUMP_SOURCE_RMO 0x0011 +#define DUMP_ERROR_FLAG 0x2000 +#define DUMP_TRIGGERED 0x4000 +#define DUMP_PERFORMED 0x8000 + /** * init_dump_header() - initialize the header declaring a dump @@ -187,10 +191,15 @@ static void print_dump_header(const stru static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) { int rc; - ph->cpu_data.destination_address += addr; - ph->hpte_data.destination_address += addr; - ph->kernel_data.destination_address += addr; + /* Add addr value if not initialized before */ + if (ph->cpu_data.destination_address == 0) { + ph->cpu_data.destination_address += addr; + ph->hpte_data.destination_address += addr; + ph->kernel_data.destination_address += addr; + } + + /* ToDo Invalidate kdump and free memory range. */ do { rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, 1, ph, sizeof(struct phyp_dump_header)); @@ -205,6 +214,49 @@ static void register_dump_area(struct ph } } +static void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr) +{ + int rc; + + /* Add addr value if not initialized before */ + if (ph->cpu_data.destination_address == 0) { + ph->cpu_data.destination_address = addr; + ph->hpte_data.destination_address += addr; + ph->kernel_data.destination_address += addr; + } + + do { + rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, + 2, ph, sizeof(struct phyp_dump_header)); + } while (rtas_busy_delay(rc)); + + if (rc) + { + printk (KERN_ERR "phyp-dump: unexpected error (%d) on invalidate\n", rc); +#ifdef DEBUG + print_dump_header (ph); +#endif + } +} + +static void unregister_dump_area(struct phyp_dump_header *ph) +{ + int rc; + + do { + rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, + 3, ph, sizeof(struct phyp_dump_header)); + } while (rtas_busy_delay(rc)); + + if (rc) + { + printk (KERN_ERR "phyp-dump: unexpected error (%d) on unregister\n", rc); +#ifdef DEBUG + print_dump_header (ph); +#endif + } +} + /* ------------------------------------------------- */ /** * release_memory_range -- release memory previously lmb_reserved @@ -279,7 +331,17 @@ store_release_region(struct kset *kset, static ssize_t show_release_region(struct kset * kset, char *buf) { - return sprintf(buf, "ola\n"); + u64 second_addr_range; + + /* total reserved size - start of scratch area */ + second_addr_range = phdr.cpu_data.destination_address - + phyp_dump_info->init_reserve_size; + return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:" + " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n", + phdr.cpu_data.destination_address, phdr.cpu_data.length_copied, + phdr.hpte_data.destination_address, phdr.hpte_data.length_copied, + phdr.kernel_data.destination_address, phdr.kernel_data.length_copied, + phyp_dump_info->init_reserve_start, second_addr_range); } static struct subsys_attribute rr = __ATTR(release_region, 0600, @@ -344,6 +406,13 @@ static int __init phyp_dump_setup(void) goto release_mem; } + /* re-register the dump area, if old dump was invalid */ + if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) { + invalidate_last_dump (&phdr, dump_area_start); + register_dump_area (&phdr, dump_area_start); + goto release_mem; + } + /* Don't allow user to release the 256MB scratch area */ phyp_dump_info->init_reserve_size = free_area_length; @@ -354,8 +423,6 @@ static int __init phyp_dump_setup(void) goto release_mem; } - /* ToDo: re-register the dump area, for next time. */ - return 0; release_mem: ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 7/8] pseries: phyp dump: Unregister and print dump areas. 2008-01-08 0:37 ` [PATCH 7/8] pseries: phyp dump: Unregister and print dump areas Manish Ahuja @ 2008-01-08 4:25 ` Stephen Rothwell 2008-01-08 22:56 ` Manish Ahuja 0 siblings, 1 reply; 69+ messages in thread From: Stephen Rothwell @ 2008-01-08 4:25 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake [-- Attachment #1: Type: text/plain, Size: 2078 bytes --] Hi Manish, Just more trivial stuff. On Mon, 07 Jan 2008 18:37:31 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: > > static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) > { > int rc; > - ph->cpu_data.destination_address += addr; > - ph->hpte_data.destination_address += addr; > - ph->kernel_data.destination_address += addr; > > + /* Add addr value if not initialized before */ > + if (ph->cpu_data.destination_address == 0) { > + ph->cpu_data.destination_address += addr; Could be just '=' like further down, right? > +static void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr) > +{ > + if (rc) > + { We prefer if (rc) { > +static void unregister_dump_area(struct phyp_dump_header *ph) > +{ > + if (rc) > + { And again. > static ssize_t > show_release_region(struct kset * kset, char *buf) We normally put the function name on the same line as the type. > { > - return sprintf(buf, "ola\n"); > + u64 second_addr_range; > + > + /* total reserved size - start of scratch area */ > + second_addr_range = phdr.cpu_data.destination_address - > + phyp_dump_info->init_reserve_size; > + return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:" > + " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n", > + phdr.cpu_data.destination_address, phdr.cpu_data.length_copied, > + phdr.hpte_data.destination_address, phdr.hpte_data.length_copied, > + phdr.kernel_data.destination_address, phdr.kernel_data.length_copied, > + phyp_dump_info->init_reserve_start, second_addr_range); This indentation should be (probably) two tabs. > + /* re-register the dump area, if old dump was invalid */ > + if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) { ^ ^ Extra parentheses. > + invalidate_last_dump (&phdr, dump_area_start); > + register_dump_area (&phdr, dump_area_start); No spaces after function names. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 7/8] pseries: phyp dump: Unregister and print dump areas. 2008-01-08 4:25 ` Stephen Rothwell @ 2008-01-08 22:56 ` Manish Ahuja 0 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 22:56 UTC (permalink / raw) To: Stephen Rothwell; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake Stephen, >> + /* Add addr value if not initialized before */ >> + if (ph->cpu_data.destination_address == 0) { >> + ph->cpu_data.destination_address += addr; > > Could be just '=' like further down, right? Actually the one below should be += as well. Thanks for catching it. >> + /* total reserved size - start of scratch area */ >> + second_addr_range = phdr.cpu_data.destination_address - >> + phyp_dump_info->init_reserve_size; >> + return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:" >> + " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n", >> + phdr.cpu_data.destination_address, phdr.cpu_data.length_copied, >> + phdr.hpte_data.destination_address, phdr.hpte_data.length_copied, >> + phdr.kernel_data.destination_address, phdr.kernel_data.length_copied, >> + phyp_dump_info->init_reserve_start, second_addr_range); > > This indentation should be (probably) two tabs. I kept it one with a few spaces as otherwise it was exceeding 80, I guess, I can just have one per line and that should take care of that. > >> + /* re-register the dump area, if old dump was invalid */ >> + if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) { > ^ ^ > Extra parentheses. Just for clarity.. I would prefer that, if thats okay. > >> + invalidate_last_dump (&phdr, dump_area_start); >> + register_dump_area (&phdr, dump_area_start); > > No spaces after function names. > Yeah, will take that out from here and other files as well. Thanks, Manish ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 8/8] pseries: phyp dump: Tracking memory range freed. 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (6 preceding siblings ...) 2008-01-08 0:37 ` [PATCH 7/8] pseries: phyp dump: Unregister and print dump areas Manish Ahuja @ 2008-01-08 0:39 ` Manish Ahuja 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja 8 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:39 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake This patch tracks the size freed. For now it does a simple rudimentary calculation of the ranges freed. The idea is to keep it simple at the external shell script level and send in large chunks for now. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> ----- --- arch/powerpc/platforms/pseries/phyp_dump.c | 36 +++++++++++++++++++++++++++++ include/asm-powerpc/phyp_dump.h | 3 ++ 2 files changed, 39 insertions(+) Index: 2.6.24-rc5/include/asm-powerpc/phyp_dump.h =================================================================== --- 2.6.24-rc5.orig/include/asm-powerpc/phyp_dump.h 2008-01-07 22:55:28.000000000 -0600 +++ 2.6.24-rc5/include/asm-powerpc/phyp_dump.h 2008-01-07 22:58:02.000000000 -0600 @@ -24,6 +24,9 @@ struct phyp_dump { /* Memory that is reserved during very early boot. */ unsigned long init_reserve_start; unsigned long init_reserve_size; + /* Scratch area memory details */ + unsigned long scratch_reserve_start; + unsigned long scratch_reserve_size; }; extern struct phyp_dump *phyp_dump_info; Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-01-07 22:57:27.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-01-07 22:58:02.000000000 -0600 @@ -287,6 +287,39 @@ release_memory_range(unsigned long start } } +/** + * track_freed_range -- Counts the range being freed. + * Once the counter goes to zero, it re-registers dump for + * future use. + */ +static void +track_freed_range(unsigned long addr, unsigned long length) +{ + static unsigned long scratch_area_size, reserved_area_size; + + if (addr < phyp_dump_info->init_reserve_start) + return; + + if ((addr >= phyp_dump_info->init_reserve_start) && + (addr <= phyp_dump_info->init_reserve_start + + phyp_dump_info->init_reserve_size)) + reserved_area_size += length; + + if ((addr >= phyp_dump_info->scratch_reserve_start) && + (addr <= phyp_dump_info->scratch_reserve_start + + phyp_dump_info->scratch_reserve_size)) + scratch_area_size += length; + + if ((reserved_area_size == phyp_dump_info->init_reserve_start) && + (scratch_area_size == phyp_dump_info->scratch_reserve_size)) { + + invalidate_last_dump(&phdr, + phyp_dump_info->scratch_reserve_start); + register_dump_area (&phdr, + phyp_dump_info->scratch_reserve_start); + } +} + /* ------------------------------------------------- */ /** * sysfs_release_region -- sysfs interface to release memory range. @@ -310,6 +343,8 @@ store_release_region(struct kset *kset, if (ret != 2) return -EINVAL; + track_freed_range(start_addr, length); + /* Range-check - don't free any reserved memory that * wasn't reserved for phyp-dump */ if (start_addr < phyp_dump_info->init_reserve_start) @@ -414,6 +449,7 @@ static int __init phyp_dump_setup(void) } /* Don't allow user to release the 256MB scratch area */ + /* this might be wrong */ phyp_dump_info->init_reserve_size = free_area_length; /* Should we create a dump_subsys, analogous to s390/ipl.c ? */ ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (7 preceding siblings ...) 2008-01-08 0:39 ` [PATCH 8/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja @ 2008-02-12 6:31 ` Manish Ahuja 2008-02-12 6:53 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja ` (7 more replies) 8 siblings, 8 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 6:31 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake The following series of patches implement a basic framework for hypervisor-assisted dump. The very first patch provides documentation explaining what this is :-). Yes, its supposed to be an improvement over kdump. A list of open issues / todo list is included in the documentation. It also appears that the not-yet-released firmware versions this was tested on are still, ahem, incomplete; this work is also pending. I have included most of the changes requested. Although, I did find one or two, fixed in a later patch file rather than the first location they appeared at. Also it now does not block any memory on machines other than power6 boxes which have the requisite firmware. This is from a power5 box. from jal-lp6 a power5 machine. ......... Phyp-dump not supported on this hardware Using pSeries machine description console [udbg-1] enabled ....... Since I changed a few more things, I am reposting all the patches. -- Manish & Linas. _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 1/8] pseries: phyp dump: Docmentation 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja @ 2008-02-12 6:53 ` Manish Ahuja 2008-02-12 7:08 ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja ` (6 subsequent siblings) 7 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 6:53 UTC (permalink / raw) To: linuxppc-dev, paulus; +Cc: Manish Ahuja Basic documentation for hypervisor-assisted dump. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> ---- Documentation/powerpc/phyp-assisted-dump.txt | 127 +++++++++++++++++++++++++++ 1 file changed, 127 insertions(+) Index: 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt 2008-02-12 06:38:25.000000000 -0600 @@ -0,0 +1,127 @@ + + Hypervisor-Assisted Dump + ------------------------ + November 2007 + +The goal of hypervisor-assisted dump is to enable the dump of +a crashed system, and to do so from a fully-reset system, and +to minimize the total elapsed time until the system is back +in production use. + +As compared to kdump or other strategies, hypervisor-assisted +dump offers several strong, practical advantages: + +-- Unlike kdump, the system has been reset, and loaded + with a fresh copy of the kernel. In particular, + PCI and I/O devices have been reinitialized and are + in a clean, consistent state. +-- As the dump is performed, the dumped memory becomes + immediately available to the system for normal use. +-- After the dump is completed, no further reboots are + required; the system will be fully usable, and running + in it's normal, production mode on it normal kernel. + +The above can only be accomplished by coordination with, +and assistance from the hypervisor. The procedure is +as follows: + +-- When a system crashes, the hypervisor will save + the low 256MB of RAM to a previously registered + save region. It will also save system state, system + registers, and hardware PTE's. + +-- After the low 256MB area has been saved, the + hypervisor will reset PCI and other hardware state. + It will *not* clear RAM. It will then launch the + bootloader, as normal. + +-- The freshly booted kernel will notice that there + is a new node (ibm,dump-kernel) in the device tree, + indicating that there is crash data available from + a previous boot. It will boot into only 256MB of RAM, + reserving the rest of system memory. + +-- Userspace tools will parse /sys/kernel/release_region + and read /proc/vmcore to obtain the contents of memory, + which holds the previous crashed kernel. The userspace + tools may copy this info to disk, or network, nas, san, + iscsi, etc. as desired. + + For Example: the values in /sys/kernel/release-region + would look something like this (address-range pairs). + CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: / + DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A + +-- As the userspace tools complete saving a portion of + dump, they echo an offset and size to + /sys/kernel/release_region to release the reserved + memory back to general use. + + An example of this is: + "echo 0x40000000 0x10000000 > /sys/kernel/release_region" + which will release 256MB at the 1GB boundary. + +Please note that the hypervisor-assisted dump feature +is only available on Power6-based systems with recent +firmware versions. + +Implementation details: +---------------------- + +During boot, a check is made to see if firmware supports +this feature on this particular machine. If it does, then +we check to see if a active dump is waiting for us. If yes +then everything but 256 MB of RAM is reserved during early +boot. This area is released once we collect a dump from user +land scripts that are run. If there is dump data, then +the /sys/kernel/release_region file is created, and +the reserved memory is held. + +If there is no waiting dump data, then only the highest +256MB of the ram is reserved as a scratch area. This area +is *not* be released: this region will be kept permanently +reserved, so that it can act as a receptacle for a copy +of the low 256MB in the case a crash does occur. See, +however, "open issues" below, as to whether +such a reserved region is really needed. + +Currently the dump will be copied from /proc/vmcore to a +a new file upon user intervention. The starting address +to be read and the range for each data point in provided +in /sys/kernel/release_region. + +The tools to examine the dump will be same as the ones +used for kdump. + +General notes: +-------------- +Security: please note that there are potential security issues +with any sort of dump mechanism. In particular, plaintext +(unencrypted) data, and possibly passwords, may be present in +the dump data. Userspace tools must take adequate precautions to +preserve security. + +Open issues/ToDo: +------------ + o The various code paths that tell the hypervisor that a crash + occurred, vs. it simply being a normal reboot, should be + reviewed, and possibly clarified/fixed. + + o Instead of using /sys/kernel, should there be a /sys/dump + instead? There is a dump_subsys being created by the s390 code, + perhaps the pseries code should use a similar layout as well. + + o Is reserving a 256MB region really required? The goal of + reserving a 256MB scratch area is to make sure that no + important crash data is clobbered when the hypervisor + save low mem to the scratch area. But, if one could assure + that nothing important is located in some 256MB area, then + it would not need to be reserved. Something that can be + improved in subsequent versions. + + o Still working the kdump team to integrate this with kdump, + some work remains but this would not affect the current + patches. + + o Still need to write a shell script, to copy the dump away. + Currently I am parsing it manually. ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja 2008-02-12 6:53 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja @ 2008-02-12 7:08 ` Manish Ahuja 2008-02-12 8:48 ` Michael Ellerman 2008-02-14 3:46 ` Tony Breeds 2008-02-12 7:11 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja ` (5 subsequent siblings) 7 siblings, 2 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 7:08 UTC (permalink / raw) To: linuxppc-dev, paulus; +Cc: mahuja, linasvepstas Initial patch for reserving memory in early boot, and freeing it later. If the previous boot had ended with a crash, the reserved memory would contain a copy of the crashed kernel data. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/kernel/prom.c | 50 ++++++++++++++++++++ arch/powerpc/kernel/rtas.c | 32 +++++++++++++ arch/powerpc/platforms/pseries/Makefile | 1 arch/powerpc/platforms/pseries/phyp_dump.c | 71 +++++++++++++++++++++++++++++ include/asm-powerpc/phyp_dump.h | 38 +++++++++++++++ include/asm/rtas.h | 3 + 6 files changed, 195 insertions(+) Index: 2.6.24-rc5/include/asm-powerpc/phyp_dump.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 2.6.24-rc5/include/asm-powerpc/phyp_dump.h 2008-02-12 06:12:37.000000000 -0600 @@ -0,0 +1,38 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyright (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef _PPC64_PHYP_DUMP_H +#define _PPC64_PHYP_DUMP_H + +#ifdef CONFIG_PHYP_DUMP + +/* The RMR region will be saved for later dumping + * whenever the kernel crashes. Set this to 256MB. */ +#define PHYP_DUMP_RMR_START 0x0 +#define PHYP_DUMP_RMR_END (1UL<<28) + +struct phyp_dump { + /* Memory that is reserved during very early boot. */ + unsigned long init_reserve_start; + unsigned long init_reserve_size; + /* Check status during boot if dump supported, active & present*/ + unsigned long phyp_dump_configured; + unsigned long phyp_dump_is_active; + /* store cpu & hpte size */ + unsigned long cpu_state_size; + unsigned long hpte_region_size; +}; + +extern struct phyp_dump *phyp_dump_info; + +#endif /* CONFIG_PHYP_DUMP */ +#endif /* _PPC64_PHYP_DUMP_H */ Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:12:37.000000000 -0600 @@ -0,0 +1,71 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyrhgit (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + */ + +#include <linux/init.h> +#include <linux/mm.h> +#include <linux/pfn.h> +#include <linux/swap.h> + +#include <asm/page.h> +#include <asm/phyp_dump.h> + +/* Global, used to communicate data between early boot and late boot */ +static struct phyp_dump phyp_dump_global; +struct phyp_dump *phyp_dump_info = &phyp_dump_global; + +/** + * release_memory_range -- release memory previously lmb_reserved + * @start_pfn: starting physical frame number + * @nr_pages: number of pages to free. + * + * This routine will release memory that had been previously + * lmb_reserved in early boot. The released memory becomes + * available for genreal use. + */ +static void +release_memory_range(unsigned long start_pfn, unsigned long nr_pages) +{ + struct page *rpage; + unsigned long end_pfn; + long i; + + end_pfn = start_pfn + nr_pages; + + for (i=start_pfn; i <= end_pfn; i++) { + rpage = pfn_to_page(i); + if (PageReserved(rpage)) { + ClearPageReserved(rpage); + init_page_count(rpage); + __free_page(rpage); + totalram_pages++; + } + } +} + +static int __init phyp_dump_setup(void) +{ + unsigned long start_pfn, nr_pages; + + /* If no memory was reserved in early boot, there is nothing to do */ + if (phyp_dump_info->init_reserve_size == 0) + return 0; + + /* Release memory that was reserved in early boot */ + start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start); + nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size); + release_memory_range(start_pfn, nr_pages); + + return 0; +} + +subsys_initcall(phyp_dump_setup); Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/Makefile =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/Makefile 2008-02-12 06:12:10.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/Makefile 2008-02-12 06:12:37.000000000 -0600 @@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o obj-$(CONFIG_HVCS) += hvcserver.o obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o +obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o Index: 2.6.24-rc5/arch/powerpc/kernel/prom.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/kernel/prom.c 2008-02-12 06:12:10.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/kernel/prom.c 2008-02-12 06:12:37.000000000 -0600 @@ -51,6 +51,7 @@ #include <asm/machdep.h> #include <asm/pSeries_reconfig.h> #include <asm/pci-bridge.h> +#include <asm/phyp_dump.h> #include <asm/kexec.h> #ifdef DEBUG @@ -1011,6 +1012,52 @@ static void __init early_reserve_mem(voi #endif } +#ifdef CONFIG_PHYP_DUMP +/** + * reserve_crashed_mem() - reserve all not-yet-dumped mmemory + * + * This routine may reserve memory regions in the kernel only + * if the system is supported and a dump was taken in last + * boot instance or if the hardware is supported and the + * scratch area needs to be setup. In other instances it returns + * without reserving anything. The memory in case of dump being + * active is freed when the dump is collected (by userland tools). + */ +static void __init reserve_crashed_mem(void) +{ + unsigned long base, size; + if (!phyp_dump_info->phyp_dump_configured) { + printk(KERN_ERR "Phyp-dump not supported on this hardware\n"); + return; + } + + if (phyp_dump_info->phyp_dump_is_active) { + /* Reserve *everything* above RMR.Area freed by userland tools*/ + base = PHYP_DUMP_RMR_END; + size = lmb_end_of_DRAM() - base; + + /* XXX crashed_ram_end is wrong, since it may be beyond + * the memory_limit, it will need to be adjusted. */ + lmb_reserve(base, size); + + phyp_dump_info->init_reserve_start = base; + phyp_dump_info->init_reserve_size = size; + } + else { + size = phyp_dump_info->cpu_state_size + + phyp_dump_info->hpte_region_size + + PHYP_DUMP_RMR_END; + base = lmb_end_of_DRAM() - size; + lmb_reserve(base, size); + phyp_dump_info->init_reserve_start = base; + phyp_dump_info->init_reserve_size = size; + } +} +#else +static inline void __init reserve_crashed_mem(void) {} +#endif /* CONFIG_PHYP_DUMP */ + + void __init early_init_devtree(void *params) { DBG(" -> early_init_devtree(%p)\n", params); @@ -1022,6 +1069,8 @@ void __init early_init_devtree(void *par /* Some machines might need RTAS info for debugging, grab it now. */ of_scan_flat_dt(early_init_dt_scan_rtas, NULL); #endif + /* scan tree to see if dump occured during last boot */ + of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL); /* Retrieve various informations from the /chosen node of the * device-tree, including the platform type, initrd location and @@ -1043,6 +1092,7 @@ void __init early_init_devtree(void *par reserve_kdump_trampoline(); reserve_crashkernel(); early_reserve_mem(); + reserve_crashed_mem(); lmb_enforce_memory_limit(memory_limit); lmb_analyze(); Index: 2.6.24-rc5/arch/powerpc/kernel/rtas.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/kernel/rtas.c 2008-02-12 06:12:10.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/kernel/rtas.c 2008-02-12 06:12:37.000000000 -0600 @@ -39,6 +39,7 @@ #include <asm/syscalls.h> #include <asm/smp.h> #include <asm/atomic.h> +#include <asm/phyp_dump.h> struct rtas_t rtas = { .lock = SPIN_LOCK_UNLOCKED @@ -883,6 +884,37 @@ void __init rtas_initialize(void) #endif } +int __init early_init_dt_scan_phyp_dump(unsigned long node, + const char *uname, int depth, void *data) +{ +#ifdef CONFIG_PHYP_DUMP + const unsigned int *sizes; + + phyp_dump_info->phyp_dump_configured = 0; + phyp_dump_info->phyp_dump_is_active = 0; + + if (depth != 1 || strcmp(uname, "rtas") != 0) + return 0; + + if (of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL)) + phyp_dump_info->phyp_dump_configured++; + + if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL)) + phyp_dump_info->phyp_dump_is_active++; + + sizes = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes", NULL); + if (!sizes) + return 0; + + if (sizes[0] == 1) + phyp_dump_info->cpu_state_size = *((unsigned long *)&sizes[1]); + + if (sizes[3] == 2) + phyp_dump_info->hpte_region_size = *((unsigned long *)&sizes[4]); +#endif + return 1; +} + int __init early_init_dt_scan_rtas(unsigned long node, const char *uname, int depth, void *data) { Index: 2.6.24-rc5/include/asm/rtas.h =================================================================== --- 2.6.24-rc5.orig/include/asm/rtas.h 2008-02-12 06:12:10.000000000 -0600 +++ 2.6.24-rc5/include/asm/rtas.h 2008-02-12 06:12:37.000000000 -0600 @@ -183,6 +183,9 @@ extern unsigned int rtas_busy_delay(int extern int early_init_dt_scan_rtas(unsigned long node, const char *uname, int depth, void *data); +int early_init_dt_scan_phyp_dump(unsigned long node, + const char *uname, int depth, void *data); + extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal); ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept 2008-02-12 7:08 ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja @ 2008-02-12 8:48 ` Michael Ellerman 2008-02-12 16:38 ` Manish Ahuja 2008-02-14 3:46 ` Tony Breeds 1 sibling, 1 reply; 69+ messages in thread From: Michael Ellerman @ 2008-02-12 8:48 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus [-- Attachment #1: Type: text/plain, Size: 1164 bytes --] On Tue, 2008-02-12 at 01:08 -0600, Manish Ahuja wrote: > Initial patch for reserving memory in early boot, and freeing it later. > If the previous boot had ended with a crash, the reserved memory would contain > a copy of the crashed kernel data. > > Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> > Signed-off-by: Linas Vepstas <linas@austin.ibm.com> > > ---- > arch/powerpc/kernel/prom.c | 50 ++++++++++++++++++++ > arch/powerpc/kernel/rtas.c | 32 +++++++++++++ > arch/powerpc/platforms/pseries/Makefile | 1 > arch/powerpc/platforms/pseries/phyp_dump.c | 71 +++++++++++++++++++++++++++++ > include/asm-powerpc/phyp_dump.h | 38 +++++++++++++++ > include/asm/rtas.h | 3 + ^^^^^^^^^^^^^^^^^^ asm/rtas.h doesn't exist. You need to clean your tree, and patch asm-powerpc/rtas.h instead. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept 2008-02-12 8:48 ` Michael Ellerman @ 2008-02-12 16:38 ` Manish Ahuja 0 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 16:38 UTC (permalink / raw) To: michael; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus Michael, Fixed. -Manish ---------- Initial patch for reserving memory in early boot, and freeing it later. If the previous boot had ended with a crash, the reserved memory would contain a copy of the crashed kernel data. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/kernel/prom.c | 50 ++++++++++++++++++++ arch/powerpc/kernel/rtas.c | 32 +++++++++++++ arch/powerpc/platforms/pseries/Makefile | 1 arch/powerpc/platforms/pseries/phyp_dump.c | 71 +++++++++++++++++++++++++++++ include/asm-powerpc/phyp_dump.h | 38 +++++++++++++++ include/asm-powerpc/rtas.h | 3 + 6 files changed, 195 insertions(+) Index: 2.6.24-rc5/include/asm-powerpc/phyp_dump.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 2.6.24-rc5/include/asm-powerpc/phyp_dump.h 2008-02-12 16:12:45.000000000 -0600 @@ -0,0 +1,38 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyright (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef _PPC64_PHYP_DUMP_H +#define _PPC64_PHYP_DUMP_H + +#ifdef CONFIG_PHYP_DUMP + +/* The RMR region will be saved for later dumping + * whenever the kernel crashes. Set this to 256MB. */ +#define PHYP_DUMP_RMR_START 0x0 +#define PHYP_DUMP_RMR_END (1UL<<28) + +struct phyp_dump { + /* Memory that is reserved during very early boot. */ + unsigned long init_reserve_start; + unsigned long init_reserve_size; + /* Check status during boot if dump supported, active & present*/ + unsigned long phyp_dump_configured; + unsigned long phyp_dump_is_active; + /* store cpu & hpte size */ + unsigned long cpu_state_size; + unsigned long hpte_region_size; +}; + +extern struct phyp_dump *phyp_dump_info; + +#endif /* CONFIG_PHYP_DUMP */ +#endif /* _PPC64_PHYP_DUMP_H */ Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 16:12:45.000000000 -0600 @@ -0,0 +1,71 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyrhgit (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + */ + +#include <linux/init.h> +#include <linux/mm.h> +#include <linux/pfn.h> +#include <linux/swap.h> + +#include <asm/page.h> +#include <asm/phyp_dump.h> + +/* Global, used to communicate data between early boot and late boot */ +static struct phyp_dump phyp_dump_global; +struct phyp_dump *phyp_dump_info = &phyp_dump_global; + +/** + * release_memory_range -- release memory previously lmb_reserved + * @start_pfn: starting physical frame number + * @nr_pages: number of pages to free. + * + * This routine will release memory that had been previously + * lmb_reserved in early boot. The released memory becomes + * available for genreal use. + */ +static void +release_memory_range(unsigned long start_pfn, unsigned long nr_pages) +{ + struct page *rpage; + unsigned long end_pfn; + long i; + + end_pfn = start_pfn + nr_pages; + + for (i=start_pfn; i <= end_pfn; i++) { + rpage = pfn_to_page(i); + if (PageReserved(rpage)) { + ClearPageReserved(rpage); + init_page_count(rpage); + __free_page(rpage); + totalram_pages++; + } + } +} + +static int __init phyp_dump_setup(void) +{ + unsigned long start_pfn, nr_pages; + + /* If no memory was reserved in early boot, there is nothing to do */ + if (phyp_dump_info->init_reserve_size == 0) + return 0; + + /* Release memory that was reserved in early boot */ + start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start); + nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size); + release_memory_range(start_pfn, nr_pages); + + return 0; +} + +subsys_initcall(phyp_dump_setup); Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/Makefile =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/Makefile 2008-02-12 16:11:44.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/Makefile 2008-02-12 16:12:45.000000000 -0600 @@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o obj-$(CONFIG_HVCS) += hvcserver.o obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o +obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o Index: 2.6.24-rc5/arch/powerpc/kernel/prom.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/kernel/prom.c 2008-02-12 16:11:44.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/kernel/prom.c 2008-02-12 16:12:45.000000000 -0600 @@ -51,6 +51,7 @@ #include <asm/machdep.h> #include <asm/pSeries_reconfig.h> #include <asm/pci-bridge.h> +#include <asm/phyp_dump.h> #include <asm/kexec.h> #ifdef DEBUG @@ -1011,6 +1012,52 @@ static void __init early_reserve_mem(voi #endif } +#ifdef CONFIG_PHYP_DUMP +/** + * reserve_crashed_mem() - reserve all not-yet-dumped mmemory + * + * This routine may reserve memory regions in the kernel only + * if the system is supported and a dump was taken in last + * boot instance or if the hardware is supported and the + * scratch area needs to be setup. In other instances it returns + * without reserving anything. The memory in case of dump being + * active is freed when the dump is collected (by userland tools). + */ +static void __init reserve_crashed_mem(void) +{ + unsigned long base, size; + if (!phyp_dump_info->phyp_dump_configured) { + printk(KERN_ERR "Phyp-dump not supported on this hardware\n"); + return; + } + + if (phyp_dump_info->phyp_dump_is_active) { + /* Reserve *everything* above RMR.Area freed by userland tools*/ + base = PHYP_DUMP_RMR_END; + size = lmb_end_of_DRAM() - base; + + /* XXX crashed_ram_end is wrong, since it may be beyond + * the memory_limit, it will need to be adjusted. */ + lmb_reserve(base, size); + + phyp_dump_info->init_reserve_start = base; + phyp_dump_info->init_reserve_size = size; + } + else { + size = phyp_dump_info->cpu_state_size + + phyp_dump_info->hpte_region_size + + PHYP_DUMP_RMR_END; + base = lmb_end_of_DRAM() - size; + lmb_reserve(base, size); + phyp_dump_info->init_reserve_start = base; + phyp_dump_info->init_reserve_size = size; + } +} +#else +static inline void __init reserve_crashed_mem(void) {} +#endif /* CONFIG_PHYP_DUMP */ + + void __init early_init_devtree(void *params) { DBG(" -> early_init_devtree(%p)\n", params); @@ -1022,6 +1069,8 @@ void __init early_init_devtree(void *par /* Some machines might need RTAS info for debugging, grab it now. */ of_scan_flat_dt(early_init_dt_scan_rtas, NULL); #endif + /* scan tree to see if dump occured during last boot */ + of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL); /* Retrieve various informations from the /chosen node of the * device-tree, including the platform type, initrd location and @@ -1043,6 +1092,7 @@ void __init early_init_devtree(void *par reserve_kdump_trampoline(); reserve_crashkernel(); early_reserve_mem(); + reserve_crashed_mem(); lmb_enforce_memory_limit(memory_limit); lmb_analyze(); Index: 2.6.24-rc5/arch/powerpc/kernel/rtas.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/kernel/rtas.c 2008-02-12 16:11:44.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/kernel/rtas.c 2008-02-12 16:12:45.000000000 -0600 @@ -39,6 +39,7 @@ #include <asm/syscalls.h> #include <asm/smp.h> #include <asm/atomic.h> +#include <asm/phyp_dump.h> struct rtas_t rtas = { .lock = SPIN_LOCK_UNLOCKED @@ -883,6 +884,37 @@ void __init rtas_initialize(void) #endif } +int __init early_init_dt_scan_phyp_dump(unsigned long node, + const char *uname, int depth, void *data) +{ +#ifdef CONFIG_PHYP_DUMP + const unsigned int *sizes; + + phyp_dump_info->phyp_dump_configured = 0; + phyp_dump_info->phyp_dump_is_active = 0; + + if (depth != 1 || strcmp(uname, "rtas") != 0) + return 0; + + if (of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL)) + phyp_dump_info->phyp_dump_configured++; + + if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL)) + phyp_dump_info->phyp_dump_is_active++; + + sizes = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes", NULL); + if (!sizes) + return 0; + + if (sizes[0] == 1) + phyp_dump_info->cpu_state_size = *((unsigned long *)&sizes[1]); + + if (sizes[3] == 2) + phyp_dump_info->hpte_region_size = *((unsigned long *)&sizes[4]); +#endif + return 1; +} + int __init early_init_dt_scan_rtas(unsigned long node, const char *uname, int depth, void *data) { Index: 2.6.24-rc5/include/asm-powerpc/rtas.h =================================================================== --- 2.6.24-rc5.orig/include/asm-powerpc/rtas.h 2008-02-12 16:09:48.000000000 -0600 +++ 2.6.24-rc5/include/asm-powerpc/rtas.h 2008-02-12 16:12:45.000000000 -0600 @@ -183,6 +183,9 @@ extern unsigned int rtas_busy_delay(int extern int early_init_dt_scan_rtas(unsigned long node, const char *uname, int depth, void *data); +int early_init_dt_scan_phyp_dump(unsigned long node, + const char *uname, int depth, void *data); + extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal); ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept 2008-02-12 7:08 ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja 2008-02-12 8:48 ` Michael Ellerman @ 2008-02-14 3:46 ` Tony Breeds 2008-02-14 23:12 ` Olof Johansson 1 sibling, 1 reply; 69+ messages in thread From: Tony Breeds @ 2008-02-14 3:46 UTC (permalink / raw) To: Manish Ahuja; +Cc: linuxppc-dev On Tue, Feb 12, 2008 at 01:08:25AM -0600, Manish Ahuja wrote: > > Initial patch for reserving memory in early boot, and freeing it later. > If the previous boot had ended with a crash, the reserved memory would contain > a copy of the crashed kernel data. > > Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> > Signed-off-by: Linas Vepstas <linas@austin.ibm.com> > > ---- > arch/powerpc/kernel/prom.c | 50 ++++++++++++++++++++ > arch/powerpc/kernel/rtas.c | 32 +++++++++++++ > arch/powerpc/platforms/pseries/Makefile | 1 > arch/powerpc/platforms/pseries/phyp_dump.c | 71 +++++++++++++++++++++++++++++ > include/asm-powerpc/phyp_dump.h | 38 +++++++++++++++ > include/asm/rtas.h | 3 + > 6 files changed, 195 insertions(+) > > Index: 2.6.24-rc5/include/asm-powerpc/phyp_dump.h > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ 2.6.24-rc5/include/asm-powerpc/phyp_dump.h 2008-02-12 06:12:37.000000000 -0600 > @@ -0,0 +1,38 @@ > +/* > + * Hypervisor-assisted dump > + * > + * Linas Vepstas, Manish Ahuja 2007 > + * Copyright (c) 2007 IBM Corp. Hi Manish, Sorry for the minor nits but this should be: --- * Linas Vepstas, Manish Ahuja 2008 * Copyright 2008 IBM Corp. --- You can optionally use the '©' symbol after word 'Copyright' but you shouldn't use '(c)' anymore. Also in at least one place you've misspelt "Copyright" Yours Tony linux.conf.au http://linux.conf.au/ || http://lca2008.linux.org.au/ Jan 28 - Feb 02 2008 The Australian Linux Technical Conference! ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept 2008-02-14 3:46 ` Tony Breeds @ 2008-02-14 23:12 ` Olof Johansson 2008-02-15 7:16 ` Manish Ahuja 0 siblings, 1 reply; 69+ messages in thread From: Olof Johansson @ 2008-02-14 23:12 UTC (permalink / raw) To: Tony Breeds; +Cc: linuxppc-dev On Thu, Feb 14, 2008 at 02:46:21PM +1100, Tony Breeds wrote: > Hi Manish, > Sorry for the minor nits but this should be: > > --- > * Linas Vepstas, Manish Ahuja 2008 > * Copyright 2008 IBM Corp. > --- > > You can optionally use the '??' symbol after word 'Copyright' but you > shouldn't use '(c)' anymore. > > Also in at least one place you've misspelt "Copyright" If we're going to nitpick, then I'd like to point out that the whole series needs to be run through checkpatch and at least the whitespace issues should be taken care of. I'm still not convinced that this is a useful feature compared to hardening kdump, especially now that ehea can handle kexec/kdump (patch posted the other day). But in the end it's up to Paul if he wants to take it or not, not me. -Olof ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept 2008-02-14 23:12 ` Olof Johansson @ 2008-02-15 7:16 ` Manish Ahuja 0 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-15 7:16 UTC (permalink / raw) To: Olof Johansson; +Cc: linuxppc-dev Olof, I will run it through checkpatch before resubmitting. Thanks, Manish Olof Johansson wrote: > On Thu, Feb 14, 2008 at 02:46:21PM +1100, Tony Breeds wrote: > >> Hi Manish, >> Sorry for the minor nits but this should be: >> >> --- >> * Linas Vepstas, Manish Ahuja 2008 >> * Copyright 2008 IBM Corp. >> --- >> >> You can optionally use the '??' symbol after word 'Copyright' but you >> shouldn't use '(c)' anymore. >> >> Also in at least one place you've misspelt "Copyright" > > If we're going to nitpick, then I'd like to point out that the whole > series needs to be run through checkpatch and at least the whitespace > issues should be taken care of. > > I'm still not convinced that this is a useful feature compared to > hardening kdump, especially now that ehea can handle kexec/kdump (patch > posted the other day). But in the end it's up to Paul if he wants to > take it or not, not me. > > > -Olof ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja 2008-02-12 6:53 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja 2008-02-12 7:08 ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja @ 2008-02-12 7:11 ` Manish Ahuja 2008-02-12 10:08 ` Stephen Rothwell 2008-02-15 1:05 ` Tony Breeds 2008-02-12 7:14 ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja ` (4 subsequent siblings) 7 siblings, 2 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 7:11 UTC (permalink / raw) To: linuxppc-dev, paulus; +Cc: mahuja, linasvepstas Check to see if there actually is data from a previously crashed kernel waiting. If so, Allow user-space tools to grab the data (by reading /proc/kcore). When user-space finishes dumping a section, it must release that memory by writing to sysfs. For example, echo "0x40000000 0x10000000" > /sys/kernel/release_region will release 256MB starting at the 1GB. The released memory becomes free for general use. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ------ arch/powerpc/platforms/pseries/phyp_dump.c | 88 +++++++++++++++++++++++++++-- 1 file changed, 82 insertions(+), 6 deletions(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:12:37.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:12:55.000000000 -0600 @@ -12,17 +12,24 @@ */ #include <linux/init.h> +#include <linux/kobject.h> #include <linux/mm.h> +#include <linux/of.h> #include <linux/pfn.h> #include <linux/swap.h> +#include <linux/sysfs.h> #include <asm/page.h> #include <asm/phyp_dump.h> +#include <asm/rtas.h> /* Global, used to communicate data between early boot and late boot */ static struct phyp_dump phyp_dump_global; struct phyp_dump *phyp_dump_info = &phyp_dump_global; +static int ibm_configure_kernel_dump; + +/* ------------------------------------------------- */ /** * release_memory_range -- release memory previously lmb_reserved * @start_pfn: starting physical frame number @@ -52,20 +59,89 @@ release_memory_range(unsigned long start } } -static int __init phyp_dump_setup(void) +/* ------------------------------------------------- */ +/** + * sysfs_release_region -- sysfs interface to release memory range. + * + * Usage: + * "echo <start addr> <length> > /sys/kernel/release_region" + * + * Example: + * "echo 0x40000000 0x10000000 > /sys/kernel/release_region" + * + * will release 256MB starting at 1GB. + */ +static ssize_t +store_release_region(struct kset *kset, const char *buf, size_t count) { + unsigned long start_addr, length, end_addr; unsigned long start_pfn, nr_pages; + ssize_t ret; + + ret = sscanf(buf, "%lx %lx", &start_addr, &length); + if (ret != 2) + return -EINVAL; + + /* Range-check - don't free any reserved memory that + * wasn't reserved for phyp-dump */ + if (start_addr < phyp_dump_info->init_reserve_start) + start_addr = phyp_dump_info->init_reserve_start; + + end_addr = phyp_dump_info->init_reserve_start + + phyp_dump_info->init_reserve_size; + if (start_addr+length > end_addr) + length = end_addr - start_addr; + + /* Release the region of memory assed in by user */ + start_pfn = PFN_DOWN(start_addr); + nr_pages = PFN_DOWN(length); + release_memory_range (start_pfn, nr_pages); + + return count; +} + +static ssize_t +show_release_region(struct kset * kset, char *buf) +{ + return sprintf(buf, "ola\n"); +} + +static struct subsys_attribute rr = __ATTR(release_region, 0600, + show_release_region, + store_release_region); + +static int __init phyp_dump_setup(void) +{ + struct device_node *rtas; + const int *dump_header; + int header_len = 0; + int rc; /* If no memory was reserved in early boot, there is nothing to do */ if (phyp_dump_info->init_reserve_size == 0) return 0; - /* Release memory that was reserved in early boot */ - start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start); - nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size); - release_memory_range(start_pfn, nr_pages); + /* Return if phyp dump not supported */ + if (!phyp_dump_info->phyp_dump_configured) { + return -ENOSYS; + } + + /* Is there dump data waiting for us? */ + rtas = of_find_node_by_path("/rtas"); + dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); + if (dump_header == NULL) { + release_all(); + return 0; + } + + /* Should we create a dump_subsys, analogous to s390/ipl.c ? */ + rc = subsys_create_file(&kernel_subsys, &rr); + if (rc) { + printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc); + release_all(); + return 0; + } return 0; } - subsys_initcall(phyp_dump_setup); ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem 2008-02-12 7:11 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja @ 2008-02-12 10:08 ` Stephen Rothwell 2008-02-12 16:40 ` Manish Ahuja 2008-02-15 1:05 ` Tony Breeds 1 sibling, 1 reply; 69+ messages in thread From: Stephen Rothwell @ 2008-02-12 10:08 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus [-- Attachment #1: Type: text/plain, Size: 510 bytes --] Hi Manish, Just a small comment. On Tue, 12 Feb 2008 01:11:58 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: > > + /* Is there dump data waiting for us? */ > + rtas = of_find_node_by_path("/rtas"); > + dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); You need an of_node_put(rtas) here. > + if (dump_header == NULL) { > + release_all(); > + return 0; > + } -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem 2008-02-12 10:08 ` Stephen Rothwell @ 2008-02-12 16:40 ` Manish Ahuja 0 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 16:40 UTC (permalink / raw) To: Stephen Rothwell; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus As noted, its fixed in patch 4. If its okay for this time, I will prefer to leave it there. -Manish Stephen Rothwell wrote: > Hi Manish, > > Just a small comment. > > On Tue, 12 Feb 2008 01:11:58 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: >> + /* Is there dump data waiting for us? */ >> + rtas = of_find_node_by_path("/rtas"); >> + dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); > > You need an of_node_put(rtas) here. > >> + if (dump_header == NULL) { >> + release_all(); >> + return 0; >> + } > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem 2008-02-12 7:11 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja 2008-02-12 10:08 ` Stephen Rothwell @ 2008-02-15 1:05 ` Tony Breeds 2008-02-15 7:17 ` Manish Ahuja 2008-02-15 17:30 ` Linas Vepstas 1 sibling, 2 replies; 69+ messages in thread From: Tony Breeds @ 2008-02-15 1:05 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus On Tue, Feb 12, 2008 at 01:11:58AM -0600, Manish Ahuja wrote: <snip> > +static ssize_t > +show_release_region(struct kset * kset, char *buf) > +{ > + return sprintf(buf, "ola\n"); > +} > + > +static struct subsys_attribute rr = __ATTR(release_region, 0600, > + show_release_region, > + store_release_region); Any reason this sysfs attribute can't be write only? The show method doesn't seem needed. > +static int __init phyp_dump_setup(void) > +{ <snip> > + /* Is there dump data waiting for us? */ > + rtas = of_find_node_by_path("/rtas"); > + dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); Hmm this isn't good. You need to check rtas != NULL. > + if (dump_header == NULL) { > + release_all(); > + return 0; > + } > + > + /* Should we create a dump_subsys, analogous to s390/ipl.c ? */ > + rc = subsys_create_file(&kernel_subsys, &rr); > + if (rc) { > + printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc); > + release_all(); > + return 0; > + } > > return 0; > } > - > subsys_initcall(phyp_dump_setup); Hmm I think this really should be a: machine_subsys_initcall(pseries, phyp_dump_setup) Yours Tony linux.conf.au http://linux.conf.au/ || http://lca2008.linux.org.au/ Jan 28 - Feb 02 2008 The Australian Linux Technical Conference! ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem 2008-02-15 1:05 ` Tony Breeds @ 2008-02-15 7:17 ` Manish Ahuja 2008-02-15 22:32 ` Tony Breeds 2008-02-15 17:30 ` Linas Vepstas 1 sibling, 1 reply; 69+ messages in thread From: Manish Ahuja @ 2008-02-15 7:17 UTC (permalink / raw) To: Tony Breeds; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus Tony Breeds wrote: > On Tue, Feb 12, 2008 at 01:11:58AM -0600, Manish Ahuja wrote: > > <snip> > >> +static ssize_t >> +show_release_region(struct kset * kset, char *buf) >> +{ >> + return sprintf(buf, "ola\n"); >> +} >> + >> +static struct subsys_attribute rr = __ATTR(release_region, 0600, >> + show_release_region, >> + store_release_region); > > Any reason this sysfs attribute can't be write only? The show method > doesn't seem needed. yes, its used later in the code. > >> +static int __init phyp_dump_setup(void) >> +{ > > <snip> > >> + /* Is there dump data waiting for us? */ >> + rtas = of_find_node_by_path("/rtas"); >> + dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); > > Hmm this isn't good. You need to check rtas != NULL. yes, will fix this as well. > >> + if (dump_header == NULL) { >> + release_all(); >> + return 0; >> + } >> + >> + /* Should we create a dump_subsys, analogous to s390/ipl.c ? */ >> + rc = subsys_create_file(&kernel_subsys, &rr); >> + if (rc) { >> + printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc); >> + release_all(); >> + return 0; >> + } >> >> return 0; >> } >> - >> subsys_initcall(phyp_dump_setup); > > Hmm I think this really should be a: > machine_subsys_initcall(pseries, phyp_dump_setup) > > Yours Tony > > linux.conf.au http://linux.conf.au/ || http://lca2008.linux.org.au/ > Jan 28 - Feb 02 2008 The Australian Linux Technical Conference! > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem 2008-02-15 7:17 ` Manish Ahuja @ 2008-02-15 22:32 ` Tony Breeds 0 siblings, 0 replies; 69+ messages in thread From: Tony Breeds @ 2008-02-15 22:32 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus On Fri, Feb 15, 2008 at 01:17:16AM -0600, Manish Ahuja wrote: > Tony Breeds wrote: > > Any reason this sysfs attribute can't be write only? The show method > > doesn't seem needed. > > yes, its used later in the code. I see that now, thanks. From my point of view it would make reviewing these patches easier if each patch was a correct and simple as possible. In this case it would have made the review easier if the sysfs attribute was write only now and then modified to add the read side when it's actually implemented. The same goes for fixing typosi, cosmetic changes and reference counting. Looking forward to a respin of this patch series. Yours Tony linux.conf.au http://linux.conf.au/ || http://lca2008.linux.org.au/ Jan 28 - Feb 02 2008 The Australian Linux Technical Conference! ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem 2008-02-15 1:05 ` Tony Breeds 2008-02-15 7:17 ` Manish Ahuja @ 2008-02-15 17:30 ` Linas Vepstas 1 sibling, 0 replies; 69+ messages in thread From: Linas Vepstas @ 2008-02-15 17:30 UTC (permalink / raw) To: Tony Breeds; +Cc: mahuja, linuxppc-dev, paulus On 14/02/2008, Tony Breeds <tony@bakeyournoodle.com> wrote: > On Tue, Feb 12, 2008 at 01:11:58AM -0600, Manish Ahuja wrote: > > +static ssize_t > > +show_release_region(struct kset * kset, char *buf) > > +{ > > + return sprintf(buf, "ola\n"); > > +} > > + > > +static struct subsys_attribute rr = __ATTR(release_region, 0600, > > + show_release_region, > > + store_release_region); > > > Any reason this sysfs attribute can't be write only? The show method > doesn't seem needed. This was supposed to be a place-holder; a later patch would add detailed info. The goal was to have user-land tools that would operate these files to progressively dump and release memory regions; however, until these userland tools get written, the proper interface remains murky (e.g. real addresses? virtual addresses? just delta's or a whole memory map? some sort of numa flags or whatever?) --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 4/8] pseries: phyp dump: register dump area. 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (2 preceding siblings ...) 2008-02-12 7:11 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja @ 2008-02-12 7:14 ` Manish Ahuja 2008-02-12 10:11 ` Stephen Rothwell 2008-02-12 7:16 ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja ` (3 subsequent siblings) 7 siblings, 1 reply; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 7:14 UTC (permalink / raw) To: linuxppc-dev, paulus; +Cc: mahuja, linasvepstas Set up the actual dump header, register it with the hypervisor. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ------ arch/powerpc/platforms/pseries/phyp_dump.c | 136 +++++++++++++++++++++++++++-- 1 file changed, 129 insertions(+), 7 deletions(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:12:55.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:01.000000000 -0600 @@ -30,6 +30,117 @@ struct phyp_dump *phyp_dump_info = &phyp static int ibm_configure_kernel_dump; /* ------------------------------------------------- */ +/* RTAS interfaces to declare the dump regions */ + +struct dump_section { + u32 dump_flags; + u16 source_type; + u16 error_flags; + u64 source_address; + u64 source_length; + u64 length_copied; + u64 destination_address; +}; + +struct phyp_dump_header { + u32 version; + u16 num_of_sections; + u16 status; + + u32 first_offset_section; + u32 dump_disk_section; + u64 block_num_dd; + u64 num_of_blocks_dd; + u32 offset_dd; + u32 maxtime_to_auto; + /* No dump disk path string used */ + + struct dump_section cpu_data; + struct dump_section hpte_data; + struct dump_section kernel_data; +}; + +/* The dump header *must be* in low memory, so .bss it */ +static struct phyp_dump_header phdr; + +#define NUM_DUMP_SECTIONS 3 +#define DUMP_HEADER_VERSION 0x1 +#define DUMP_REQUEST_FLAG 0x1 +#define DUMP_SOURCE_CPU 0x0001 +#define DUMP_SOURCE_HPTE 0x0002 +#define DUMP_SOURCE_RMO 0x0011 + +/** + * init_dump_header() - initialize the header declaring a dump + * Returns: length of dump save area. + * + * When the hypervisor saves crashed state, it needs to put + * it somewhere. The dump header tells the hypervisor where + * the data can be saved. + */ +static unsigned long init_dump_header(struct phyp_dump_header *ph) +{ + unsigned long addr_offset = 0; + + /* Set up the dump header */ + ph->version = DUMP_HEADER_VERSION; + ph->num_of_sections = NUM_DUMP_SECTIONS; + ph->status = 0; + + ph->first_offset_section = + (u32)offsetof(struct phyp_dump_header, cpu_data); + ph->dump_disk_section = 0; + ph->block_num_dd = 0; + ph->num_of_blocks_dd = 0; + ph->offset_dd = 0; + + ph->maxtime_to_auto = 0; /* disabled */ + + /* The first two sections are mandatory */ + ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG; + ph->cpu_data.source_type = DUMP_SOURCE_CPU; + ph->cpu_data.source_address = 0; + ph->cpu_data.source_length = phyp_dump_info->cpu_state_size; + ph->cpu_data.destination_address = addr_offset; + addr_offset += phyp_dump_info->cpu_state_size; + + ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG; + ph->hpte_data.source_type = DUMP_SOURCE_HPTE; + ph->hpte_data.source_address = 0; + ph->hpte_data.source_length = phyp_dump_info->hpte_region_size; + ph->hpte_data.destination_address = addr_offset; + addr_offset += phyp_dump_info->hpte_region_size; + + /* This section describes the low kernel region */ + ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG; + ph->kernel_data.source_type = DUMP_SOURCE_RMO; + ph->kernel_data.source_address = PHYP_DUMP_RMR_START; + ph->kernel_data.source_length = PHYP_DUMP_RMR_END; + ph->kernel_data.destination_address = addr_offset; + addr_offset += ph->kernel_data.source_length; + + return addr_offset; +} + +static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) +{ + int rc; + ph->cpu_data.destination_address += addr; + ph->hpte_data.destination_address += addr; + ph->kernel_data.destination_address += addr; + + do { + rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, + 1, ph, sizeof(struct phyp_dump_header)); + } while (rtas_busy_delay(rc)); + + if (rc) + { + printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc); + } +} + +/* ------------------------------------------------- */ /** * release_memory_range -- release memory previously lmb_reserved * @start_pfn: starting physical frame number @@ -113,7 +224,9 @@ static struct subsys_attribute rr = __AT static int __init phyp_dump_setup(void) { struct device_node *rtas; - const int *dump_header; + const struct phyp_dump_header *dump_header; + unsigned long dump_area_start; + unsigned long dump_area_length; int header_len = 0; int rc; @@ -126,22 +239,31 @@ static int __init phyp_dump_setup(void) return -ENOSYS; } - /* Is there dump data waiting for us? */ + /* Is there dump data waiting for us? If there isn't, + * then register a new dump area, and release all of + * the rest of the reserved ram. + * + * The /rtas/ibm,kernel-dump rtas node is present only + * if there is dump data waiting for us. + */ rtas = of_find_node_by_path("/rtas"); dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); + of_node_put(rtas); + + dump_area_length = init_dump_header(&phdr); + dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK; /* align down */ + if (dump_header == NULL) { - release_all(); + register_dump_area(&phdr, dump_area_start); return 0; } /* Should we create a dump_subsys, analogous to s390/ipl.c ? */ rc = subsys_create_file(&kernel_subsys, &rr); - if (rc) { + if (rc) printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc); - release_all(); - return 0; - } + /* ToDo: re-register the dump area, for next time. */ return 0; } subsys_initcall(phyp_dump_setup); ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 4/8] pseries: phyp dump: register dump area. 2008-02-12 7:14 ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja @ 2008-02-12 10:11 ` Stephen Rothwell 2008-02-12 16:31 ` Manish Ahuja 0 siblings, 1 reply; 69+ messages in thread From: Stephen Rothwell @ 2008-02-12 10:11 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus [-- Attachment #1: Type: text/plain, Size: 656 bytes --] Hi Manish, > - /* Is there dump data waiting for us? */ > + /* Is there dump data waiting for us? If there isn't, > + * then register a new dump area, and release all of > + * the rest of the reserved ram. > + * > + * The /rtas/ibm,kernel-dump rtas node is present only > + * if there is dump data waiting for us. > + */ > rtas = of_find_node_by_path("/rtas"); > dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); > + of_node_put(rtas); Oh, here is the of_node_put() - you should move that to patch 3. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 4/8] pseries: phyp dump: register dump area. 2008-02-12 10:11 ` Stephen Rothwell @ 2008-02-12 16:31 ` Manish Ahuja 0 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 16:31 UTC (permalink / raw) To: Stephen Rothwell; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus For now, if we can leave this patch as is, that will be great. That move requires me to work all remaining patches as they apply uncleanly after that. I will bunch those two together functionally next time onwards. Thanks, Manish Stephen Rothwell wrote: > Hi Manish, > >> - /* Is there dump data waiting for us? */ >> + /* Is there dump data waiting for us? If there isn't, >> + * then register a new dump area, and release all of >> + * the rest of the reserved ram. >> + * >> + * The /rtas/ibm,kernel-dump rtas node is present only >> + * if there is dump data waiting for us. >> + */ >> rtas = of_find_node_by_path("/rtas"); >> dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); >> + of_node_put(rtas); > > Oh, here is the of_node_put() - you should move that to patch 3. > ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 5/8] pseries: phyp dump: debugging print routines. 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (3 preceding siblings ...) 2008-02-12 7:14 ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja @ 2008-02-12 7:16 ` Manish Ahuja 2008-02-12 7:18 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja ` (2 subsequent siblings) 7 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 7:16 UTC (permalink / raw) To: linuxppc-dev, paulus; +Cc: mahuja, linasvepstas Provide some basic debugging support. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Signed-off-by: Linas Vepsts <linas@austin.ibm.com> ----- arch/powerpc/platforms/pseries/phyp_dump.c | 64 +++++++++++++++++++++++++++-- 1 file changed, 60 insertions(+), 4 deletions(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:01.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:06.000000000 -0600 @@ -2,7 +2,7 @@ * Hypervisor-assisted dump * * Linas Vepstas, Manish Ahuja 2007 - * Copyrhgit (c) 2007 IBM Corp. + * Copyright (c) 2007 IBM Corp. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -122,6 +122,61 @@ static unsigned long init_dump_header(st return addr_offset; } +static void print_dump_header(const struct phyp_dump_header *ph) +{ +#ifdef DEBUG + printk(KERN_INFO "dump header:\n"); + /* setup some ph->sections required */ + printk(KERN_INFO "version = %d\n", ph->version); + printk(KERN_INFO "Sections = %d\n", ph->num_of_sections); + printk(KERN_INFO "Status = 0x%x\n", ph->status); + + /* No ph->disk, so all should be set to 0 */ + printk(KERN_INFO "Offset to first section 0x%x\n", + ph->first_offset_section); + printk(KERN_INFO "dump disk sections should be zero\n"); + printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section); + printk(KERN_INFO "block num = %ld\n", ph->block_num_dd); + printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd); + printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd); + printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto); + + /*set cpu state and hpte states as well scratch pad area */ + printk(KERN_INFO " CPU AREA \n"); + printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags); + printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type); + printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags); + printk(KERN_INFO "cpu source_address =%lx\n", + ph->cpu_data.source_address); + printk(KERN_INFO "cpu source_length =%lx\n", + ph->cpu_data.source_length); + printk(KERN_INFO "cpu length_copied =%lx\n", + ph->cpu_data.length_copied); + + printk(KERN_INFO " HPTE AREA \n"); + printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags); + printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type); + printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags); + printk(KERN_INFO "HPTE source_address =%lx\n", + ph->hpte_data.source_address); + printk(KERN_INFO "HPTE source_length =%lx\n", + ph->hpte_data.source_length); + printk(KERN_INFO "HPTE length_copied =%lx\n", + ph->hpte_data.length_copied); + + printk(KERN_INFO " SRSD AREA \n"); + printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags); + printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type); + printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags); + printk(KERN_INFO "SRSD source_address =%lx\n", + ph->kernel_data.source_address); + printk(KERN_INFO "SRSD source_length =%lx\n", + ph->kernel_data.source_length); + printk(KERN_INFO "SRSD length_copied =%lx\n", + ph->kernel_data.length_copied); +#endif +} + static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) { int rc; @@ -134,9 +189,9 @@ static void register_dump_area(struct ph 1, ph, sizeof(struct phyp_dump_header)); } while (rtas_busy_delay(rc)); - if (rc) - { - printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc); + if (rc) { + printk(KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc); + print_dump_header(ph); } } @@ -238,6 +293,7 @@ static int __init phyp_dump_setup(void) if (!phyp_dump_info->phyp_dump_configured) { return -ENOSYS; } + print_dump_header(dump_header); /* Is there dump data waiting for us? If there isn't, * then register a new dump area, and release all of ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas. 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (4 preceding siblings ...) 2008-02-12 7:16 ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja @ 2008-02-12 7:18 ` Manish Ahuja 2008-02-12 10:18 ` Stephen Rothwell 2008-02-13 21:43 ` Manish Ahuja 2008-02-12 7:20 ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja 2008-02-12 7:21 ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja 7 siblings, 2 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 7:18 UTC (permalink / raw) To: linuxppc-dev, paulus; +Cc: mahuja, linasvepstas Routines to a. invalidate dump b. Calculate region that is reserved and needs to be freed. This is exported through sysfs interface. Unregister has been removed for now as it wasn't being used. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> ----- --- arch/powerpc/platforms/pseries/phyp_dump.c | 85 +++++++++++++++++++++++++---- include/asm/phyp_dump.h | 6 +- 2 files changed, 79 insertions(+), 12 deletions(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:06.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:17.000000000 -0600 @@ -69,6 +69,10 @@ static struct phyp_dump_header phdr; #define DUMP_SOURCE_CPU 0x0001 #define DUMP_SOURCE_HPTE 0x0002 #define DUMP_SOURCE_RMO 0x0011 +#define DUMP_ERROR_FLAG 0x2000 +#define DUMP_TRIGGERED 0x4000 +#define DUMP_PERFORMED 0x8000 + /** * init_dump_header() - initialize the header declaring a dump @@ -180,9 +184,15 @@ static void print_dump_header(const stru static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) { int rc; - ph->cpu_data.destination_address += addr; - ph->hpte_data.destination_address += addr; - ph->kernel_data.destination_address += addr; + + /* Add addr value if not initialized before */ + if (ph->cpu_data.destination_address == 0) { + ph->cpu_data.destination_address += addr; + ph->hpte_data.destination_address += addr; + ph->kernel_data.destination_address += addr; + } + + /* ToDo Invalidate kdump and free memory range. */ do { rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, @@ -195,6 +205,30 @@ static void register_dump_area(struct ph } } +static +void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr) +{ + int rc; + + /* Add addr value if not initialized before */ + if (ph->cpu_data.destination_address == 0) { + ph->cpu_data.destination_address += addr; + ph->hpte_data.destination_address += addr; + ph->kernel_data.destination_address += addr; + } + + do { + rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, + 2, ph, sizeof(struct phyp_dump_header)); + } while (rtas_busy_delay(rc)); + + if (rc) { + printk (KERN_ERR "phyp-dump: unexpected error (%d) " + "on invalidate\n", rc); + print_dump_header(ph); + } +} + /* ------------------------------------------------- */ /** * release_memory_range -- release memory previously lmb_reserved @@ -205,8 +239,8 @@ static void register_dump_area(struct ph * lmb_reserved in early boot. The released memory becomes * available for genreal use. */ -static void -release_memory_range(unsigned long start_pfn, unsigned long nr_pages) +static +void release_memory_range(unsigned long start_pfn, unsigned long nr_pages) { struct page *rpage; unsigned long end_pfn; @@ -237,8 +271,8 @@ release_memory_range(unsigned long start * * will release 256MB starting at 1GB. */ -static ssize_t -store_release_region(struct kset *kset, const char *buf, size_t count) +static +ssize_t store_release_region(struct kset *kset, const char *buf, size_t count) { unsigned long start_addr, length, end_addr; unsigned long start_pfn, nr_pages; @@ -266,10 +300,23 @@ store_release_region(struct kset *kset, return count; } -static ssize_t -show_release_region(struct kset * kset, char *buf) +static ssize_t show_release_region(struct kset * kset, char *buf) { - return sprintf(buf, "ola\n"); + u64 second_addr_range; + + /* total reserved size - start of scratch area */ + second_addr_range = phyp_dump_info->init_reserve_size - + phyp_dump_info->reserved_scratch_size; + return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:" + " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n", + phdr.cpu_data.destination_address, + phdr.cpu_data.length_copied, + phdr.hpte_data.destination_address, + phdr.hpte_data.length_copied, + phdr.kernel_data.destination_address, + phdr.kernel_data.length_copied, + phyp_dump_info->init_reserve_start, + second_addr_range); } static struct subsys_attribute rr = __ATTR(release_region, 0600, @@ -293,7 +340,6 @@ static int __init phyp_dump_setup(void) if (!phyp_dump_info->phyp_dump_configured) { return -ENOSYS; } - print_dump_header(dump_header); /* Is there dump data waiting for us? If there isn't, * then register a new dump area, and release all of @@ -305,6 +351,7 @@ static int __init phyp_dump_setup(void) rtas = of_find_node_by_path("/rtas"); dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); of_node_put(rtas); + print_dump_header(dump_header); dump_area_length = init_dump_header(&phdr); dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK; /* align down */ @@ -314,6 +361,22 @@ static int __init phyp_dump_setup(void) return 0; } + /* re-register the dump area, if old dump was invalid */ + if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) { + invalidate_last_dump(&phdr, dump_area_start); + register_dump_area(&phdr, dump_area_start); + return 0; + } + + if (dump_header) { + phyp_dump_info->reserved_scratch_addr = + dump_header->cpu_data.destination_address; + phyp_dump_info->reserved_scratch_size = + dump_header->cpu_data.source_length + + dump_header->hpte_data.source_length + + dump_header->kernel_data.source_length; + } + /* Should we create a dump_subsys, analogous to s390/ipl.c ? */ rc = subsys_create_file(&kernel_subsys, &rr); if (rc) Index: 2.6.24-rc5/include/asm/phyp_dump.h =================================================================== --- 2.6.24-rc5.orig/include/asm/phyp_dump.h 2008-02-12 06:12:37.000000000 -0600 +++ 2.6.24-rc5/include/asm/phyp_dump.h 2008-02-12 06:13:17.000000000 -0600 @@ -18,7 +18,8 @@ /* The RMR region will be saved for later dumping * whenever the kernel crashes. Set this to 256MB. */ #define PHYP_DUMP_RMR_START 0x0 -#define PHYP_DUMP_RMR_END (1UL<<28) +/*#define PHYP_DUMP_RMR_END (1UL<<28)*/ +#define PHYP_DUMP_RMR_END (1UL<<27) struct phyp_dump { /* Memory that is reserved during very early boot. */ @@ -30,6 +31,9 @@ struct phyp_dump { /* store cpu & hpte size */ unsigned long cpu_state_size; unsigned long hpte_region_size; + /* previous scratch area values */ + unsigned long reserved_scratch_addr; + unsigned long reserved_scratch_size; }; extern struct phyp_dump *phyp_dump_info; ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas. 2008-02-12 7:18 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja @ 2008-02-12 10:18 ` Stephen Rothwell 2008-02-12 16:32 ` Manish Ahuja 2008-02-13 21:43 ` Manish Ahuja 1 sibling, 1 reply; 69+ messages in thread From: Stephen Rothwell @ 2008-02-12 10:18 UTC (permalink / raw) To: Manish Ahuja; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus [-- Attachment #1: Type: text/plain, Size: 521 bytes --] Hi Manish, On Tue, 12 Feb 2008 01:18:22 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: > > -static void > -release_memory_range(unsigned long start_pfn, unsigned long nr_pages) > +static > +void release_memory_range(unsigned long start_pfn, unsigned long nr_pages) Cosmetic changes like this should normally be put in separate patch as they just distract from a real review (which this isn't :-)) -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas. 2008-02-12 10:18 ` Stephen Rothwell @ 2008-02-12 16:32 ` Manish Ahuja 0 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 16:32 UTC (permalink / raw) To: Stephen Rothwell; +Cc: mahuja, linuxppc-dev, linasvepstas, paulus Stephen Rothwell wrote: > Hi Manish, > > On Tue, 12 Feb 2008 01:18:22 -0600 Manish Ahuja <ahuja@austin.ibm.com> wrote: >> -static void >> -release_memory_range(unsigned long start_pfn, unsigned long nr_pages) >> +static >> +void release_memory_range(unsigned long start_pfn, unsigned long nr_pages) > > Cosmetic changes like this should normally be put in separate patch as > they just distract from a real review (which this isn't :-)) > Okay, will do from future. ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas. 2008-02-12 7:18 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja 2008-02-12 10:18 ` Stephen Rothwell @ 2008-02-13 21:43 ` Manish Ahuja 1 sibling, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-13 21:43 UTC (permalink / raw) To: linuxppc-dev, paulus; +Cc: mahuja, linasvepstas Changed asm to asm-powerpc. Hopefully this was the last of them. -Manish ---------------------------------------- Routines to a. invalidate dump b. Calculate region that is reserved and needs to be freed. This is exported through sysfs interface. Unregister has been removed for now as it wasn't being used. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> ----- --- arch/powerpc/platforms/pseries/phyp_dump.c | 85 +++++++++++++++++++++++++---- include/asm-powerpc/phyp_dump.h | 3 + 2 files changed, 77 insertions(+), 11 deletions(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-13 21:21:00.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-13 21:21:48.000000000 -0600 @@ -69,6 +69,10 @@ static struct phyp_dump_header phdr; #define DUMP_SOURCE_CPU 0x0001 #define DUMP_SOURCE_HPTE 0x0002 #define DUMP_SOURCE_RMO 0x0011 +#define DUMP_ERROR_FLAG 0x2000 +#define DUMP_TRIGGERED 0x4000 +#define DUMP_PERFORMED 0x8000 + /** * init_dump_header() - initialize the header declaring a dump @@ -180,9 +184,15 @@ static void print_dump_header(const stru static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) { int rc; - ph->cpu_data.destination_address += addr; - ph->hpte_data.destination_address += addr; - ph->kernel_data.destination_address += addr; + + /* Add addr value if not initialized before */ + if (ph->cpu_data.destination_address == 0) { + ph->cpu_data.destination_address += addr; + ph->hpte_data.destination_address += addr; + ph->kernel_data.destination_address += addr; + } + + /* ToDo Invalidate kdump and free memory range. */ do { rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, @@ -195,6 +205,30 @@ static void register_dump_area(struct ph } } +static +void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr) +{ + int rc; + + /* Add addr value if not initialized before */ + if (ph->cpu_data.destination_address == 0) { + ph->cpu_data.destination_address += addr; + ph->hpte_data.destination_address += addr; + ph->kernel_data.destination_address += addr; + } + + do { + rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, + 2, ph, sizeof(struct phyp_dump_header)); + } while (rtas_busy_delay(rc)); + + if (rc) { + printk (KERN_ERR "phyp-dump: unexpected error (%d) " + "on invalidate\n", rc); + print_dump_header(ph); + } +} + /* ------------------------------------------------- */ /** * release_memory_range -- release memory previously lmb_reserved @@ -205,8 +239,8 @@ static void register_dump_area(struct ph * lmb_reserved in early boot. The released memory becomes * available for genreal use. */ -static void -release_memory_range(unsigned long start_pfn, unsigned long nr_pages) +static +void release_memory_range(unsigned long start_pfn, unsigned long nr_pages) { struct page *rpage; unsigned long end_pfn; @@ -237,8 +271,8 @@ release_memory_range(unsigned long start * * will release 256MB starting at 1GB. */ -static ssize_t -store_release_region(struct kset *kset, const char *buf, size_t count) +static +ssize_t store_release_region(struct kset *kset, const char *buf, size_t count) { unsigned long start_addr, length, end_addr; unsigned long start_pfn, nr_pages; @@ -266,10 +300,23 @@ store_release_region(struct kset *kset, return count; } -static ssize_t -show_release_region(struct kset * kset, char *buf) +static ssize_t show_release_region(struct kset * kset, char *buf) { - return sprintf(buf, "ola\n"); + u64 second_addr_range; + + /* total reserved size - start of scratch area */ + second_addr_range = phyp_dump_info->init_reserve_size - + phyp_dump_info->reserved_scratch_size; + return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:" + " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n", + phdr.cpu_data.destination_address, + phdr.cpu_data.length_copied, + phdr.hpte_data.destination_address, + phdr.hpte_data.length_copied, + phdr.kernel_data.destination_address, + phdr.kernel_data.length_copied, + phyp_dump_info->init_reserve_start, + second_addr_range); } static struct subsys_attribute rr = __ATTR(release_region, 0600, @@ -293,7 +340,6 @@ static int __init phyp_dump_setup(void) if (!phyp_dump_info->phyp_dump_configured) { return -ENOSYS; } - print_dump_header(dump_header); /* Is there dump data waiting for us? If there isn't, * then register a new dump area, and release all of @@ -305,6 +351,7 @@ static int __init phyp_dump_setup(void) rtas = of_find_node_by_path("/rtas"); dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); of_node_put(rtas); + print_dump_header(dump_header); dump_area_length = init_dump_header(&phdr); dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK; /* align down */ @@ -314,6 +361,22 @@ static int __init phyp_dump_setup(void) return 0; } + /* re-register the dump area, if old dump was invalid */ + if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) { + invalidate_last_dump(&phdr, dump_area_start); + register_dump_area(&phdr, dump_area_start); + return 0; + } + + if (dump_header) { + phyp_dump_info->reserved_scratch_addr = + dump_header->cpu_data.destination_address; + phyp_dump_info->reserved_scratch_size = + dump_header->cpu_data.source_length + + dump_header->hpte_data.source_length + + dump_header->kernel_data.source_length; + } + /* Should we create a dump_subsys, analogous to s390/ipl.c ? */ rc = subsys_create_file(&kernel_subsys, &rr); if (rc) Index: 2.6.24-rc5/include/asm-powerpc/phyp_dump.h =================================================================== --- 2.6.24-rc5.orig/include/asm-powerpc/phyp_dump.h 2008-02-13 21:21:00.000000000 -0600 +++ 2.6.24-rc5/include/asm-powerpc/phyp_dump.h 2008-02-13 21:22:10.000000000 -0600 @@ -30,6 +30,9 @@ struct phyp_dump { /* store cpu & hpte size */ unsigned long cpu_state_size; unsigned long hpte_region_size; + /* previous scratch area values */ + unsigned long reserved_scratch_addr; + unsigned long reserved_scratch_size; }; extern struct phyp_dump *phyp_dump_info; ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 7/8] pseries: phyp dump: Tracking memory range freed. 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (5 preceding siblings ...) 2008-02-12 7:18 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja @ 2008-02-12 7:20 ` Manish Ahuja 2008-02-12 7:21 ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja 7 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 7:20 UTC (permalink / raw) To: linuxppc-dev, paulus; +Cc: mahuja, linasvepstas This patch tracks the size freed. For now it does a simple rudimentary calculation of the ranges freed. The idea is to keep it simple at the external shell script level and send in large chunks for now. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> ----- --- arch/powerpc/platforms/pseries/phyp_dump.c | 35 +++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:17.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:21.000000000 -0600 @@ -259,6 +259,39 @@ void release_memory_range(unsigned long } } +/** + * track_freed_range -- Counts the range being freed. + * Once the counter goes to zero, it re-registers dump for + * future use. + */ +static void +track_freed_range(unsigned long addr, unsigned long length) +{ + static unsigned long scratch_area_size, reserved_area_size; + + if (addr < phyp_dump_info->init_reserve_start) + return; + + if ((addr >= phyp_dump_info->init_reserve_start) && + (addr <= phyp_dump_info->init_reserve_start + + phyp_dump_info->init_reserve_size)) + reserved_area_size += length; + + if ((addr >= phyp_dump_info->reserved_scratch_addr) && + (addr <= phyp_dump_info->reserved_scratch_addr + + phyp_dump_info->reserved_scratch_size)) + scratch_area_size += length; + + if ((reserved_area_size == phyp_dump_info->init_reserve_size) && + (scratch_area_size == phyp_dump_info->reserved_scratch_size)) { + + invalidate_last_dump(&phdr, + phyp_dump_info->reserved_scratch_addr); + register_dump_area (&phdr, + phyp_dump_info->reserved_scratch_addr); + } +} + /* ------------------------------------------------- */ /** * sysfs_release_region -- sysfs interface to release memory range. @@ -282,6 +315,8 @@ ssize_t store_release_region(struct kset if (ret != 2) return -EINVAL; + track_freed_range(start_addr, length); + /* Range-check - don't free any reserved memory that * wasn't reserved for phyp-dump */ if (start_addr < phyp_dump_info->init_reserve_start) ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 8/8] pseries: phyp dump: config file 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja ` (6 preceding siblings ...) 2008-02-12 7:20 ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja @ 2008-02-12 7:21 ` Manish Ahuja 7 siblings, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-02-12 7:21 UTC (permalink / raw) To: linuxppc-dev, paulus; +Cc: mahuja, linasvepstas, lkessler, strosake Add hypervisor-assisted dump to kernel config Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ----- arch/powerpc/Kconfig | 11 +++++++++++ 1 file changed, 11 insertions(+) Index: 2.6.24-rc5/arch/powerpc/Kconfig =================================================================== --- 2.6.24-rc5.orig/arch/powerpc/Kconfig 2008-02-12 06:12:08.000000000 -0600 +++ 2.6.24-rc5/arch/powerpc/Kconfig 2008-02-12 06:13:24.000000000 -0600 @@ -266,6 +266,17 @@ config CRASH_DUMP Don't change this unless you know what you are doing. +config PHYP_DUMP + bool "Hypervisor-assisted dump (EXPERIMENTAL)" + depends on PPC_PSERIES && EXPERIMENTAL + default y + help + Hypervisor-assisted dump is meant to be a kdump replacement + offering robustness and speed not possible without system + hypervisor assistence. + + If unsure, say "Y" + config PPCBUG_NVRAM bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC default y if PPC_PREP ^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept @ 2008-01-08 0:19 Manish Ahuja 2008-01-08 0:26 ` Arnd Bergmann 0 siblings, 1 reply; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:19 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, linasvepstas, lkessler, strosake Initial patch for reserving memory in early boot, and freeing it later. If the previous boot had ended with a crash, the reserved memory would contain a copy of the crashed kernel data. Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/kernel/prom.c | 33 +++++++++++++ arch/powerpc/platforms/pseries/Makefile | 1 arch/powerpc/platforms/pseries/phyp_dump.c | 71 +++++++++++++++++++++++++++++ include/asm-powerpc/phyp_dump.h | 32 +++++++++++++ 4 files changed, 137 insertions(+) Index: linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h 2007-11-19 17:44:21.000000000 -0600 @@ -0,0 +1,32 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyright (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef _PPC64_PHYP_DUMP_H +#define _PPC64_PHYP_DUMP_H + +#ifdef CONFIG_PHYP_DUMP + +/* The RMR region will be saved for later dumping + * whenever the kernel crashes. Set this to 256MB. */ +#define PHYP_DUMP_RMR_START 0x0 +#define PHYP_DUMP_RMR_END (1UL<<28) + +struct phyp_dump { + /* Memory that is reserved during very early boot. */ + unsigned long init_reserve_start; + unsigned long init_reserve_size; +}; + +extern struct phyp_dump *phyp_dump_info; + +#endif /* CONFIG_PHYP_DUMP */ +#endif /* _PPC64_PHYP_DUMP_H */ Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-19 19:07:49.000000000 -0600 @@ -0,0 +1,71 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyrhgit (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + */ + +#include <linux/init.h> +#include <linux/mm.h> +#include <linux/pfn.h> +#include <linux/swap.h> + +#include <asm/page.h> +#include <asm/phyp_dump.h> + +/* Global, used to communicate data between early boot and late boot */ +static struct phyp_dump phyp_dump_global; +struct phyp_dump *phyp_dump_info = &phyp_dump_global; + +/** + * release_memory_range -- release memory previously lmb_reserved + * @start_pfn: starting physical frame number + * @nr_pages: number of pages to free. + * + * This routine will release memory that had been previously + * lmb_reserved in early boot. The released memory becomes + * available for genreal use. + */ +static void +release_memory_range(unsigned long start_pfn, unsigned long nr_pages) +{ + struct page *rpage; + unsigned long end_pfn; + long i; + + end_pfn = start_pfn + nr_pages; + + for (i=start_pfn; i <= end_pfn; i++) { + rpage = pfn_to_page(i); + if (PageReserved(rpage)) { + ClearPageReserved(rpage); + init_page_count(rpage); + __free_page(rpage); + totalram_pages++; + } + } +} + +static int __init phyp_dump_setup(void) +{ + unsigned long start_pfn, nr_pages; + + /* If no memory was reserved in early boot, there is nothing to do */ + if (phyp_dump_info->init_reserve_size == 0) + return 0; + + /* Release memory that was reserved in early boot */ + start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start); + nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size); + release_memory_range(start_pfn, nr_pages); + + return 0; +} + +subsys_initcall(phyp_dump_setup); Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:43:52.000000000 -0600 +++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:44:21.000000000 -0600 @@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o obj-$(CONFIG_HVCS) += hvcserver.o obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o +obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c =================================================================== --- linux-2.6.24-rc2-git4.orig/arch/powerpc/kernel/prom.c 2007-11-19 17:43:52.000000000 -0600 +++ linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c 2007-11-19 17:44:21.000000000 -0600 @@ -51,6 +51,7 @@ #include <asm/machdep.h> #include <asm/pSeries_reconfig.h> #include <asm/pci-bridge.h> +#include <asm/phyp_dump.h> #include <asm/kexec.h> #ifdef DEBUG @@ -1011,6 +1012,37 @@ static void __init early_reserve_mem(voi #endif } +#ifdef CONFIG_PHYP_DUMP + +/** + * reserve_crashed_mem() - reserve all not-yet-dumped mmemory + * + * This routine will reserve almost all of the memory in the + * system, except for a few hundred megabytes used to boot the + * new kernel. As the reserved memory is dumped to the dump + * device (by userland tools), it will be freed and made available. + */ +static void __init reserve_crashed_mem(void) +{ + unsigned long crashed_base, crashed_size; + + /* Reserve *everything* above the RMR. We'll free this real soon. */ + crashed_base = PHYP_DUMP_RMR_END; + crashed_size = lmb_end_of_DRAM() - crashed_base; + + /* XXX crashed_ram_end is wrong, since it may be beyond + * the memory_limit, it will need to be adjusted. */ + lmb_reserve(crashed_base, crashed_size); + + phyp_dump_info->init_reserve_start = crashed_base; + phyp_dump_info->init_reserve_size = crashed_size; +} + +#else +static inline void __init reserve_crashed_mem(void) {} +#endif /* CONFIG_PHYP_DUMP */ + + void __init early_init_devtree(void *params) { DBG(" -> early_init_devtree(%p)\n", params); @@ -1043,6 +1075,7 @@ void __init early_init_devtree(void *par reserve_kdump_trampoline(); reserve_crashkernel(); early_reserve_mem(); + reserve_crashed_mem(); lmb_enforce_memory_limit(memory_limit); lmb_analyze(); ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept 2008-01-08 0:19 [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja @ 2008-01-08 0:26 ` Arnd Bergmann 2008-01-08 0:49 ` Linas Vepstas 2008-01-08 0:50 ` Manish Ahuja 0 siblings, 2 replies; 69+ messages in thread From: Arnd Bergmann @ 2008-01-08 0:26 UTC (permalink / raw) To: linuxppc-dev; +Cc: mahuja, lkessler, linasvepstas, strosake On Tuesday 08 January 2008, Manish Ahuja wrote: > Initial patch for reserving memory in early boot, and freeing it later. > If the previous boot had ended with a crash, the reserved memory would contain > a copy of the crashed kernel data. > > Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> > Signed-off-by: Linas Vepstas <linas@austin.ibm.com> I think the signed-off-by chain needs to be modified. The way it appears, you handled the patch first, then sent it to Linas, who forwarded it to whoever will take the patches from the list. This obviously isn't true, since you are actually the one who is sending out the patches. Moreover, I believe that the linas@austin.ibm.com address is now dead, and shouldn't be used for this any more. So, depending on which of you two wrote the majority of a patch, I think it should be either | Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> | Acked-by: Linas Vepstas <linasvepstas@gmail.com> or | From: Linas Vepstas <linasvepstas@gmail.com> | Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> Arnd <>< ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept 2008-01-08 0:26 ` Arnd Bergmann @ 2008-01-08 0:49 ` Linas Vepstas 2008-01-08 1:13 ` Michael Ellerman 2008-01-08 0:50 ` Manish Ahuja 1 sibling, 1 reply; 69+ messages in thread From: Linas Vepstas @ 2008-01-08 0:49 UTC (permalink / raw) To: Arnd Bergmann; +Cc: mahuja, linuxppc-dev, lkessler, strosake On 07/01/2008, Arnd Bergmann <arnd@arndb.de> wrote: > On Tuesday 08 January 2008, Manish Ahuja wrote: > > > Initial patch for reserving memory in early boot, and freeing it later. > > If the previous boot had ended with a crash, the reserved memory would contain > > a copy of the crashed kernel data. > > > > Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> > > Signed-off-by: Linas Vepstas <linas@austin.ibm.com> > > I think the signed-off-by chain needs to be modified. The way it appears, > you handled the patch first, then sent it to Linas, who forwarded it > to whoever will take the patches from the list. Well, -- there was dual authorship. I remangled the patches while Manish wrote code & tested. And I'd mailed them out the first time around, so you could say I forwarded after heavy editing. > This obviously isn't true, since you are actually the one who is sending > out the patches. Moreover, I believe that the linas@austin.ibm.com > address is now dead, and shouldn't be used for this any more. Hmm. I wanted to indicate that the work was done while I was at IBM; clearly, no one is going through git and changing old, expired email addrs, and so submission based on the old addr seemed appropriate. I'm taking the Signed-off-by line as a quasi-legal thing: a fancy ID string, identifying the author(s), rather than a new way to manage email address books. > So, depending on which of you two wrote the majority of a patch, I think > it should be either I'm not sure there was a clear majority. I think Manish did more work in general, but we hacked this together side by side. I got him to create working tested code; I busted it up into individual, clean, documented, mailing-list ready chunks. --linas ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept 2008-01-08 0:49 ` Linas Vepstas @ 2008-01-08 1:13 ` Michael Ellerman 0 siblings, 0 replies; 69+ messages in thread From: Michael Ellerman @ 2008-01-08 1:13 UTC (permalink / raw) To: linasvepstas; +Cc: mahuja, linuxppc-dev, lkessler, strosake, Arnd Bergmann [-- Attachment #1: Type: text/plain, Size: 1945 bytes --] On Mon, 2008-01-07 at 18:49 -0600, Linas Vepstas wrote: > On 07/01/2008, Arnd Bergmann <arnd@arndb.de> wrote: > > On Tuesday 08 January 2008, Manish Ahuja wrote: > > > > > Initial patch for reserving memory in early boot, and freeing it later. > > > If the previous boot had ended with a crash, the reserved memory would contain > > > a copy of the crashed kernel data. > > > > > > Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> > > > Signed-off-by: Linas Vepstas <linas@austin.ibm.com> > > > > I think the signed-off-by chain needs to be modified. The way it appears, > > you handled the patch first, then sent it to Linas, who forwarded it > > to whoever will take the patches from the list. > > This obviously isn't true, since you are actually the one who is sending > > out the patches. Moreover, I believe that the linas@austin.ibm.com > > address is now dead, and shouldn't be used for this any more. > > Hmm. I wanted to indicate that the work was done while I was at IBM; > clearly, no one is going through git and changing old, expired email > addrs, and so submission based on the old addr seemed appropriate. > > I'm taking the Signed-off-by line as a quasi-legal thing: a fancy ID string, > identifying the author(s), rather than a new way to manage email > address books. No one's changing the git history (you can't), but it seems silly to submit a patch with an address that is already dead. There is a general expectation that if/when someone finds a bug in the code they can email the signed-off-by addresses and get a real person. As far as indicating it was written at IBM, that should be covered by the copyright notices. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept 2008-01-08 0:26 ` Arnd Bergmann 2008-01-08 0:49 ` Linas Vepstas @ 2008-01-08 0:50 ` Manish Ahuja 1 sibling, 0 replies; 69+ messages in thread From: Manish Ahuja @ 2008-01-08 0:50 UTC (permalink / raw) To: Arnd Bergmann; +Cc: mahuja, linuxppc-dev, linasvepstas, lkessler, strosake Arnd, Sorry this patch ended up out of sequence. I reposted it properly again in the other thread. We did talk about using gmail address, but Linas was more comfortable using this as this is where he was, when we did this. Hence the use of Austin address with gmail being on the cc list. I am sure he will chime in with more details about it when he gets the opportunity. Thanks, Manish Arnd Bergmann wrote: > On Tuesday 08 January 2008, Manish Ahuja wrote: > >> Initial patch for reserving memory in early boot, and freeing it later. >> If the previous boot had ended with a crash, the reserved memory would contain >> a copy of the crashed kernel data. >> >> Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> >> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> > > I think the signed-off-by chain needs to be modified. The way it appears, > you handled the patch first, then sent it to Linas, who forwarded it > to whoever will take the patches from the list. > > This obviously isn't true, since you are actually the one who is sending > out the patches. Moreover, I believe that the linas@austin.ibm.com > address is now dead, and shouldn't be used for this any more. > > So, depending on which of you two wrote the majority of a patch, I think > it should be either > > | Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> > | Acked-by: Linas Vepstas <linasvepstas@gmail.com> > > or > > | From: Linas Vepstas <linasvepstas@gmail.com> > | Signed-off-by: Manish Ahuja <mahuja@us.ibm.com> > > Arnd <>< ^ permalink raw reply [flat|nested] 69+ messages in thread
end of thread, other threads:[~2008-02-15 22:34 UTC | newest] Thread overview: 69+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-01-07 23:45 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja 2008-01-08 0:13 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja 2008-01-09 4:29 ` Nathan Lynch 2008-01-09 4:58 ` Michael Ellerman 2008-01-09 15:31 ` Linas Vepstas 2008-01-09 18:44 ` Nathan Lynch 2008-01-09 19:28 ` Manish Ahuja 2008-01-09 22:59 ` Michael Ellerman 2008-01-09 23:18 ` Manish Ahuja 2008-01-10 2:47 ` Linas Vepstas 2008-01-10 3:55 ` Michael Ellerman 2008-01-10 2:33 ` Linas Vepstas 2008-01-10 3:17 ` Olof Johansson 2008-01-10 4:12 ` Linas Vepstas 2008-01-10 4:52 ` Michael Ellerman 2008-01-10 16:21 ` Olof Johansson 2008-01-10 16:34 ` Linas Vepstas 2008-01-10 21:46 ` Mike Strosaker 2008-01-11 1:26 ` Nathan Lynch 2008-01-11 16:57 ` Linas Vepstas 2008-01-14 5:24 ` Olof Johansson 2008-01-14 15:21 ` Linas Vepstas 2008-01-08 0:16 ` [PATCH 2/8] pseries: phyp dump: config file Manish Ahuja 2008-01-08 3:18 ` Stephen Rothwell 2008-01-08 0:21 ` [PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja 2008-01-08 3:45 ` Stephen Rothwell 2008-01-08 18:34 ` Linas Vepstas 2008-01-08 0:25 ` [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja 2008-01-08 3:16 ` Stephen Rothwell 2008-01-16 4:21 ` Paul Mackerras 2008-01-08 0:28 ` [PATCH 5/8] pseries: phyp dump: register dump area Manish Ahuja 2008-01-08 3:59 ` Stephen Rothwell 2008-01-08 0:35 ` [PATCH 6/8] pseries: phyp dump: debugging print routines Manish Ahuja 2008-01-08 0:49 ` Arnd Bergmann 2008-01-08 4:03 ` Stephen Rothwell 2008-01-08 0:37 ` [PATCH 7/8] pseries: phyp dump: Unregister and print dump areas Manish Ahuja 2008-01-08 4:25 ` Stephen Rothwell 2008-01-08 22:56 ` Manish Ahuja 2008-01-08 0:39 ` [PATCH 8/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja 2008-02-12 6:31 ` [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja 2008-02-12 6:53 ` [PATCH 1/8] pseries: phyp dump: Docmentation Manish Ahuja 2008-02-12 7:08 ` [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja 2008-02-12 8:48 ` Michael Ellerman 2008-02-12 16:38 ` Manish Ahuja 2008-02-14 3:46 ` Tony Breeds 2008-02-14 23:12 ` Olof Johansson 2008-02-15 7:16 ` Manish Ahuja 2008-02-12 7:11 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja 2008-02-12 10:08 ` Stephen Rothwell 2008-02-12 16:40 ` Manish Ahuja 2008-02-15 1:05 ` Tony Breeds 2008-02-15 7:17 ` Manish Ahuja 2008-02-15 22:32 ` Tony Breeds 2008-02-15 17:30 ` Linas Vepstas 2008-02-12 7:14 ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja 2008-02-12 10:11 ` Stephen Rothwell 2008-02-12 16:31 ` Manish Ahuja 2008-02-12 7:16 ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja 2008-02-12 7:18 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja 2008-02-12 10:18 ` Stephen Rothwell 2008-02-12 16:32 ` Manish Ahuja 2008-02-13 21:43 ` Manish Ahuja 2008-02-12 7:20 ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja 2008-02-12 7:21 ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja -- strict thread matches above, loose matches on Subject: below -- 2008-01-08 0:19 [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept Manish Ahuja 2008-01-08 0:26 ` Arnd Bergmann 2008-01-08 0:49 ` Linas Vepstas 2008-01-08 1:13 ` Michael Ellerman 2008-01-08 0:50 ` Manish Ahuja
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).