* [RFC] makedumpfile-1.5.1 RC
@ 2012-11-16 8:15 Atsushi Kumagai
2012-11-20 12:14 ` Lisa Mitchell
0 siblings, 1 reply; 15+ messages in thread
From: Atsushi Kumagai @ 2012-11-16 8:15 UTC (permalink / raw)
To: kexec
Hello,
This is the makedumpfile version 1.5.1-rc.
Your comments/patches are welcome.
http://makedumpfile.git.sourceforge.net/git/gitweb.cgi?p=makedumpfile/makedumpfile;a=shortlog;h=refs/heads/v1.5.1-rc
If there is no problem, I will release v1.5.1 GA on Nov 28.
Main new feature:
o Support for snappy compression
This feature allows you to compress dump data page by page using snappy.
This feature is optional; to use it, the user has to prepare the snappy
library and build the binary with USESNAPPY=on.
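A minimal build-and-run sketch of the above (hedged: the -p flag for snappy page compression is inferred from the command-line patches listed in the changelog, by analogy with -c for zlib and -l for lzo; check `makedumpfile --help` on your build):

```shell
# Build makedumpfile against the snappy library (library must be installed).
make USESNAPPY=on

# Assumed usage: -p selects snappy page compression, -d 31 is the usual
# filtering level for excluding zero/cache/user/free pages.
makedumpfile -p -d 31 /proc/vmcore dumpfile
```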
o Support for eppic language
This feature allows you to scrub data in a dumpfile with an eppic macro.
Eppic macros can specify more flexible scrubbing rules than the original
rules specified with the --config option.
This feature is optional; to use it, the user has to prepare the eppic
(and tinfo) libraries and build the binary with EPPIC=on.
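A corresponding sketch for the eppic feature (hedged: `--eppic` taking a macro file is an assumption based on the interface patches below, and `scrub.c` is a hypothetical user-written macro, not a shipped file):

```shell
# Build makedumpfile with eppic support (eppic and tinfo libraries required).
make EPPIC=on

# Assumed usage: pass a user-written eppic macro that overwrites sensitive
# fields while the dump is being filtered.
makedumpfile -d 31 --eppic scrub.c /proc/vmcore dumpfile
```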
o Introduce mem_map array logic
This is a new logic for excluding free pages. It excludes free pages by
looking up the mem_map array instead of the free lists, which is
expected to perform well in cyclic mode.
This feature requires the values below, but vmcore doesn't include them,
so the user has to prepare a vmlinux or vmcoreinfo which includes them.
When running in cyclic mode and the required values exist, the mem_map
array logic is used. Otherwise, the free list logic is used.
- OFFSET(page._mapcount)
- OFFSET(page.private)
- SIZE(pageflags)
- NUMBER(PG_buddy)
- NUMBER(PG_slab)
- NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)
Additionally, I will post a patch to add the values above to the upstream
kernel, so vmlinux and vmcoreinfo will soon be unnecessary for this
feature.
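The kernel-side patch mentioned above would presumably export these values through the existing vmcoreinfo helper macros. A hedged sketch only (macro placement and the availability of each symbol vary by kernel version; this mirrors the list above, not a specific upstream commit):

```c
/* Sketch: additions to crash_save_vmcoreinfo_init() in kernel/kexec.c.
 * PG_buddy and PAGE_BUDDY_MAPCOUNT_VALUE do not exist on every kernel
 * version, so a real patch would guard or define them as needed. */
VMCOREINFO_OFFSET(page, _mapcount);
VMCOREINFO_OFFSET(page, private);
/* SIZE(pageflags) would additionally need the size of enum pageflags. */
VMCOREINFO_NUMBER(PG_buddy);
VMCOREINFO_NUMBER(PG_slab);
VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
```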
Changelog:
o New feature
Commits related to "Support for Xen4"
- [PATCH v3 1/9] Make max_pfn symbol optional for Xen dumps. (by Petr Tesarik) 3823577
- [PATCH v3 2/9] Xen: Fix the offset of the _domain field in struct page_info.
(by Petr Tesarik) 6f0e831
- [PATCH v3 3/9] Read the Xen crash ELF note into memory at startup. (by Petr Tesarik) 4e11405
- [PATCH v3 4/9] Split Xen setup into early and late. (by Petr Tesarik) e8295d2
- [PATCH v3 5/9] Initialize phys_start during early Xen setup. (by Petr Tesarik) 064fbc1
- [PATCH v3 6/9] Fix domain pickled_id computation for xen-3.4+. (by Petr Tesarik) 3ae13b4
- [PATCH v3 7/9] Support Xen4 virtuall address space layout. (by Petr Tesarik) 51ea90a
- [PATCH v3 8/9] Add support for filtering out user pages under Xen4. (by Petr Tesarik) fccad16
- [PATCH v3 9/9] Do not fail for symbols removed in Xen4. (by Petr Tesarik) 9151172
- [PATCH] Initialize Xen structures from initial(). (by Petr Tesarik) 56a388e
Commits related to "Support for snappy compression"
- [PATCH 1/9] Add dump header for snappy. (by HATAYAMA Daisuke) 5634487
- [PATCH 2/9] Add command-line processing for snappy. (by HATAYAMA Daisuke) 8c69b49
- [PATCH 3/9] Add snappy build support. (by HATAYAMA Daisuke) c9a24d0
- [PATCH 4/9] Notify snappy unsupporting when disabled. (by HATAYAMA Daisuke) d075b7f
- [PATCH 5/9] Add compression processing. (by HATAYAMA Daisuke) 604aacc
- [PATCH 6/9] Add uncompression processing. (by HATAYAMA Daisuke) 746d5f2
- [PATCH 7/9] Add help message. (by HATAYAMA Daisuke) 550c19c
- [PATCH 8/9] Add manual description. (by HATAYAMA Daisuke) ba3ee46
- [PATCH 9/9] Add README description. (by Atsushi Kumagai) 0626577
Commits related to "Support for eppic language"
- [PATCH v2 1/7] Initialize and setup eppic. (by Aravinda Prasad) 5a9be3f
- [PATCH v2 2/7] makedumpfile and eppic interface layer. (by Aravinda Prasad) da46854
- [PATCH v2 3/7] Eppic call back functions to query a dump image. (by Aravinda Prasad) 3808381
- [PATCH v2 4/7] Implement apigetctype call back function. (by Aravinda Prasad) 213cc99
- [PATCH v2 5/7] Implement apimember and apigetrtype call back functions.
(by Aravinda Prasad) 424a241
- [PATCH v2 6/7] Extend eppic built-in functions to include memset function.
(by Aravinda Prasad) 587d8a3
- [PATCH v2 7/7] Support fully typed symbol access mode. (by Aravinda Prasad) 4f3e0da
Commits related to "mem_map array logic"
- [PATCH v2 01/10] Move page flags setup for old kernels after debuginfo initialization.
(by HATAYAMA Daisuke) ed3fe07
- [PATCH v2 02/10] Add debuginfo interface for enum type size. (by HATAYAMA Daisuke) e308a72
- [PATCH v2 03/10] Add new parameters to various tables. (by HATAYAMA Daisuke) 0ba57be
- [PATCH v2 04/10] Add debuginfo-related processing for VMCOREINFO/VMLINUX.
(by HATAYAMA Daisuke) b1f49d6
- [PATCH v2 05/10] Add hardcoded page flag values. (by HATAYAMA Daisuke) d2f87bc
- [PATCH v2 06/10] Exclude free pages by looking up mem_map array. (by HATAYAMA Daisuke) 8f74a27
- [PATCH v2 07/10] Add page_is_buddy for recent kernels. (by HATAYAMA Daisuke) a0a9fa7
- [PATCH v2 08/10] Add page_is_buddy for PG_buddy. (by HATAYAMA Daisuke) a2f411c
- [PATCH v2 09/10] Add page_is_buddy for old kernels. (by HATAYAMA Daisuke) db3553a
- [PATCH v2 10/10] Warn cyclic buffer overrun and correct it if possible.
(by HATAYAMA Daisuke) a0b9d84
Other commits
- [PATCH] Support for x86_64 1G pages. (by Petr Tesarik) 7b10a11
- [PATCH] Change dwarf analyzer to search also into named containers. (by Atsushi Kumagai) 1f31b6a
- [PATCH] s390x: Add 2GB frame support for page table walker. (by Michael Holzheu) f6ab608
- [PATCH v2 1/2] Add get_free_memory_size() to get the amount of free memory.
(by Atsushi Kumagai) 8dd4a17
- [PATCH v2 2/2] Calculate the size of cyclic buffer automatically. (by Atsushi Kumagai) 01db605
o Bugfix
- [PATCH] add a missing return statement. (by Petr Tesarik) f6134c7
Thanks
Atsushi Kumagai
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* Re: [RFC] makedumpfile-1.5.1 RC
From: Lisa Mitchell @ 2012-11-20 12:14 UTC (permalink / raw)
To: Atsushi Kumagai, jerry.hoemann; +Cc: kexec@lists.infradead.org

On Fri, 2012-11-16 at 08:15 +0000, Atsushi Kumagai wrote:
> This is the makedumpfile version 1.5.1-rc.
> Your comments/patches are welcome.
[..]

I tested this makedumpfile v1.5.1-rc on a 4 TB DL980, on a 2.6.32-based
kernel, and got good results. With crashkernel=256M and default settings
(i.e. no cyclic buffer option selected), the dump completed successfully
in about 2 hours 40 minutes. I then specified a cyclic buffer size of
48 MB, and the dump completed in the same time; there were no measurable
differences within the accuracy of our measurements.

We are still evaluating performance data and don't have very precise
measurements here for comparisons, but the results look promising so far.

Lisa Mitchell
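For reference, the 48 MB run Lisa describes would presumably look something like the following (a hypothetical invocation, not taken from the thread; the assumption that --cyclic-buffer takes a size in kilobytes should be checked against the makedumpfile manual for your version):

```shell
# Compressed dump, dump level 31, cyclic buffer capped at 48 MB
# (48 * 1024 = 49152 KB). Paths are illustrative.
makedumpfile -c -d 31 --cyclic-buffer 49152 /proc/vmcore /var/crash/dumpfile
```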
* Re: [RFC] makedumpfile-1.5.1 RC
From: Vivek Goyal @ 2012-11-20 16:35 UTC (permalink / raw)
To: Lisa Mitchell
Cc: kexec@lists.infradead.org, Atsushi Kumagai, jerry.hoemann, Cliff Wickman

On Tue, Nov 20, 2012 at 05:14:55AM -0700, Lisa Mitchell wrote:
[..]
> I tested this makedumpfile v1.5.1-rc on a 4 TB DL980, on 2.6.32 based
> kernel, and got good results. With crashkernel=256M, and default
> settings (i.e. no cyclic buffer option selected), the dump successfully
> completed in about 2 hours, 40 minutes, and then I specified a cyclic
> buffer size of 48 M, and the dump completed in the same time, no
> measurable differences within the accuracy of our measurements.

This sounds a little odd to me.

- With a smaller buffer size of 48 MB, it should have taken much more
  time to finish the dump than when no restriction was put on the buffer
  size. I am assuming that out of the 256 MB reserved, around 128 MB was
  available for makedumpfile to use.

- Also, 2 hours 40 minutes sounds like a lot. Is it practical to wait
  that long for a machine to dump before it can be brought back into
  service? Do you have any data w.r.t. the older makedumpfile (which did
  not have the cyclic buffer logic)?

I have some data which I collected in 2008. A 128 GB system took roughly
4 minutes to filter and save the dumpfile. If we scale that linearly, it
should take around 32 minutes per TB, hence around 2 hours 8 minutes for
a 4 TB system. Your numbers do seem to be roughly in line with that.

Still, 2-2.5 hours seems too long to filter and save the core of a 4 TB
system. We will probably need to figure out what's taking so much time.
Maybe we need to look into Cliff Wickman's idea of having the kernel
return a list of pfns to dump and make dumping 20 times faster. I would
love to have a 4 TB system dumped in 6 minutes as opposed to 2 hours. :-)

Thanks
Vivek
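Vivek's linear extrapolation can be checked in a couple of lines (a back-of-the-envelope model only; real filtering time need not scale strictly linearly with memory size):

```python
# 2008 measurement cited above: ~4 minutes to filter and save a 128 GB dump.
minutes_per_tb = 4 / 128 * 1024      # scale the per-GB rate to one TB
est_minutes = minutes_per_tb * 4     # linear estimate for a 4 TB system

print(f"{minutes_per_tb:.0f} min/TB -> {est_minutes:.0f} minutes "
      f"(~{int(est_minutes) // 60} h {int(est_minutes) % 60} min)")
# -> 32 min/TB -> 128 minutes (~2 h 8 min)
```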
* Re: [RFC] makedumpfile-1.5.1 RC
From: Lisa Mitchell @ 2012-11-20 13:03 UTC (permalink / raw)
To: Vivek Goyal
Cc: kexec@lists.infradead.org, Atsushi Kumagai, Hoemann, Jerry, Cliff Wickman

On Tue, 2012-11-20 at 16:35 +0000, Vivek Goyal wrote:
[..]
> Still 2-2.5 hours seems too long to be able to filter and save core of
> a 4TB system. We will probably need to figure out what's taking so much
> of time.

As I stated, I don't really have precise performance data here, but the
time I got was comparable to the rough 3-4 hours I got for a successful
dump on this same system, with a larger crashkernel size, using
makedumpfile v1.4. We haven't made a good apples-to-apples comparison
between the two at this point, but this is how long this 4 TB system has
been taking to dump at dump level 31, so we feel we are in the same
ballpark with makedumpfile v1.5.1.

It does seem that the "Excluding pages" parts take up a lot of the time
in the dump, as opposed to the copying, but I don't have a good
breakdown.

I have added debug mem_level 3 to the kdump.conf file, and have seen
used memory on this machine, recorded right before makedumpfile creates
the bitmap and starts filtering, be around 140 MB, and I have seen
makedumpfile fail, with the OOM killer active after this point, with a
crashkernel size of 256 MB or 384 MB using makedumpfile v1.4.

So makedumpfile v1.5.1 solves the above problem, and allows us to
successfully dump a 4 TB system with these smaller crashkernel sizes.

We do need much better performance numbers to ensure no regression from
makedumpfile v1.4, but I wanted you to get the feedback at least of what
testing we had done, and that it appears to be solving the primary
problem we were interested in: that we could dump many terabytes of
memory with crashkernel sizes fixed at 384 MB or below.
* Re: [RFC] makedumpfile-1.5.1 RC
From: Vivek Goyal @ 2012-11-20 21:46 UTC (permalink / raw)
To: Lisa Mitchell
Cc: kexec@lists.infradead.org, Atsushi Kumagai, Hoemann, Jerry, Cliff Wickman

On Tue, Nov 20, 2012 at 06:03:20AM -0700, Lisa Mitchell wrote:
[..]
> I have added the debug mem_level 3 to kdump.conf file, and have seen
> used memory on this machine recorded right before makedumpfile creates
> the bitmap and starts filtering be around 140 MB, and have seen
> makedumpfile fail, with OOM killer active after this point with a
> crashkernel size of 256 MB or 384 MB using makedumpfile v1.4.

That sounds right. makedumpfile requires roughly 64 MB of memory per TB,
so to be able to filter 4 TB, one needs 256 MB of free memory. No wonder
makedumpfile v1.4 fails with 140 MB free.

> So makedumpfile v1.5.1 solves the above problem, and allows us to
> successfully dump a 4 TB system with these smaller crashkernel sizes.

Ok, great. Good to know v1.5.1 at least allows dumping higher-memory
systems with a smaller reserved amount of memory.

> We do need much better performance numbers to ensure no regression from
> makedumpfile v1.4, but I wanted you to get the feedback at least of
> what testing we had done [..]

Fair enough.

Thanks
Vivek
* Re: [RFC] makedumpfile-1.5.1 RC
From: Lisa Mitchell @ 2012-11-20 19:05 UTC (permalink / raw)
To: Vivek Goyal
Cc: kexec@lists.infradead.org, Atsushi Kumagai, Hoemann, Jerry, Cliff Wickman

On Tue, 2012-11-20 at 21:46 +0000, Vivek Goyal wrote:
[..]
> Fair enough.

That said, I am very interested in seeing any changes to makedumpfile,
or anywhere in the kexec/kdump code, that promise substantial
improvements in dump performance, especially on multi-TB systems. The
dump time is currently very long for these systems, and the customers
for these large systems want minimum downtime, so improving the current
status quo is a high priority.

The changes proposed by Cliff Wickman in
http://lists.infradead.org/pipermail/kexec/2012-November/007178.html
sound like they could bring big improvements in performance, so they
should be investigated. I would like to try a version of them built on
top of makedumpfile v1.5.1-rc on our 4 TB system, to see what
performance gains we can get, as an experiment.
* Re: [RFC] makedumpfile-1.5.1 RC 2012-11-20 19:05 ` Lisa Mitchell @ 2012-11-21 13:54 ` Vivek Goyal 2012-11-22 0:49 ` Hatayama, Daisuke 0 siblings, 1 reply; 15+ messages in thread From: Vivek Goyal @ 2012-11-21 13:54 UTC (permalink / raw) To: Lisa Mitchell Cc: kexec@lists.infradead.org, Atsushi Kumagai, Hoemann, Jerry, Cliff Wickman On Tue, Nov 20, 2012 at 12:05:41PM -0700, Lisa Mitchell wrote: > On Tue, 2012-11-20 at 21:46 +0000, Vivek Goyal wrote: > > On Tue, Nov 20, 2012 at 06:03:20AM -0700, Lisa Mitchell wrote: > > > On Tue, 2012-11-20 at 16:35 +0000, Vivek Goyal wrote: > > > > On Tue, Nov 20, 2012 at 05:14:55AM -0700, Lisa Mitchell wrote: > > > > > > > > [..] > > > > > I tested this makedumpfile v1.5.1-rc on a 4 TB DL980, on 2.6.32 based > > > > > kernel, and got good results. With crashkernel=256M, and default > > > > > settings (i.e. no cyclic buffer option selected), the dump successfully > > > > > completed in about 2 hours, 40 minutes, and then I specified a cyclic > > > > > buffer size of 48 M, and the dump completed in the same time, no > > > > > measurable differences within the accuracy of our measurements. > > > > > > > > This sounds little odd to me. > > > > > > > > - With smaller buffer size of 48M, it should have taken much more time > > > > to finish the dump as compared to when no restriction was put on > > > > buffer size. I am assuming that out of 256M reserved, say around 128MB > > > > was available for makedumpfile to use. > > > > > > > > - Also 2 hours 40 minutes sounds a lot. Is it practical to wait that > > > > long for a machine to dump before it can be brought into service > > > > again? Do you have any data w.r.t older makedumpfile (which did not > > > > have cyclic buffer logic). > > > > > > > > I have some data which I collected in 2008. 128GB system took roughly > > > > 4 minutes to filter and save dumpfile. So if we scale it linearly > > > > then it should take around 32minutes per TB. 
Hence around 2 hours > > > > 8 minutes for a 4TB systems. Your numbers do seems to be in roughly > > > > inline. > > > > > > > > Still 2-2.5 hours seems too long to be able to filter and save core of a > > > > 4TB system. We will probably need to figure out what's taking so much of > > > > time. May be we need to look into cliff wickman's idea of kernel returning > > > > list of pfns to dump and make dump 20 time faster. I will love to have 4TB > > > > system dumped in 6 minutes as opposed to 2 hrs. :-) > > > > > > > > Thanks > > > > Vivek > > > > > > As I stated, I don't really have precise performance data here, but the > > > time I got was comparable to the rough 3-4 hours with a larger > > > crashkernel size that I got a successful dump on this same system with a > > > makedumpfile v1.4. We haven't made a good apples-apples comparison > > > between the two at this point, but this is how long this 4 TB system has > > > been taking to dump, dump level =31, so we feel we are in the same > > > ballpark with makedumpfile v1.5.1. > > > > > > It does seem that the "Excluding pages" parts take up a lot of the time > > > in the dump, as opposed to the copying, but I don't have a good > > > breakdown. > > > > > > I have added the debug mem_level 3 to kdump.conf file, and have seen > > > used memory on this machine recorded right before makedumpfile creates > > > the bitmap and starts filtering be around 140 MB, and have seen > > > makedumpfile fail, with OOM killer active after this point with a > > > crashkernel size of 256 MB or 384 MB using makedumpfile v1.4. > > > > That's sounds right. makedumpfile requires roughly 64MB of memory per TB. > > So to be able to filter out 4TB, one needs 256MB of free memory. So no > > wonder makedumpfile v1.4 will fail with 140MB free. > > > > > > > > So makedumpfile v1.5.1 solves the above problem, and allows us to > > > successfully dump a 4 TB system with these smaller crashkernel sizes. > > > > Ok, great. 
Good to know v1.5.1 is at least allowing dumping of higher memory > > systems with a smaller reserved amount of memory.
> > > We do need much better performance numbers to ensure no regression from > > > makedumpfile v1.4, but I wanted you to get the feedback at least of what > > > testing we had done, and that it appears it is solving the primary > > > problem we were interested in, that we could dump many terabytes of > > > memory with crashkernel sizes fixed at 384 MB or below.
> > Fair enough.
> > Thanks > > Vivek
> That said, I am very interested in seeing any changes to makedumpfile or > anywhere in the kexec/kdump code that promise substantial improvements > in dump performance, especially on the multi TB systems. The dump time > currently is very long for these systems, and the customers for these > large systems want minimum downtime, so improving the current status quo > is a high priority.
> The changes proposed by Cliff Wickman in > http://lists.infradead.org/pipermail/kexec/2012-November/007178.html > sound like they could bring big improvements in performance, so these > should be investigated. I would like to try a version of them built on > top of makedumpfile v1.5.1-rc, to try on our 4 TB system, to see what > performance gains we can get, as an experiment.
I am wondering if it is time to look into parallel processing. Somebody was working on bringing up more cpus in the kdump kernel. If that works, then probably multiple makedumpfile threads can try to filter out different sections of physical memory.
Thanks Vivek
_______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 15+ messages in thread
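Vivek's 64MB-per-TB rule of thumb above falls straight out of the bitmap layout: makedumpfile keeps two bitmaps with one bit each per 4 KiB page. A minimal sketch of that arithmetic (illustrative only, not makedumpfile code):

```python
# Rough bitmap-memory estimate behind the "64MB per TB" rule of thumb:
# two bitmaps (1st and 2nd bitmap), one bit per 4 KiB page.
# Illustrative arithmetic only, not actual makedumpfile code.

def bitmap_mib_per_tb(page_size=4096, bitmaps=2):
    pages_per_tb = (1 << 40) // page_size   # pages in 1 TB of RAM
    bits = pages_per_tb * bitmaps           # one bit per page, per bitmap
    return bits // 8 // (1 << 20)           # bits -> bytes -> MiB

print(bitmap_mib_per_tb())      # 64
print(4 * bitmap_mib_per_tb())  # 256 (MiB needed to filter a 4 TB machine)
```

This matches the thread's observation that filtering 4 TB needs about 256MB of free memory, and why a crash kernel with only 140MB free runs into the OOM killer.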
* RE: [RFC] makedumpfile-1.5.1 RC 2012-11-21 13:54 ` Vivek Goyal @ 2012-11-22 0:49 ` Hatayama, Daisuke 2012-11-26 16:02 ` Vivek Goyal 0 siblings, 1 reply; 15+ messages in thread
From: Hatayama, Daisuke @ 2012-11-22 0:49 UTC (permalink / raw)
To: Vivek Goyal
Cc: Atsushi Kumagai, kexec@lists.infradead.org, Hoemann, Jerry, Lisa Mitchell, Cliff Wickman
> -----Original Message----- > From: kexec-bounces@lists.infradead.org > [mailto:kexec-bounces@lists.infradead.org] On Behalf Of Vivek Goyal > Sent: Wednesday, November 21, 2012 10:54 PM > To: Lisa Mitchell > Cc: kexec@lists.infradead.org; Atsushi Kumagai; Hoemann, Jerry; Cliff > Wickman > Subject: Re: [RFC] makedumpfile-1.5.1 RC [...]
> > The changes proposed by Cliff Wickman in > > http://lists.infradead.org/pipermail/kexec/2012-November/007178.html > > sound like they could bring big improvements in performance, so these > > should be investigated. I would like to try a version of them built on > > top of makedumpfile v1.5.1-rc, to try on our 4 TB system, to see what > > performance gains we can get, as an experiment.
> I am wondering if it is time to look into parallel processing. Somebody > was working on bringing up more cpus in the kdump kernel. If that works, then > probably multiple makedumpfile threads can try to filter out different > sections of physical memory.
Makedumpfile already has such a parallel processing feature:
$ ./makedumpfile --help ...
[--split]: Split the dump data to multiple DUMPFILEs in parallel. If specifying DUMPFILEs on different storage devices, a device can share I/O load with other devices and it reduces time for saving the dump data. The file size of each DUMPFILE is smaller than the system memory size which is divided by the number of DUMPFILEs. This feature supports only the kdump-compressed format.
Doing makedumpfile like:
$ makedumpfile --split dumpfile vmcore1 vmcore2 [vmcore3 ...
vmcore_n]
the original dumpfile is split into n vmcores in the kdump-compressed format, each of which basically has the same size, and n processes are used, not threads. (The author told me the reason processes were chosen was that he didn't want to put the relatively large libc library in the 2nd kernel at that time. But recently, the libc library is present in the 2nd kernel since scp needs to use it, so this point might no longer hold.)
I think Cliff's idea works orthogonally to parallel processing. I'll also test it on our machine.
Also, sorry for delaying the work on multiple cpus in the 2nd kernel. Posting a new version of the patch set is delayed a few weeks more. But it's possible to wake up the AP cpus in the 2nd kernel safely if the BIOS always assigns lapicid 0 to the BSP, since then, if kexec enters the 2nd kernel on some AP lcpu, the kernel always assigns lcpu number 1 to the BSP lcpu. So, maxcpus=1 and waking up the cpus except for lcpu 1 works as a workaround. If anyone wants to benchmark parallel processing, please do it like this. Thanks.
HATAYAMA, Daisuke
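The `--split` help text above says each DUMPFILE ends up smaller than memory-size/n. One plausible way to picture the partitioning — n contiguous PFN slices, one per writer process — is sketched below (illustrative only; makedumpfile's actual assignment of pages to files may differ):

```python
# Toy picture of --split: carve the PFN space into n contiguous slices,
# one slice per DUMPFILE/process. Illustrative only; not makedumpfile's
# actual partitioning code.

def split_pfn_ranges(max_pfn, n):
    step = -(-max_pfn // n)  # ceiling division
    return [(start, min(start + step, max_pfn))
            for start in range(0, max_pfn, step)]

# 1 TB of 4 KiB pages split across 4 writer processes
ranges = split_pfn_ranges((1 << 40) // 4096, 4)
print(len(ranges), ranges[0])   # 4 (0, 67108864)
```

Each slice can then be filtered and compressed independently, which is what lets the split files land on different storage devices and share the I/O load.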
* Re: [RFC] makedumpfile-1.5.1 RC 2012-11-22 0:49 ` Hatayama, Daisuke @ 2012-11-26 16:02 ` Vivek Goyal 0 siblings, 0 replies; 15+ messages in thread From: Vivek Goyal @ 2012-11-26 16:02 UTC (permalink / raw) To: Hatayama, Daisuke Cc: Atsushi Kumagai, kexec@lists.infradead.org, Hoemann, Jerry, Lisa Mitchell, Cliff Wickman On Thu, Nov 22, 2012 at 12:49:35AM +0000, Hatayama, Daisuke wrote: > > -----Original Message----- > > From: kexec-bounces@lists.infradead.org > > [mailto:kexec-bounces@lists.infradead.org] On Behalf Of Vivek Goyal > > Sent: Wednesday, November 21, 2012 10:54 PM > > To: Lisa Mitchell > > Cc: kexec@lists.infradead.org; Atsushi Kumagai; Hoemann, Jerry; Cliff > > Wickman > > Subject: Re: [RFC] makedumpfile-1.5.1 RC > [...] > > > The changes proposed by Ciff Wickman in > > > http://lists.infradead.org/pipermail/kexec/2012-November/007178.html > > > sound like they could bring big improvements in performance, so these > > > should be investigated. I would like to try a version of them built on > > > top of makedumpfile v1.5.1-rc, to try on our 4 TB system, to see what > > > performance gains we can get, as an experiment. > > > > I am wondering if it is time to look into parallel processing. Somebody > > was working on bringing up more cpus in kdump kernel. If that works, the > > probably multiple makedumpfile threads can try to filter out different > > sections of physical memory. > > > > Makedumpfile has already had such parallel processing feature: > > $ ./makedumpfile --help > ... > [--split]: > Split the dump data to multiple DUMPFILEs in parallel. If specifying > DUMPFILEs on different storage devices, a device can share I/O load with > other devices and it reduces time for saving the dump data. The file size > of each DUMPFILE is smaller than the system memory size which is divided > by the number of DUMPFILEs. > This feature supports only the kdump-compressed format. 
> > Doing makedumpfile like: > > $ makedumpfile --split dumpfile vmcore1 vmcore2 [vmcore3 ... vmcore_n]
Ok, this is interesting. So reassembling of the various vmcore fragments happens later, and the user needs to do that explicitly?
> the original dumpfile is split into n vmcores in the kdump-compressed format, each of > which basically has the same size, and n processes are used, not threads. > (The author told me the reason processes were chosen was that he didn't want to put > the relatively large libc library in the 2nd kernel at that time. But recently, the libc library is > present in the 2nd kernel since scp needs to use it, so this point might no longer hold.)
> I think Cliff's idea works orthogonally to parallel processing. I'll also test it on our > machine.
> Also, sorry for delaying the work on multiple cpus in the 2nd kernel. Posting a new > version of the patch set is delayed a few weeks more. But it's possible to wake up the > AP cpus in the 2nd kernel safely if the BIOS always assigns lapicid 0 to the BSP, since > then, if kexec enters the 2nd kernel on some AP lcpu, the kernel always assigns lcpu > number 1 to the BSP lcpu. So, maxcpus=1 and waking up the cpus except for lcpu 1 works > as a workaround. If anyone wants to benchmark parallel processing, please do it > like this. Thanks.
If you happen to do some benchmarking, please do share the numbers. I am really curious to know if this parallel processing will speed things up enough to have reasonable dump times on multi-terabyte machines.
Thanks Vivek
* Re: [RFC] makedumpfile-1.5.1 RC 2012-11-20 12:14 ` Lisa Mitchell 2012-11-20 16:35 ` Vivek Goyal @ 2012-12-04 13:31 ` Lisa Mitchell 2012-12-07 5:26 ` Atsushi Kumagai 1 sibling, 1 reply; 15+ messages in thread
From: Lisa Mitchell @ 2012-12-04 13:31 UTC (permalink / raw)
To: Atsushi Kumagai; +Cc: kexec@lists.infradead.org, jerry.hoemann
On Tue, 2012-11-20 at 05:14 -0700, Lisa Mitchell wrote:
> I tested this makedumpfile v1.5.1-rc on a 4 TB DL980, on a 2.6.32 based > kernel, and got good results. With crashkernel=256M, and default > settings (i.e. no cyclic buffer option selected), the dump successfully > completed in about 2 hours, 40 minutes, and then I specified a cyclic > buffer size of 48 M, and the dump completed in the same time, no > measurable differences within the accuracy of our measurements.
> We are still evaluating performance data, and don't have very precise > measurements here for comparisons, but the results look promising so > far.
> Lisa Mitchell
Update: I did another test over the last few days that was a better apples-to-apples comparison, contrasting the performance of makedumpfile 1.4 with makedumpfile v1.5.1-rc on a RHEL 6.3 system with 4 TB of memory.
Earlier I had not taken good comparable measurements of the dump times, from the exact same machine configuration, to compare the timing differences between the two makedumpfiles. I had noted that makedumpfile 1.5.1-rc seemed a performance improvement over the makedumpfile v1.5.0 results seen earlier.
Unfortunately, this weekend the results still showed a significant performance regression with makedumpfile v1.5.1-rc compared to makedumpfile 1.4.
This time my performance measurements were based on comparing the file system timestamps in the /var/crash directory, showing the difference from when the crash directory was created by makedumpfile to the timestamp on the vmcore file, which shows when the copy of the memory to this file was complete.
1.
Baseline: On the 4 TB DL980, with the RHEL 6.3 installation (2.6.32 based kernel), with a crashkernel size of 512M or 384M (both big enough to contain the 256M bitmap required, plus the kernel). The makedumpfile command line was the same for both tests: "-c --message-level 1 -d 31". The timestamps shown for the dump copy were:
# cd /var/crash
# ls
127.0.0.1-2012-11-30-15:28:22
# cd 127.0.0.1-2012-11-30-15:28:22
# ls -al 127.0.0.1-2012-11-30-15:28:22
total 10739980
drwxr-xr-x. 2 root root 4096 Nov 30 17:07 .
drwxr-xr-x. 3 root root 4096 Nov 30 15:28 ..
-rw-------. 1 root root 10997727069 Nov 30 17:07 vmcore
From the timestamps above, the dump started at 15:28 and completed at 17:07, so the dump time was 1 hour, 41 minutes.
2. makedumpfile v1.5.1-rc on the same system configuration as (1.) above, but with the crashkernel size set to 256 M to ensure the use of the cyclic buffer feature, to fit in the smaller crashkernel space. The same makedumpfile command line of "-c --message-level 1 -d 31" was used.
# cd /var/crash
# ls -al
total 12
drwxr-xr-x. 3 root root 4096 Nov 30 23:25 .
drwxr-xr-x. 22 root root 4096 Nov 30 08:41 ..
drwxr-xr-x. 2 root root 4096 Dec 1 02:05 127.0.0.1-2012-11-30-23:25:20
# ls -al *
total 10734932
drwxr-xr-x. 2 root root 4096 Dec 1 02:05 .
drwxr-xr-x. 3 root root 4096 Nov 30 23:25 ..
-rw-------. 1 root root 10992554141 Dec 1 02:05 vmcore
From the timestamps above, the dump started at 23:25 and completed at 2:05 after midnight, so the total dump time was 2 hours and 40 minutes.
So for this 4 TB system, in this test, the dump write phase took 1 hour longer for makedumpfile v1.5.1-rc versus makedumpfile v1.4. This time seems dominated by the dump filtering activity, assuming the copy-to-disk times should have been the same, though I don't have a good breakdown.
I look forward to the GA version of makedumpfile v1.5.1 to see if there are any improvements, but it now looks to me like there are still a lot of improvements needed before v1.5.1 will have performance parity with v1.4.
Has anyone else done performance comparisons on multi-terabyte systems between makedumpfile 1.5.1 and makedumpfile 1.4, to see if others get similar results, or if my measurement method is inaccurate?
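Lisa's timing method — differencing the crash directory's creation timestamp against the vmcore's mtime — is easy to reproduce mechanically. A small sketch using the timestamps from test 2 above (the vmcore mtime is only shown to the minute, so :00 seconds is assumed):

```python
from datetime import datetime

# Elapsed dump time from two filesystem timestamps, as in the tests above.
# The vmcore mtime is only shown to the minute, so :00 seconds is assumed.

def elapsed(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes // 60}h {minutes % 60}m {seconds}s"

# Test 2: crash dir created 23:25:20, vmcore written at 02:05 the next day
print(elapsed("2012-11-30 23:25:20", "2012-12-01 02:05:00"))  # 2h 39m 40s
```

This agrees with the "2 hours and 40 minutes" quoted in the thread once the times are rounded to whole minutes.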
* Re: [RFC] makedumpfile-1.5.1 RC 2012-12-04 13:31 ` Lisa Mitchell @ 2012-12-07 5:26 ` Atsushi Kumagai 2012-12-10 21:06 ` Lisa Mitchell 0 siblings, 1 reply; 15+ messages in thread From: Atsushi Kumagai @ 2012-12-07 5:26 UTC (permalink / raw) To: lisa.mitchell; +Cc: kexec, jerry.hoemann Hello Lisa, On Tue, 04 Dec 2012 06:31:39 -0700 Lisa Mitchell <lisa.mitchell@hp.com> wrote: > On Tue, 2012-11-20 at 05:14 -0700, Lisa Mitchell wrote: > > > > > > > I tested this makedumpfile v1.5.1-rc on a 4 TB DL980, on 2.6.32 based > > kernel, and got good results. With crashkernel=256M, and default > > settings (i.e. no cyclic buffer option selected), the dump successfully > > completed in about 2 hours, 40 minutes, and then I specified a cyclic > > buffer size of 48 M, and the dump completed in the same time, no > > measurable differences within the accuracy of our measurements. > > > > We are still evaluating perfomance data, and don't have very precise > > measurements here for comparisons, but the results look promising so > > far. > > > > Lisa Mitchell > > Update: > > I did another test over the last few days that was a better apples-to- > apples comparison, contrasting the performance of makedumpfile 1.4 with > makedumpfile v1.5.1-rc on a RHEL 6.3 system with 4 TB of memory. Thanks for your measurement. It's very helpful because I can't get a chance to use such a large memory machine. > Earlier I had not taken good comparable measurements of the dump times, > from the exact same machine configuration comparisons of the timing > differences between the two makedumpfiles. I had noted that > makedumpfile 1.5.1-rc seemed a performance improvemnt over makedumpfile > v1.5.0 results seen earlier. 
> > Unfortunately this weekend, the results showed a significant performance > regression still with makedumpfile v1.5.1-rc compared to makedumpfile > 1.4 > > This time my performance measurements were based on comparing the file > system timestamps in the /var/crash directory, showing the difference > from when the crash directory was created by makedumpfile, to the > timestamp on the vmcore file, to show when the copy of the memory to > this file was complete. > > 1. Baseline: On the 4 TB DL980, with the RHEL 6.3 installation,(2.6.32 > based kernel) with a crashkernel size of 512M or 384M (both big enough > to contain the 256M bit map required, plus the kernel).The makedumpfile > command line was the same for both tests: " -c --message-level 1 -d 31" > The timestamps shown for the dump copy were: > > # cd /var/crash > # ls > 127.0.0.1-2012-11-30-15:28:22 > # cd 127.0.0.1-2012-11-30-15:28:22 > #ls -al 127.0.0.1-2012-11-30-15:28:22 > total 10739980^M > drwxr-xr-x. 2 root root 4096 Nov 30 17:07 .^M > drwxr-xr-x. 3 root root 4096 Nov 30 15:28 ..^M > -rw-------. 1 root root 10997727069 Nov 30 17:07 vmcore > > >From the time stamps above the dump started at 15:28, completed at > 17:07, the dump time was 1 hour, 41 minutes. > > 2. Makedumpfile-v1.5.1-rc on the same system configuration as (1.) > above, but with crashkernel size set to 256 M to insure the use of the > cyclic buffer feature to fit in smaller crashkernel space. The same > makedumpfile command line of "-c --message-level 1 -d 31" was used. > > #cd /var/crash > # ls -al > total 12 > drwxr-xr-x. 3 root root 4096 Nov 30 23:25 . > drwxr-xr-x. 22 root root 4096 Nov 30 08:41 .. > drwxr-xr-x. 2 root root 4096 Dec 1 02:05 127.0.0.1-2012-11-30-23:25:20 > > #ls -al * > total 10734932 > drwxr-xr-x. 2 root root 4096 Dec 1 02:05 . > drwxr-xr-x. 3 root root 4096 Nov 30 23:25 .. > -rw-------. 
1 root root 10992554141 Dec 1 02:05 vmcore
> > From the timestamps above, the dump started at 23:25 and completed at > 2:05 after midnight, so the total dump time was 2 hours and 40 minutes.
> > So for this 4 TB system, in this test, the dump write phase took 1 hour > longer for makedumpfile-v1.5.1-rc, versus makedumpfile v1.4. This time > seems dominated by the dump filtering activity, assuming the copy to > disk times should have been the same, though I don't have a good > breakdown.
As you may understand, the number of cycles is two (or larger) in your test (2.). And it seems that you used the free_list logic, because you specified neither the -x vmlinux option nor the -i vmcoreinfo_text_file option. (Please see the release note for how to use the mem_map array logic.)
http://lists.infradead.org/pipermail/kexec/2012-December/007460.html
This combination means that redundant scans were done in your test, so I think makedumpfile-v1.5.1-rc couldn't show the best performance we expected.
So, could you do the same test with v1.5.1-GA (though the logic isn't different from the rc) and the -i vmcoreinfo_text_file option? We should see its result and discuss it.
In addition, you need to include the vmcoreinfo_text_file in the initramfs in order to use the -i option. If you have a Red Hat OS, you can refer to /sbin/mkdumprd to see how to do it.
Thanks Atsushi Kumagai
* Re: [RFC] makedumpfile-1.5.1 RC 2012-12-07 5:26 ` Atsushi Kumagai @ 2012-12-10 21:06 ` Lisa Mitchell 2012-12-13 5:06 ` Atsushi Kumagai 0 siblings, 1 reply; 15+ messages in thread
From: Lisa Mitchell @ 2012-12-10 21:06 UTC (permalink / raw)
To: Atsushi Kumagai; +Cc: kexec@lists.infradead.org, Hoemann, Jerry
On Fri, 2012-12-07 at 05:26 +0000, Atsushi Kumagai wrote:
> As you may understand, the number of cycles is two (or larger) in your > test (2.). And it seems that you used the free_list logic, because you > specified neither the -x vmlinux option nor the -i vmcoreinfo_text_file option. > (Please see the release note for how to use the mem_map array logic.)
> > http://lists.infradead.org/pipermail/kexec/2012-December/007460.html
> > This combination means that redundant scans were done in your test, so > I think makedumpfile-v1.5.1-rc couldn't show the best performance we expected.
> > So, could you do the same test with v1.5.1-GA (though the logic isn't different > from the rc) and the -i vmcoreinfo_text_file option? We should see its result and > discuss it.
> > In addition, you need to include the vmcoreinfo_text_file in the initramfs in order > to use the -i option. If you have a Red Hat OS, you can refer to /sbin/mkdumprd > to see how to do it.
> > Thanks > Atsushi Kumagai
Atsushi, I put the kernel patch from https://lkml.org/lkml/2012/11/21/90 that you had in the release notes, along with the modifications you specified for a 2.6.32 kernel in http://lists.infradead.org/pipermail/kexec/2012-December/007461.html, on my RHEL 6.3 kernel source, and built a patched kernel in order to hopefully enable use of the mem_map array logic feature during my dump testing.
I do not have the use of the 4 TB system again, so I constrained a 256 GB system to a crashkernel size of 136M, which would cause the cyclic buffer feature to be used, and timed some dumps.
I compared the dump time on the system with the makedumpfile 1.4 version that ships with RHEL 6.3, using crashkernel=256M to contain the full bitmap, to both the patched and unpatched kernels using makedumpfile v1.5.1 GA. Here were the results, using the file timestamps. All dumps were taken with core_collector makedumpfile -c --message-level 1 -d 31.
1. RHEL 6.3 2.6.32.279 kernel, makedumpfile 1.4, crashkernel=256M
ls -al --time-style=full-iso 127.0.0.1-2012-12-10-16:44
total 802160
drwxr-xr-x. 2 root root 4096 2012-12-10 16:51:36.909648053 -0700 .
drwxr-xr-x. 12 root root 4096 2012-12-10 16:44:59.213529059 -0700 ..
-rw-------. 1 root root 821396774 2012-12-10 16:51:36.821529854 -0700 vmcore
Time to write out the dump file: 6.5 minutes
2. RHEL 6.3 2.6.32.279 kernel, makedumpfile 1.5.1 GA, crashkernel=136M
ls -al --time-style=full-iso 127.0.0.1-2012-12-10-15:17:18
total 806132
drwxr-xr-x. 2 root root 4096 2012-12-10 15:27:28.799600723 -0700 .
drwxr-xr-x. 11 root root 4096 2012-12-10 15:17:19.202329188 -0700 ..
-rw-------. 1 root root 825465058 2012-12-10 15:27:28.774327293 -0700 vmcore
Time to write out the dump file: 10 minutes, 10 seconds
3. Patched RHEL 6.3 kernel, makedumpfile 1.5.1 GA, crashkernel=136M
ls -al --time-style=full-iso 127.0.0.1-2012-12-10-14:42:28
total 808764
drwxr-xr-x. 2 root root 4096 2012-12-10 14:50:04.263144379 -0700 .
drwxr-xr-x. 10 root root 4096 2012-12-10 14:42:29.230903264 -0700 ..
-rw-------. 1 root root 828160709 2012-12-10 14:50:04.212739485 -0700 vmcore
Time to write out the dump file: 7.5 minutes
The above indicates that with the kernel patch we got a dump file write time 2 minutes shorter than using makedumpfile 1.5.1 without the kernel patch. However, with the kernel patch (and hopefully this enabled the mem_map array logic feature) I still got a dump time that was about 2 minutes longer, or in this case about 30% longer, than the old makedumpfile 1.4 using the full bitmap.
So I still see a regression, which will have to be projected to the multi TB systems.
Atsushi, am I using the new makedumpfile 1.5.1 GA correctly with the kernel patch? I didn't understand how to use the options of makedumpfile you mentioned, and when I tried with a vmlinux file and the -x option, makedumpfile didn't even start, just failed and reset.
I was hoping that, with the kernel patch in place and the default settings of makedumpfile, the mem_map array logic would automatically be used. If not, I am still puzzled as to how to invoke it.
* Re: [RFC] makedumpfile-1.5.1 RC 2012-12-10 21:06 ` Lisa Mitchell @ 2012-12-13 5:06 ` Atsushi Kumagai 2012-12-18 17:20 ` Lisa Mitchell 0 siblings, 1 reply; 15+ messages in thread From: Atsushi Kumagai @ 2012-12-13 5:06 UTC (permalink / raw) To: lisa.mitchell; +Cc: kexec, jerry.hoemann Hello Lisa, On Mon, 10 Dec 2012 14:06:05 -0700 Lisa Mitchell <lisa.mitchell@hp.com> wrote: > On Fri, 2012-12-07 at 05:26 +0000, Atsushi Kumagai wrote: > > Atushi, I put the kernel patch from https://lkml.org/lkml/2012/11/21/90 > that you had in the release notes, along with the modifications you > specified for a 2.6.32 kernel in > http://lists.infradead.org/pipermail/kexec/2012-December/007461.html > on my RHEL 6.3 kernel source, and built a patched kernel in order to > hopefully enable use of the mem map array logic feature during my dump > testing. > > I do not have the use of the 4 TB system again, so I constrained a 256 > GB system to a crashkernel size of 136M, which would cause the cyclic > buffer feature to be used and timed some dumps. > > I compared the dump time on the system with the makedumpfile 1.4 version > that ships with RHEL 6.3, using crashkernel=256M to contain the full > bitmap, to both the patched and unpatched kernels using > makedumpfilev1.5.1GA. Here were the results, using the file timestamps. > All dumps were taken with core_collector makedumpfile -c --message-level > 1 -d 31 > > > 1. RHEL 6.3 2.6.32.279 kernel, makedumpfile 1.4, crashkernel=256M > ls -al --time-style=full-iso 127.0.0.1-2012-12-10-16:44 > total 802160 > drwxr-xr-x. 2 root root 4096 2012-12-10 16:51:36.909648053 -0700 . > drwxr-xr-x. 12 root root 4096 2012-12-10 16:44:59.213529059 > -0700 .. > -rw-------. 1 root root 821396774 2012-12-10 16:51:36.821529854 -0700 > vmcore > > Time to write out the dump file: 6.5 minutes > > > 2. RHEL 6.3 2.6.32.279 kernel, makedumpfile 1.5.1GA, crashkernel=136M > > ls -al --time-style=full-iso 127.0.0.1-2012-12-10-15:17:18 > total 806132 > drwxr-xr-x. 
2 root root 4096 2012-12-10 15:27:28.799600723 -0700 . > drwxr-xr-x. 11 root root 4096 2012-12-10 15:17:19.202329188 > -0700 .. > -rw-------. 1 root root 825465058 2012-12-10 15:27:28.774327293 -0700 > vmcore
> > Time to write out the dump file: 10 minutes, 10 seconds
> > 3. Patched RHEL 6.3 kernel, makedumpfile 1.5.1 GA, crashkernel=136M
> > ls -al --time-style=full-iso 127.0.0.1-2012-12-10-14:42:28
> total 808764 > drwxr-xr-x. 2 root root 4096 2012-12-10 14:50:04.263144379 > -0700 . > drwxr-xr-x. 10 root root 4096 2012-12-10 14:42:29.230903264 > -0700 .. > -rw-------. 1 root root 828160709 2012-12-10 14:50:04.212739485 -0700 > vmcore
> > Time to write out the dump file: 7.5 minutes
> > The above indicates that with the kernel patch we got a dump file write > time 2 minutes shorter than using makedumpfile 1.5.1 without the kernel > patch. However, with the kernel patch (and hopefully this enabled the > mem_map array logic feature) I still got a dump time that was about 2 > minutes longer, or in this case about 30% longer than the old > makedumpfile 1.4, using the full bitmap.
> > So I still see a regression, which will have to be projected to the > multi TB systems.
In cyclic mode, we can save only a chunk of the bitmap at a time; this fact forces us to scan each cyclic region twice, as below:
Step 1: To determine the offset of kdump's page data region.
Step 2: To distinguish whether each page is unnecessary or not.
Step 1 should be done before the writing phase (write_kdump_pages_and_bitmap_cyclic()) and step 2 runs during the writing phase, so the whole scan is needed for each step. On the other hand, v1.4 can execute both step 1 and step 2 with the temporary bitmap file, so the whole scan is done just one time, to create the file.
It's a disadvantage in performance, but I think it's unavoidable. (There is an exception when the number of cycles is 1, but the current version also scans twice in spite of the redundancy.)
If more performance is needed, I think we should invent other approaches like the idea discussed in the thread below:
http://lists.infradead.org/pipermail/kexec/2012-December/007494.html
Besides, I think v1.4 with a local disk which can contain the temporary bitmap file is the fastest version for now.
> Atsushi, am I using the new makedumpfile 1.5.1 GA correctly with the > kernel patch?
Yes, I think you can use the mem_map array logic correctly with the patch. And you can confirm it with the -D option. If you didn't meet the conditions to use the mem_map array logic, the message below will be shown.
"Can't select page_is_buddy handler; follow free lists instead of mem_map array."
> I didn't understand how to use the options of makedumpfile you > mentioned, and when I tried with a vmlinux file and the -x option, > makedumpfile didn't even start, just failed and reset.
It might be another problem related to the -x option. For investigation, could you run the command below and show its messages? There is no need to run it in the 2nd kernel environment.
# makedumpfile -g vmcoreinfo -x vmlinux
Thanks Atsushi Kumagai
> I was hoping with the kernel patch in place, that with the default > settings of makedumpfile, the mem_map array logic would automatically be > used. If not, I am still puzzled as to how to invoke it.
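Atsushi's step 1/step 2 description above can be condensed into a toy model of the cyclic write path (hypothetical helper names; the real code in makedumpfile is considerably more involved):

```python
# Toy model of the two-pass cyclic scan: every cycle's bitmap chunk is
# built twice, once to size and place the page-data region and once while
# writing. Hypothetical helpers; not makedumpfile's real code.

def page_is_dumpable(pfn):
    return pfn % 2 == 0          # stand-in for the -d 31 filtering rules

def scan_cycle(start, end):
    return sum(1 for pfn in range(start, end) if page_is_dumpable(pfn))

def cyclic_write(max_pfn, cycle_pages):
    cycles = [(s, min(s + cycle_pages, max_pfn))
              for s in range(0, max_pfn, cycle_pages)]
    # Step 1: full scan only to count dumpable pages, which fixes the
    # file offset where kdump's page data region will start.
    total = sum(scan_cycle(s, e) for s, e in cycles)
    # Step 2: rescan each cycle while actually writing pages out.
    written = sum(scan_cycle(s, e) for s, e in cycles)
    return total, written, 2 * len(cycles)  # every region scanned twice

print(cyclic_write(max_pfn=100, cycle_pages=32))  # (50, 50, 8)
```

v1.4 avoids the second full scan by recording the step-1 result in its temporary bitmap file, which is exactly the trade-off between speed and memory consumption that Atsushi describes.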
* Re: [RFC] makedumpfile-1.5.1 RC 2012-12-13 5:06 ` Atsushi Kumagai @ 2012-12-18 17:20 ` Lisa Mitchell 2012-12-21 6:19 ` Atsushi Kumagai 0 siblings, 1 reply; 15+ messages in thread From: Lisa Mitchell @ 2012-12-18 17:20 UTC (permalink / raw) To: Atsushi Kumagai; +Cc: kexec@lists.infradead.org, Hoemann, Jerry On Thu, 2012-12-13 at 05:06 +0000, Atsushi Kumagai wrote: > > In cyclic mode, we can save only a chunk of bitmap at a time, > this fact forces us to scan each cyclic region twice as below: > > Step1: To determine the offset of kdump's page data region. > Step2: To distinguish whether each page is unnecessary or not. > > Step1 should be done before writing phase (write_kdump_pages_and_bitmap_cyclic()) > and step2 is run while writing phase, the whole scan is needed for > each step. > On the other hand, v1.4 can execute both step1 and step2 with the temporary > bitmap file, the whole scan is done just one time to create the file. > > It's a disadvantage in performance, but I think it's unavoidable. > (There is the exception when the number of cycles is 1, but current > version also scan twice in spite of redundancy.) > > If more performance is needed, I think we should invent other > approaches like the idea discussed in the thread below: > > http://lists.infradead.org/pipermail/kexec/2012-December/007494.html > > Besides, I think v1.4 with the local disc which can contain the temporary > bitmap file is the fastest version for now. > > > Atushi, am I using the new makedumpfile 1.5.1GA correctly with the > > kernel patch? > > Yes, I think you can use mem_map array logic correctly with the patch. > And you can confirm it with -D option. If you didn't meet the conditions > to use mem_map array logic, the message below will be showed. > > "Can't select page_is_buddy handler; follow free lists instead of mem_map array." 
> > > I didn't understand how to use the options of makedumpfile you > > > mentioned, and when I tried with a vmlinux file and the -x option, > > > makedumpfile didn't even start, just failed and reset.
> > It might be another problem related to the -x option. > For investigation, could you run the command below and show its messages? > There is no need to run it in the 2nd kernel environment.
> > # makedumpfile -g vmcoreinfo -x vmlinux
> > Thanks > Atsushi Kumagai
Thanks for this info, Atsushi.
I was able to test makedumpfile v1.5.1 on the 4 TB DL980 we had this weekend, along with the kexec patch to invoke the memory array logic, and I got encouraging results, in that the difference in dump time between makedumpfile 1.4 on a RHEL 6.3 system and makedumpfile v1.5.1 with the memory array logic now seems to be very small.
Here are my results (file system timestamp data; note the system had its filesystem time set way in the past):
1. makedumpfile 1.4 (RHEL 6.3 default), crashkernel 512M:
root@spb crash]# ls -al --time-style=full-iso 127.0.0.1-2012-05-09-19:55:50
total 10757984
drwxr-xr-x. 2 root root 4096 2012-05-09 21:53:21.289507559 -0600 .
drwxr-xr-x. 4 root root 4096 2012-05-09 22:10:08.729553037 -0600 ..
-rw-------. 1 root root 11016160846 2012-05-09 21:53:21.020384817 -0600 vmcore
21:53:21 - 19:55:50
Dump filter/copy time: 1 hour, 57 minutes, 29 seconds
2. makedumpfile v1.5.1, with the kexec patch, using the memory array logic; took 3 dumps to see variations in times:
ls -al --time-style=full-iso 127.0.0.1-2012-05-10-23:42:35
total 10444952
drwxr-xr-x. 2 root root 4096 2012-05-11 01:52:18.512639105 -0600 .
drwxr-xr-x. 6 root root 4096 2012-05-10 23:42:39.270955565 -0600 ..
-rw-------. 1 root root 10695618226 2012-05-11 01:52:18.479636812 -0600 vmcore
Dump filter/copy time: 2 hours, 9 minutes, 11 sec
127.0.0.1-2012-05-12-20:57:08:
total 10469304
drwxr-xr-x. 2 root root 4096 2012-05-12 23:05:39.082084132 -0600 .
drwxr-xr-x.
5 root root 4096 2012-05-12 20:57:12.627084279 -0600 ..
-rw-------. 1 root root 10720553208 2012-05-12 23:05:39.051082490 -0600 vmcore
Dump filter/copy time: 2 hours, 8 minutes, 26 seconds
127.0.0.1-2012-05-10-09:52:17:
total 10650776
drwxr-xr-x. 2 root root 4096 2012-05-10 12:04:22.456078284 -0600 .
drwxr-xr-x. 6 root root 4096 2012-05-10 09:52:22.068605263 -0600 ..
-rw-------. 1 root root 10906381384 2012-05-10 12:04:22.425076466 -0600 vmcore
Dump filter/copy time: 2 hours, 13 minutes
So the dump times seem to vary plus or minus 2-3 minutes, and the average was about 2 hours, 10 minutes, or 10-12 minutes longer than the makedumpfile 1.4 dump time for a 4 TB system, when a crashkernel constrained to 384 MB and the cyclic buffer feature are used.
* Re: [RFC] makedumpfile-1.5.1 RC
  2012-12-18 17:20 ` Lisa Mitchell
@ 2012-12-21  6:19   ` Atsushi Kumagai
  0 siblings, 0 replies; 15+ messages in thread
From: Atsushi Kumagai @ 2012-12-21 6:19 UTC (permalink / raw)
To: lisa.mitchell; +Cc: kexec, jerry.hoemann

Hello Lisa,

On Tue, 18 Dec 2012 10:20:43 -0700
Lisa Mitchell <lisa.mitchell@hp.com> wrote:

> On Thu, 2012-12-13 at 05:06 +0000, Atsushi Kumagai wrote:
> >
> > In cyclic mode, we can save only a chunk of the bitmap at a time;
> > this fact forces us to scan each cyclic region twice, as below:
> >
> >   Step 1: Determine the offset of kdump's page data region.
> >   Step 2: Distinguish whether each page is unnecessary or not.
> >
> > Step 1 should be done before the writing phase
> > (write_kdump_pages_and_bitmap_cyclic()) and step 2 runs during the
> > writing phase, so a whole scan is needed for each step.
> > On the other hand, v1.4 can execute both step 1 and step 2 with the
> > temporary bitmap file; the whole scan is done just once to create
> > the file.
> >
> > It's a disadvantage in performance, but I think it's unavoidable.
> > (There is an exception when the number of cycles is 1, but the
> > current version also scans twice in spite of the redundancy.)
> >
> > If more performance is needed, I think we should invent other
> > approaches like the idea discussed in the thread below:
> >
> > http://lists.infradead.org/pipermail/kexec/2012-December/007494.html
> >
> > Besides, I think v1.4 with a local disk which can contain the
> > temporary bitmap file is the fastest version for now.
> >
> > > Atsushi, am I using the new makedumpfile 1.5.1 GA correctly with
> > > the kernel patch?
> >
> > Yes, I think you can use the mem_map array logic correctly with the
> > patch. And you can confirm it with the -D option. If you don't meet
> > the conditions to use the mem_map array logic, the message below
> > will be shown:
> >
> > "Can't select page_is_buddy handler; follow free lists instead of mem_map array."
> > > I didn't understand how to use the options of makedumpfile you
> > > mentioned, and when I tried with a vmlinux file, and the -x option,
> > > makedumpfile didn't even start, just failed and reset.
> >
> > It might be another problem related to the -x option.
> > For investigation, could you run the command below and show its messages?
> > There is no need to run it in the 2nd kernel environment.
> >
> > # makedumpfile -g vmcoreinfo -x vmlinux
> >
> > Thanks
> > Atsushi Kumagai
>
> Thanks for this info, Atsushi.
>
> I was able to test makedumpfile-v1.5.1 on the 4 TB DL980 we had this
> weekend, along with the kexec patch to invoke the mem_map array logic,
> and I got encouraging results, in that the difference in dump time
> between makedumpfile 1.4 on a RHEL 6.3 system and makedumpfile-v1.5.1
> with the mem_map array logic now seems to be very small:

Thanks for your hard work, these are good results.

According to your measurements on 256 GB and 4 TB, the difference in
dump time may be about ten percent of the total time at any memory
size. I think that is an acceptable overhead cost for keeping memory
consumption bounded, and the mem_map array logic works as we expected.

Thanks
Atsushi Kumagai

> Here are my results (filesystem timestamp data; note the system had
> its filesystem time set far in the past):
>
> 1. makedumpfile 1.4 (RHEL 6.3 default), crashkernel 512M:
>
> [root@spb crash]# ls -al --time-style=full-iso 127.0.0.1-2012-05-09-19:55:50
> total 10757984
> drwxr-xr-x. 2 root root        4096 2012-05-09 21:53:21.289507559 -0600 .
> drwxr-xr-x. 4 root root        4096 2012-05-09 22:10:08.729553037 -0600 ..
> -rw-------. 1 root root 11016160846 2012-05-09 21:53:21.020384817 -0600 vmcore
>
> 21:53:21 - 19:55:50
>
> Dump filter/copy time: 1 hour, 57 minutes, 29 seconds
>
> 2. makedumpfile-v1.5.1, with kexec patch, using mem_map array logic;
> took 3 dumps to see the variation in times:
>
> ls -al --time-style=full-iso 127.0.0.1-2012-05-10-23:42:35
> total 10444952
> drwxr-xr-x. 2 root root        4096 2012-05-11 01:52:18.512639105 -0600 .
> drwxr-xr-x. 6 root root        4096 2012-05-10 23:42:39.270955565 -0600 ..
> -rw-------. 1 root root 10695618226 2012-05-11 01:52:18.479636812 -0600 vmcore
>
> Dump filter/copy time: 2 hours, 9 minutes, 11 seconds
>
> 127.0.0.1-2012-05-12-20:57:08:
> total 10469304
> drwxr-xr-x. 2 root root        4096 2012-05-12 23:05:39.082084132 -0600 .
> drwxr-xr-x. 5 root root        4096 2012-05-12 20:57:12.627084279 -0600 ..
> -rw-------. 1 root root 10720553208 2012-05-12 23:05:39.051082490 -0600 vmcore
>
> Dump filter/copy time: 2 hours, 8 minutes, 26 seconds
>
> 127.0.0.1-2012-05-10-09:52:17:
> total 10650776
> drwxr-xr-x. 2 root root        4096 2012-05-10 12:04:22.456078284 -0600 .
> drwxr-xr-x. 6 root root        4096 2012-05-10 09:52:22.068605263 -0600 ..
> -rw-------. 1 root root 10906381384 2012-05-10 12:04:22.425076466 -0600 vmcore
>
> Dump filter/copy time: 2 hours, 13 minutes
>
> So the dump times seem to vary plus or minus 2-3 minutes, and the
> average was about 2 hours 10 minutes, or 10-12 minutes longer than the
> makedumpfile 1.4 dump time for a 4 TB system, when using a crashkernel
> constrained to 384 MB and the cyclic buffer feature.
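The two-pass behaviour of cyclic mode described earlier in this message (step 1 scans every region just to size kdump's page data region, step 2 scans again during the writing phase, because only one chunk of the bitmap is held in memory at a time) can be sketched as follows. This is a hypothetical simplification, not makedumpfile's actual code; `is_unnecessary` and `write_page` stand in for the real page-filtering and output routines.

```python
def scan_cycle(pages, is_unnecessary):
    """One pass over a cyclic region: the partial bitmap for that region."""
    return [not is_unnecessary(p) for p in pages]

def two_pass_dump(regions, is_unnecessary, write_page):
    # Pass 1 (before the writing phase): scan every cyclic region once
    # just to count the dumpable pages, so the offset of the page data
    # region is known up front. Each partial bitmap is then discarded
    # to keep memory consumption bounded.
    dumpable = sum(sum(scan_cycle(r, is_unnecessary)) for r in regions)

    # Pass 2 (writing phase): scan each region a second time and write
    # out the pages its bitmap marks as needed.
    for region in regions:
        for page, keep in zip(region, scan_cycle(region, is_unnecessary)):
            if keep:
                write_page(page)
    return dumpable

# Toy usage: treat even-numbered "pages" as free, i.e. unnecessary.
written = []
count = two_pass_dump([[0, 1, 2, 3], [4, 5, 6, 7]],
                      is_unnecessary=lambda p: p % 2 == 0,
                      write_page=written.append)
print(count, written)  # 4 [1, 3, 5, 7]
```

By contrast, v1.4 keeps the whole bitmap in a temporary file, so a single scan serves both purposes; the cost is the disk space for that file.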
end of thread, other threads:[~2012-12-21 6:23 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-16  8:15 [RFC] makedumpfile-1.5.1 RC Atsushi Kumagai
2012-11-20 12:14 ` Lisa Mitchell
2012-11-20 16:35 ` Vivek Goyal
2012-11-20 13:03 ` Lisa Mitchell
2012-11-20 21:46 ` Vivek Goyal
2012-11-20 19:05 ` Lisa Mitchell
2012-11-21 13:54 ` Vivek Goyal
2012-11-22  0:49 ` Hatayama, Daisuke
2012-11-26 16:02 ` Vivek Goyal
2012-12-04 13:31 ` Lisa Mitchell
2012-12-07  5:26 ` Atsushi Kumagai
2012-12-10 21:06 ` Lisa Mitchell
2012-12-13  5:06 ` Atsushi Kumagai
2012-12-18 17:20 ` Lisa Mitchell
2012-12-21  6:19 ` Atsushi Kumagai