* [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
@ 2012-06-29 2:13 Atsushi Kumagai
2012-06-29 2:16 ` [RFC PATCH v2 1/10] Add flag to enable cyclic processing Atsushi Kumagai
` (11 more replies)
0 siblings, 12 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:13 UTC (permalink / raw)
To: kexec
Hello,
This is version 2 of the cyclic processing prototype.
If there is no objection to the basic idea, I would like to look at
performance as the next step (concretely, the buffer size and the patch set
HATAYAMA-san sent recently).
Version 1:
http://lists.infradead.org/pipermail/kexec/2012-May/006363.html
Introduction:
- The purpose of cyclic processing is to keep memory consumption constant,
regardless of system memory size.
- Cyclic processing doesn't use a temporary bitmap file. Instead, it keeps
partial bitmap data in memory, only for the current cycle.
- The prototype passed the regression test that is run for every release.
How to use:
Specify the '--cyclic' option, and makedumpfile works cyclically.
Example:
$ makedumpfile --cyclic -cd31 vmcore testdump.cd31
Copying data : [ 5 %]
Excluding free pages : [100 %]
...
Excluding free pages : [100 %]
Copying data : [100 %]
The dumpfile is saved to testdump.cd31.
makedumpfile Completed.
Changelog:
v1 => v2:
- Fix the process of increasing target region.
- Change method for kdump-compressed format.
- Add support for ELF format.
- Add support for --split option.
Memory consumption:
I measured the RSS of makedumpfile with ps(1) as memory consumption.
a. working for 5G memory:
            |                RSS [KB]
            |  no option  |    -cd31    |    -Ed31
 -----------+-------------+-------------+-------------
  v1.4.4    |    1108     |    1184     |     868
  cyclic    |    2976     |    3252     |    2984
b. working for 8G memory:
            |                RSS [KB]
            |  no option  |    -cd31    |    -Ed31
 -----------+-------------+-------------+-------------
  v1.4.4    |    1108     |    1180     |     864
  cyclic    |    2972     |    3256     |    2984
At first glance v1.4.4 looks better than cyclic processing, but v1.4.4 also
creates a temporary bitmap file whose size grows with memory size. Its size
in bytes is (memory size / 4K / 8) * 2, that is, one bit per 4 KB page for
each of the two bitmaps.
  memory size [GB] | bitmap size [KB]
 ------------------+------------------
                 5 |              320
                 8 |              512
               ... |              ...
             1,024 |           65,536
If the system has not mounted a rootfs, this size is also counted as memory
consumption. This is the cause of the memory consumption issue we discussed.
On the other hand, cyclic processing doesn't use a temporary bitmap file,
so all of its memory consumption appears in RSS.
The memory used to store the bitmaps stays at about 2 MB (BUFSIZE_CYCLIC * 2).
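The bitmap sizes in the table can be reproduced with a few lines of C. This is
only a worked check of the formula, assuming 4 KB pages and two bitmaps as
stated above. The function names are hypothetical and do not exist in
makedumpfile.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions taken from the formula in this mail: 4 KB pages,
 * one bit per page, and two bitmaps (1st and 2nd). */
#define PAGE_SIZE_BYTES 4096ULL
#define BITS_PER_BYTE   8ULL
#define NUM_BITMAPS     2ULL

/* Size in KB of the temporary bitmap file for mem_gb gigabytes of memory. */
static uint64_t bitmap_file_kb(uint64_t mem_gb)
{
    uint64_t pages, bytes;

    assert(mem_gb <= (UINT64_MAX >> 30));   /* guard the shift below */
    pages = (mem_gb << 30) / PAGE_SIZE_BYTES;
    bytes = pages / BITS_PER_BYTE * NUM_BITMAPS;
    return bytes / 1024;
}

/* Fixed bitmap footprint of cyclic processing in KB: BUFSIZE_CYCLIC * 2. */
static uint64_t cyclic_bitmap_kb(void)
{
    const uint64_t bufsize_cyclic = 1024ULL * 1024ULL; /* value from patch 2/10 */

    return bufsize_cyclic * NUM_BITMAPS / 1024;
}
```

bitmap_file_kb(5), bitmap_file_kb(8) and bitmap_file_kb(1024) give the three
rows of the table, while cyclic_bitmap_kb() stays at 2048 KB for any memory
size.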
Thanks
Atsushi Kumagai
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* [RFC PATCH v2 1/10] Add flag to enable cyclic processing.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
@ 2012-06-29 2:16 ` Atsushi Kumagai
2012-06-29 2:17 ` [RFC PATCH v2 2/10] Prepare partial bitmap for " Atsushi Kumagai
` (10 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:16 UTC (permalink / raw)
To: kexec
Introduce the --cyclic option to enable cyclic processing. If --cyclic is
specified, makedumpfile works cyclically and its memory usage stays constant.
Usage:
# makedumpfile --cyclic /proc/vmcore dumpfile
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
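For readers unfamiliar with long-only options, the following is a minimal
standalone sketch of how a getopt_long() table entry such as "cyclic" maps to
a flag. The struct and function names are hypothetical stand-ins, not the
makedumpfile code, and the empty short-option string is an assumption made to
keep the sketch small.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical stand-in for the flag_cyclic member of struct DumpInfo. */
struct opts {
    int flag_cyclic;
};

/* Parse argv; returns 0 on success, -1 on an unknown option. */
static int parse_opts(int argc, char *argv[], struct opts *o)
{
    static const struct option longopts[] = {
        {"cyclic", no_argument, NULL, 'Y'},
        {0, 0, 0, 0}
    };
    int c;

    assert(o != NULL);
    o->flag_cyclic = 0;
    optind = 1;     /* restart scanning so the function can be called again */
    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case 'Y':   /* value returned for --cyclic */
            o->flag_cyclic = 1;
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Because 'Y' only appears in the long-option table, the flag can be set with
--cyclic but not with a short -Y in this sketch.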
makedumpfile.8 | 12 ++++++++++++
makedumpfile.c | 4 ++++
makedumpfile.h | 1 +
print_info.c | 9 +++++++++
4 files changed, 26 insertions(+)
diff --git a/makedumpfile.8 b/makedumpfile.8
index 1fb9756..6ab2599 100644
--- a/makedumpfile.8
+++ b/makedumpfile.8
@@ -346,6 +346,18 @@ on the following example.
.br
# makedumpfile \-\-reassemble dumpfile1 dumpfile2 dumpfile
+
+.TP
+\fB\-\-cyclic\fR
+Create bitmaps and write pages cyclically, one fixed-size target memory range at a time.
+As a result, makedumpfile can work in constant memory space regardless of
+system memory size.
+This feature is \fITESTING VERSION\fR now.
+.br
+.B Example:
+.br
+# makedumpfile \-\-cyclic \-d 31 \-x vmlinux /proc/vmcore dumpfile
+
.TP
\fB\-\-xen-syms\fR \fIXEN-SYMS\fR
Specify the \fIXEN-SYMS\fR with debug information to analyze the xen's memory usage.
diff --git a/makedumpfile.c b/makedumpfile.c
index d024e95..87bd680 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -6945,6 +6945,7 @@ static struct option longopts[] = {
{"config", required_argument, NULL, 'C'},
{"help", no_argument, NULL, 'h'},
{"diskset", required_argument, NULL, 'k'},
+ {"cyclic", no_argument, NULL, 'Y'},
{0, 0, 0, 0}
};
@@ -7053,6 +7054,9 @@ main(int argc, char *argv[])
case 'y':
info->name_xen_syms = optarg;
break;
+ case 'Y':
+ info->flag_cyclic = TRUE;
+ break;
case 'z':
info->flag_read_vmcoreinfo = 1;
info->name_vmcoreinfo = optarg;
diff --git a/makedumpfile.h b/makedumpfile.h
index 6f5489d..287e055 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -801,6 +801,7 @@ struct DumpInfo {
int flag_rearrange; /* flag of creating dumpfile from
flattened format */
int flag_split; /* splitting vmcore */
+ int flag_cyclic; /* cyclic processing to keep memory consumption */
int flag_reassemble; /* reassemble multiple dumpfiles into one */
int flag_refiltering; /* refilter from kdump-compressed file */
int flag_force; /* overwrite existing stuff */
diff --git a/print_info.c b/print_info.c
index 61cafed..0d8a7b3 100644
--- a/print_info.c
+++ b/print_info.c
@@ -62,6 +62,10 @@ print_usage(void)
MSG("\n");
MSG(" Reassemble multiple DUMPFILEs:\n");
MSG(" # makedumpfile --reassemble DUMPFILE1 DUMPFILE2 [DUMPFILE3 ..] DUMPFILE\n");
+ MSG(" DUMPFILE2 [DUMPFILE3 ..]\n");
+ MSG("\n");
+ MSG(" Creating DUMPFILE with cyclic processing:\n");
+ MSG(" # makedumpfile --cyclic [OPTION] [-x VMLINUX|-i VMCOREINFO] VMCORE DUMPFILE\n");
MSG("\n");
MSG(" Generating VMCOREINFO:\n");
MSG(" # makedumpfile -g VMCOREINFO -x VMLINUX\n");
@@ -162,6 +166,11 @@ print_usage(void)
MSG(" Reassemble multiple DUMPFILEs, which are created by --split option,\n");
MSG(" into one DUMPFILE. dumpfile1 and dumpfile2 are reassembled into dumpfile.\n");
MSG("\n");
+ MSG(" [--cyclic]:\n");
+ MSG(" Create bitmaps and write pages cyclically, one target memory range at a time.\n");
+ MSG(" As a result, makedumpfile can work in constant memory space regardless of \n");
+ MSG(" system memory size. This feature is TESTING VERSION now.\n");
+ MSG("\n");
MSG(" [--xen-syms XEN-SYMS]:\n");
MSG(" Specify the XEN-SYMS to analyze Xen's memory usage.\n");
MSG("\n");
--
1.7.9.2
* [RFC PATCH v2 2/10] Prepare partial bitmap for cyclic processing.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
2012-06-29 2:16 ` [RFC PATCH v2 1/10] Add flag to enable cyclic processing Atsushi Kumagai
@ 2012-06-29 2:17 ` Atsushi Kumagai
2012-06-29 2:17 ` [RFC PATCH v2 3/10] Change the function related to excluding unnecessary pages Atsushi Kumagai
` (9 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:17 UTC (permalink / raw)
To: kexec
Cyclic processing uses a partial bitmap instead of a temporary bitmap file.
The partial bitmap is kept in memory and covers only the current cycle.
This patch introduces the partial bitmap and extends some accessor functions
to manage it.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
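The indexing used by the partial bitmap can be shown in isolation. The sketch
below is a standalone illustration, not the patch code: bit positions are
relative to the start pfn of the current region, and pfns outside the region
are rejected. The struct and function names are hypothetical, and the buffer
is deliberately much smaller than the 1 MB used in the patch.

```c
#include <assert.h>
#include <string.h>

#define BUFSIZE_CYCLIC 1024                 /* illustration only; the patch uses 1024 * 1024 */
#define PFN_CYCLIC     (BUFSIZE_CYCLIC * 8) /* pfns covered by one buffer */

/* Hypothetical container for one target region and its bitmap. */
struct region {
    unsigned long long start_pfn;
    unsigned long long end_pfn;             /* exclusive */
    unsigned char bitmap[BUFSIZE_CYCLIC];
};

static int in_region(const struct region *r, unsigned long long pfn)
{
    return r->start_pfn <= pfn && pfn < r->end_pfn;
}

/* Set (val != 0) or clear (val == 0) the bit for pfn; returns 0 if pfn is outside. */
static int set_bit_cyclic(struct region *r, unsigned long long pfn, int val)
{
    unsigned long long off;

    assert(r->end_pfn - r->start_pfn <= PFN_CYCLIC);
    if (!in_region(r, pfn))
        return 0;
    off = pfn - r->start_pfn;               /* index relative to the region */
    if (val)
        r->bitmap[off >> 3] |= (unsigned char)(1u << (off & 7));
    else
        r->bitmap[off >> 3] &= (unsigned char)~(1u << (off & 7));
    return 1;
}

/* Returns 1 if the bit for pfn is set, 0 if it is clear or pfn is outside. */
static int test_bit_cyclic(const struct region *r, unsigned long long pfn)
{
    unsigned long long off;

    if (!in_region(r, pfn))
        return 0;
    off = pfn - r->start_pfn;
    return (r->bitmap[off >> 3] >> (off & 7)) & 1;
}
```

Returning 0 for an out-of-region pfn lets callers count only the pages that
were really changed in the current cycle.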
makedumpfile.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++------
makedumpfile.h | 23 +++++++++++++
2 files changed, 113 insertions(+), 9 deletions(-)
diff --git a/makedumpfile.c b/makedumpfile.c
index 87bd680..9e77913 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -165,6 +165,7 @@ is_in_same_page(unsigned long vaddr1, unsigned long vaddr2)
#define BITMAP_SECT_LEN 4096
static inline int is_dumpable(struct dump_bitmap *, unsigned long long);
+static inline int is_dumpable_cyclic(char *bitmap, unsigned long long);
unsigned long
pfn_to_pos(unsigned long long pfn)
{
@@ -2719,6 +2720,12 @@ initialize_bitmap(struct dump_bitmap *bitmap)
}
void
+initialize_bitmap_cyclic(char *bitmap)
+{
+ memset(bitmap, 0, BUFSIZE_CYCLIC);
+}
+
+void
initialize_1st_bitmap(struct dump_bitmap *bitmap)
{
initialize_bitmap(bitmap);
@@ -2782,6 +2789,27 @@ set_bitmap(struct dump_bitmap *bitmap, unsigned long long pfn,
}
int
+set_bitmap_cyclic(char *bitmap, unsigned long long pfn, int val)
+{
+ int byte, bit;
+
+ if (pfn < info->cyclic_start_pfn || info->cyclic_end_pfn <= pfn)
+ return FALSE;
+
+ /*
+ * If val is 0, clear bit on the bitmap.
+ */
+ byte = (pfn - info->cyclic_start_pfn)>>3;
+ bit = (pfn - info->cyclic_start_pfn) & 7;
+ if (val)
+ bitmap[byte] |= 1<<bit;
+ else
+ bitmap[byte] &= ~(1<<bit);
+
+ return TRUE;
+}
+
+int
sync_bitmap(struct dump_bitmap *bitmap)
{
off_t offset;
@@ -2823,19 +2851,31 @@ sync_2nd_bitmap(void)
int
set_bit_on_1st_bitmap(unsigned long long pfn)
{
- return set_bitmap(info->bitmap1, pfn, 1);
+ if (info->flag_cyclic) {
+ return set_bitmap_cyclic(info->partial_bitmap1, pfn, 1);
+ } else {
+ return set_bitmap(info->bitmap1, pfn, 1);
+ }
}
int
clear_bit_on_1st_bitmap(unsigned long long pfn)
{
- return set_bitmap(info->bitmap1, pfn, 0);
+ if (info->flag_cyclic) {
+ return set_bitmap_cyclic(info->partial_bitmap1, pfn, 0);
+ } else {
+ return set_bitmap(info->bitmap1, pfn, 0);
+ }
}
int
clear_bit_on_2nd_bitmap(unsigned long long pfn)
{
- return set_bitmap(info->bitmap2, pfn, 0);
+ if (info->flag_cyclic) {
+ return set_bitmap_cyclic(info->partial_bitmap2, pfn, 0);
+ } else {
+ return set_bitmap(info->bitmap2, pfn, 0);
+ }
}
int
@@ -3914,6 +3954,38 @@ prepare_bitmap_buffer(void)
return TRUE;
}
+int
+prepare_bitmap_buffer_cyclic(void)
+{
+ unsigned long tmp;
+
+ /*
+ * Create 2 bitmaps (1st-bitmap & 2nd-bitmap) on block_size boundary.
+ * The crash utility requires both of them to be aligned to block_size
+ * boundary.
+ */
+ tmp = divideup(divideup(info->max_mapnr, BITPERBYTE), info->page_size);
+ info->len_bitmap = tmp*info->page_size*2;
+
+ /*
+ * Prepare partial bitmap buffers for cyclic processing.
+ */
+ if ((info->partial_bitmap1 = (char *)malloc(BUFSIZE_CYCLIC)) == NULL) {
+ ERRMSG("Can't allocate memory for the 1st-bitmap. %s\n",
+ strerror(errno));
+ return FALSE;
+ }
+ if ((info->partial_bitmap2 = (char *)malloc(BUFSIZE_CYCLIC)) == NULL) {
+ ERRMSG("Can't allocate memory for the 2nd-bitmap. %s\n",
+ strerror(errno));
+ return FALSE;
+ }
+ initialize_bitmap_cyclic(info->partial_bitmap1);
+ initialize_bitmap_cyclic(info->partial_bitmap2);
+
+ return TRUE;
+}
+
void
free_bitmap_buffer(void)
{
@@ -3934,14 +4006,19 @@ create_dump_bitmap(void)
{
int ret = FALSE;
- if (!prepare_bitmap_buffer())
- goto out;
+ if (info->flag_cyclic) {
+ if (!prepare_bitmap_buffer_cyclic())
+ goto out;
+ } else {
+ if (!prepare_bitmap_buffer())
+ goto out;
- if (!create_1st_bitmap())
- goto out;
+ if (!create_1st_bitmap())
+ goto out;
- if (!create_2nd_bitmap())
- goto out;
+ if (!create_2nd_bitmap())
+ goto out;
+ }
ret = TRUE;
out:
@@ -7203,6 +7280,10 @@ out:
free(info->splitting_info);
if (info->p2m_mfn_frame_list != NULL)
free(info->p2m_mfn_frame_list);
+ if (info->partial_bitmap1 != NULL)
+ free(info->partial_bitmap1);
+ if (info->partial_bitmap2 != NULL)
+ free(info->partial_bitmap2);
free(info);
}
free_elf_info();
diff --git a/makedumpfile.h b/makedumpfile.h
index 287e055..bf5dd43 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -173,6 +173,12 @@ isAnon(unsigned long mapping)
#define FILENAME_STDOUT "STDOUT"
/*
+ * For cyclic processing
+ */
+#define BUFSIZE_CYCLIC (1024 * 1024)
+#define PFN_CYCLIC (BUFSIZE_CYCLIC * BITPERBYTE)
+
+/*
* Minimam vmcore has 2 ProgramHeaderTables(PT_NOTE and PT_LOAD).
*/
#define MIN_ELF32_HEADER_SIZE \
@@ -926,6 +932,14 @@ struct DumpInfo {
unsigned long long split_end_pfn;
/*
+ * for cyclic processing
+ */
+ char *partial_bitmap1;
+ char *partial_bitmap2;
+ unsigned long long cyclic_start_pfn;
+ unsigned long long cyclic_end_pfn;
+
+ /*
* sadump info:
*/
int flag_sadump_diskset;
@@ -1395,6 +1409,15 @@ is_dumpable(struct dump_bitmap *bitmap, unsigned long long pfn)
}
static inline int
+is_dumpable_cyclic(char *bitmap, unsigned long long pfn)
+{
+ if (pfn < info->cyclic_start_pfn || info->cyclic_end_pfn <= pfn)
+ return FALSE;
+ else
+ return is_on(bitmap, pfn - info->cyclic_start_pfn);
+}
+
+static inline int
is_zero_page(unsigned char *buf, long page_size)
{
size_t i;
--
1.7.9.2
* [RFC PATCH v2 3/10] Change the function related to excluding unnecessary pages.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
2012-06-29 2:16 ` [RFC PATCH v2 1/10] Add flag to enable cyclic processing Atsushi Kumagai
2012-06-29 2:17 ` [RFC PATCH v2 2/10] Prepare partial bitmap for " Atsushi Kumagai
@ 2012-06-29 2:17 ` Atsushi Kumagai
2012-06-29 2:18 ` [RFC PATCH v2 4/10] Add function to update target region Atsushi Kumagai
` (8 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:17 UTC (permalink / raw)
To: kexec
Extend the functions that exclude unnecessary pages.
When the cyclic flag is on, they create a partial bitmap covering only the
current target region.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
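A mem_map section only needs to be scanned when its pfn range intersects the
current target region. The sketch below shows that interval test on its own,
assuming both ranges are half-open (end exclusive), which matches how
cyclic_end_pfn is compared elsewhere in this series. The function name is
hypothetical.

```c
#include <assert.h>

/*
 * Return 1 if the half-open pfn ranges [a_start, a_end) and
 * [b_start, b_end) share at least one pfn, otherwise 0.
 */
static int pfn_ranges_overlap(unsigned long long a_start,
                              unsigned long long a_end,
                              unsigned long long b_start,
                              unsigned long long b_end)
{
    assert(a_start <= a_end && b_start <= b_end);
    /* Both conditions must hold; either one alone is almost always true. */
    return a_start < b_end && b_start < a_end;
}
```

Two ranges that merely touch, such as [0, 10) and [10, 20), do not overlap
under this definition.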
makedumpfile.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 75 insertions(+), 11 deletions(-)
diff --git a/makedumpfile.c b/makedumpfile.c
index 9e77913..e7c9f5e 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3264,9 +3264,9 @@ reset_bitmap_of_free_pages(unsigned long node_zones)
}
for (i = 0; i < (1<<order); i++) {
pfn = start_pfn + i;
- clear_bit_on_2nd_bitmap_for_kernel(pfn);
+ if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
+ found_free_pages++;
}
- found_free_pages += i;
previous = curr;
if (!readmem(VADDR, curr+OFFSET(list_head.next),
@@ -3300,7 +3300,7 @@ reset_bitmap_of_free_pages(unsigned long node_zones)
ERRMSG("Can't get free_pages.\n");
return FALSE;
}
- if (free_pages != found_free_pages) {
+ if (free_pages != found_free_pages && !info->flag_cyclic) {
/*
* On linux-2.6.21 or later, the number of free_pages is
* sometimes different from the one of the list "free_area",
@@ -3640,6 +3640,32 @@ create_1st_bitmap(void)
return TRUE;
}
+int
+create_1st_bitmap_cyclic()
+{
+ unsigned long long pfn, pfn_bitmap1;
+
+ /*
+ * At first, clear all the bits on the 1st-bitmap.
+ */
+ initialize_bitmap_cyclic(info->partial_bitmap1);
+
+ /*
+ * If page is not on memory hole, set bit on the 1st-bitmap.
+ */
+ pfn_bitmap1 = 0;
+
+ for (pfn = info->cyclic_start_pfn; pfn <info->cyclic_end_pfn; pfn++) {
+ if (is_in_segs(pfn_to_paddr(pfn))) {
+ set_bit_on_1st_bitmap(pfn);
+ pfn_bitmap1++;
+ }
+ }
+ pfn_memhole -= pfn_bitmap1;
+
+ return TRUE;
+}
+
/*
* Exclude the page filled with zero in case of creating an elf dumpfile.
*/
@@ -3680,8 +3706,8 @@ exclude_zero_pages(void)
}
}
if (is_zero_page(buf, info->page_size)) {
- clear_bit_on_2nd_bitmap(pfn);
- pfn_zero++;
+ if (clear_bit_on_2nd_bitmap(pfn))
+ pfn_zero++;
}
}
@@ -3758,8 +3784,8 @@ __exclude_unnecessary_pages(unsigned long mem_map,
if ((info->dump_level & DL_EXCLUDE_CACHE)
&& (isLRU(flags) || isSwapCache(flags))
&& !isPrivate(flags) && !isAnon(mapping)) {
- clear_bit_on_2nd_bitmap_for_kernel(pfn);
- pfn_cache++;
+ if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
+ pfn_cache++;
}
/*
* Exclude the cache page with the private page.
@@ -3767,16 +3793,16 @@ __exclude_unnecessary_pages(unsigned long mem_map,
else if ((info->dump_level & DL_EXCLUDE_CACHE_PRI)
&& (isLRU(flags) || isSwapCache(flags))
&& !isAnon(mapping)) {
- clear_bit_on_2nd_bitmap_for_kernel(pfn);
- pfn_cache_private++;
+ if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
+ pfn_cache_private++;
}
/*
* Exclude the data page of the user process.
*/
else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
&& isAnon(mapping)) {
- clear_bit_on_2nd_bitmap_for_kernel(pfn);
- pfn_user++;
+ if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
+ pfn_user++;
}
}
return TRUE;
@@ -3813,6 +3839,44 @@ exclude_unnecessary_pages(void)
return TRUE;
}
+void
+copy_bitmap_cyclic(void)
+{
+ memcpy(info->partial_bitmap2, info->partial_bitmap1, BUFSIZE_CYCLIC);
+}
+
+int
+exclude_unnecessary_pages_cyclic(void)
+{
+ unsigned int mm;
+ struct mem_map_data *mmd;
+
+ /*
+ * Copy 1st-bitmap to 2nd-bitmap.
+ */
+ copy_bitmap_cyclic();
+
+ if (info->dump_level & DL_EXCLUDE_FREE)
+ if (!exclude_free_page())
+ return FALSE;
+
+ for (mm = 0; mm < info->num_mem_map; mm++) {
+
+ mmd = &info->mem_map_data[mm];
+
+ if (mmd->mem_map == NOT_MEMMAP_ADDR)
+ continue;
+
+ if (mmd->pfn_end >= info->cyclic_start_pfn && mmd->pfn_start <= info->cyclic_end_pfn) {
+ if (!__exclude_unnecessary_pages(mmd->mem_map,
+ mmd->pfn_start, mmd->pfn_end))
+ return FALSE;
+ }
+ }
+
+ return TRUE;
+}
+
int
copy_bitmap(void)
{
--
1.7.9.2
* [RFC PATCH v2 4/10] Add function to update target region.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
` (2 preceding siblings ...)
2012-06-29 2:17 ` [RFC PATCH v2 3/10] Change the function related to excluding unnecessary pages Atsushi Kumagai
@ 2012-06-29 2:18 ` Atsushi Kumagai
2012-06-29 2:19 ` [RFC PATCH v2 5/10] Add function to get num_dumpable for cyclic processing Atsushi Kumagai
` (7 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:18 UTC (permalink / raw)
To: kexec
update_cyclic_region() moves the target region to the one containing the
given pfn and recalculates the partial bitmaps for it.
If the pfn is already inside the current target region, it does nothing.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
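The region arithmetic can be sketched on its own. This is an illustration
under the assumption that round(pfn, PFN_CYCLIC) rounds down to a multiple of
PFN_CYCLIC. The struct and function names are hypothetical, and the bitmap
rebuilding done by the real function is left out.

```c
#include <assert.h>

#define BITPERBYTE     8ULL
#define BUFSIZE_CYCLIC (1024ULL * 1024ULL)
#define PFN_CYCLIC     (BUFSIZE_CYCLIC * BITPERBYTE)

/* Hypothetical pair of region bounds; end_pfn is exclusive. */
struct cyclic_region {
    unsigned long long start_pfn;
    unsigned long long end_pfn;
};

/* Compute the target region that contains pfn, clamped to max_mapnr. */
static struct cyclic_region
region_for_pfn(unsigned long long pfn, unsigned long long max_mapnr)
{
    struct cyclic_region r;

    assert(pfn < max_mapnr);
    r.start_pfn = pfn / PFN_CYCLIC * PFN_CYCLIC;    /* round down */
    r.end_pfn = r.start_pfn + PFN_CYCLIC;
    if (r.end_pfn > max_mapnr)
        r.end_pfn = max_mapnr;                      /* last region may be short */
    return r;
}
```

With a 1 MB buffer each region covers 8388608 pfns, which is 32 GB of memory
with 4 KB pages.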
makedumpfile.c | 21 +++++++++++++++++++++
makedumpfile.h | 9 +++++++++
2 files changed, 30 insertions(+)
diff --git a/makedumpfile.c b/makedumpfile.c
index e7c9f5e..67b3499 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3878,6 +3878,27 @@ exclude_unnecessary_pages_cyclic(void)
}
int
+update_cyclic_region(unsigned long long pfn)
+{
+ if (is_cyclic_region(pfn))
+ return TRUE;
+
+ info->cyclic_start_pfn = round(pfn, PFN_CYCLIC);
+ info->cyclic_end_pfn = info->cyclic_start_pfn + PFN_CYCLIC;
+
+ if (info->cyclic_end_pfn > info->max_mapnr)
+ info->cyclic_end_pfn = info->max_mapnr;
+
+ if (!create_1st_bitmap_cyclic())
+ return FALSE;
+
+ if (!exclude_unnecessary_pages_cyclic())
+ return FALSE;
+
+ return TRUE;
+}
+
+int
copy_bitmap(void)
{
off_t offset;
diff --git a/makedumpfile.h b/makedumpfile.h
index bf5dd43..1a49aa2 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -1418,6 +1418,15 @@ is_dumpable_cyclic(char *bitmap, unsigned long long pfn)
}
static inline int
+is_cyclic_region(unsigned long long pfn)
+{
+ if (pfn < info->cyclic_start_pfn || info->cyclic_end_pfn <= pfn)
+ return FALSE;
+ else
+ return TRUE;
+}
+
+static inline int
is_zero_page(unsigned char *buf, long page_size)
{
size_t i;
--
1.7.9.2
* [RFC PATCH v2 5/10] Add function to get num_dumpable for cyclic processing.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
` (3 preceding siblings ...)
2012-06-29 2:18 ` [RFC PATCH v2 4/10] Add function to update target region Atsushi Kumagai
@ 2012-06-29 2:19 ` Atsushi Kumagai
2012-06-29 2:21 ` [RFC PATCH v2 6/10] Implement the main routine of cyclic processing for kdump-compressed format Atsushi Kumagai
` (6 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:19 UTC (permalink / raw)
To: kexec
get_num_dumpable_cyclic() counts the dumpable pages using cyclic processing.
The result is stored in info->num_dumpable, which is needed to decide
the offset of the kdump-compressed page data.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
makedumpfile.c | 17 +++++++++++++++++
makedumpfile.h | 6 ++++++
2 files changed, 23 insertions(+)
diff --git a/makedumpfile.c b/makedumpfile.c
index 67b3499..30857e3 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -4094,6 +4094,8 @@ create_dump_bitmap(void)
if (info->flag_cyclic) {
if (!prepare_bitmap_buffer_cyclic())
goto out;
+
+ info->num_dumpable = get_num_dumpable_cyclic();
} else {
if (!prepare_bitmap_buffer())
goto out;
@@ -4555,6 +4557,21 @@ get_num_dumpable(void)
return num_dumpable;
}
+unsigned long long
+get_num_dumpable_cyclic(void)
+{
+ unsigned long long pfn, num_dumpable=0;
+
+ for (pfn = 0; pfn < info->max_mapnr; pfn++) {
+ if (!update_cyclic_region(pfn))
+ return FALSE;
+
+ if (is_dumpable_cyclic(info->partial_bitmap2, pfn))
+ num_dumpable++;
+ }
+
+ return num_dumpable;
+}
int
write_elf_load_segment(struct cache_data *cd_page, unsigned long long paddr,
diff --git a/makedumpfile.h b/makedumpfile.h
index 1a49aa2..e336814 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -938,6 +938,7 @@ struct DumpInfo {
char *partial_bitmap2;
unsigned long long cyclic_start_pfn;
unsigned long long cyclic_end_pfn;
+ unsigned long long num_dumpable;
/*
* sadump info:
@@ -1511,4 +1512,9 @@ struct elf_prstatus {
#endif
+/*
+ * Function Prototype.
+ */
+unsigned long long get_num_dumpable_cyclic(void);
+
#endif /* MAKEDUMPFILE_H */
--
1.7.9.2
* [RFC PATCH v2 6/10] Implement the main routine of cyclic processing for kdump-compressed format.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
` (4 preceding siblings ...)
2012-06-29 2:19 ` [RFC PATCH v2 5/10] Add function to get num_dumpable for cyclic processing Atsushi Kumagai
@ 2012-06-29 2:21 ` Atsushi Kumagai
2012-06-29 2:22 ` [RFC PATCH v2 7/10] Add function to get number of PT_LOAD for cyclic processing Atsushi Kumagai
` (5 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:21 UTC (permalink / raw)
To: kexec
Implement the functions that write out a kdump-compressed dumpfile cyclically.
They are similar to the current write_kdump_XXX() functions but do not use a
temporary bitmap file. Instead, they use partial bitmaps that are updated
cycle by cycle.
As a result, makedumpfile can work with constant memory consumption even on
large-memory systems.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
makedumpfile.c | 250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 250 insertions(+)
diff --git a/makedumpfile.c b/makedumpfile.c
index 30857e3..25d857a 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -5061,6 +5061,154 @@ out:
return ret;
}
+int
+write_kdump_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page,
+ struct page_desc *pd_zero, off_t *offset_data)
+{
+ unsigned long long pfn, per;
+ unsigned long long start_pfn, end_pfn;
+ unsigned long size_out;
+ struct page_desc pd;
+ unsigned char buf[info->page_size], *buf_out = NULL;
+ unsigned long len_buf_out;
+ unsigned long long num_dumped=0;
+ struct timeval tv_start;
+ const off_t failed = (off_t)-1;
+
+ int ret = FALSE;
+
+ if (info->flag_elf_dumpfile)
+ return FALSE;
+
+#ifdef USELZO
+ unsigned long len_buf_out_zlib, len_buf_out_lzo;
+ lzo_bytep wrkmem;
+
+ if ((wrkmem = malloc(LZO1X_1_MEM_COMPRESS)) == NULL) {
+ ERRMSG("Can't allocate memory for the working memory. %s\n",
+ strerror(errno));
+ goto out;
+ }
+
+ len_buf_out_zlib = compressBound(info->page_size);
+ len_buf_out_lzo = info->page_size + info->page_size / 16 + 64 + 3;
+ len_buf_out = MAX(len_buf_out_zlib, len_buf_out_lzo);
+#else
+ len_buf_out = compressBound(info->page_size);
+#endif
+
+ if ((buf_out = malloc(len_buf_out)) == NULL) {
+ ERRMSG("Can't allocate memory for the compression buffer. %s\n",
+ strerror(errno));
+ goto out;
+ }
+
+ per = info->num_dumpable / 100;
+
+ /*
+ * Set a fileoffset of Physical Address 0x0.
+ */
+ if (lseek(info->fd_memory, get_offset_pt_load_memory(), SEEK_SET)
+ == failed) {
+ ERRMSG("Can't seek the dump memory(%s). %s\n",
+ info->name_memory, strerror(errno));
+ goto out;
+ }
+
+ gettimeofday(&tv_start, NULL);
+
+ start_pfn = info->cyclic_start_pfn;
+ end_pfn = info->cyclic_end_pfn;
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+
+ if ((num_dumped % per) == 0)
+ print_progress(PROGRESS_COPY, num_dumped, info->num_dumpable);
+
+ /*
+ * Check the excluded page.
+ */
+ if (!is_dumpable_cyclic(info->partial_bitmap2, pfn))
+ continue;
+
+ num_dumped++;
+
+ if (!read_pfn(pfn, buf))
+ goto out;
+
+ /*
+ * Exclude the page filled with zeros.
+ */
+ if ((info->dump_level & DL_EXCLUDE_ZERO)
+ && is_zero_page(buf, info->page_size)) {
+ if (!write_cache(cd_header, pd_zero, sizeof(page_desc_t)))
+ goto out;
+ pfn_zero++;
+ continue;
+ }
+ /*
+ * Compress the page data.
+ */
+ size_out = len_buf_out;
+ if ((info->flag_compress & DUMP_DH_COMPRESSED_ZLIB)
+ && ((size_out = len_buf_out),
+ compress2(buf_out, &size_out, buf, info->page_size,
+ Z_BEST_SPEED) == Z_OK)
+ && (size_out < info->page_size)) {
+ pd.flags = DUMP_DH_COMPRESSED_ZLIB;
+ pd.size = size_out;
+ memcpy(buf, buf_out, pd.size);
+#ifdef USELZO
+ } else if (info->flag_lzo_support
+ && (info->flag_compress & DUMP_DH_COMPRESSED_LZO)
+ && ((size_out = info->page_size),
+ lzo1x_1_compress(buf, info->page_size, buf_out,
+ &size_out, wrkmem) == LZO_E_OK)
+ && (size_out < info->page_size)) {
+ pd.flags = DUMP_DH_COMPRESSED_LZO;
+ pd.size = size_out;
+ memcpy(buf, buf_out, pd.size);
+#endif
+ } else {
+ pd.flags = 0;
+ pd.size = info->page_size;
+ }
+ pd.page_flags = 0;
+ pd.offset = *offset_data;
+ *offset_data += pd.size;
+
+ /*
+ * Write the page header.
+ */
+ if (!write_cache(cd_header, &pd, sizeof(page_desc_t)))
+ goto out;
+
+ /*
+ * Write the page data.
+ */
+ if (!write_cache(cd_page, buf, pd.size))
+ goto out;
+ }
+
+ /*
+ * print [100 %]
+ */
+ print_progress(PROGRESS_COPY, num_dumped, info->num_dumpable);
+ print_execution_time(PROGRESS_COPY, &tv_start);
+ PROGRESS_MSG("\n");
+
+ ret = TRUE;
+out:
+ if (buf_out != NULL)
+ free(buf_out);
+#ifdef USELZO
+ if (wrkmem != NULL)
+ free(wrkmem);
+#endif
+
+ return ret;
+}
+
/*
* Copy eraseinfo from input dumpfile/vmcore to output dumpfile.
*/
@@ -5330,6 +5478,101 @@ out:
return ret;
}
+int
+write_kdump_bitmap_cyclic(void)
+{
+ off_t offset;
+ int increment;
+ int ret = FALSE;
+
+ increment = divideup(info->cyclic_end_pfn - info->cyclic_start_pfn, BITPERBYTE);
+
+ if (info->flag_elf_dumpfile)
+ return FALSE;
+
+ offset = info->offset_bitmap1;
+ if (!write_buffer(info->fd_dumpfile, offset,
+ info->partial_bitmap1, increment, info->name_dumpfile))
+ goto out;
+
+ offset += info->len_bitmap / 2;
+ if (!write_buffer(info->fd_dumpfile, offset,
+ info->partial_bitmap2, increment, info->name_dumpfile))
+ goto out;
+
+ info->offset_bitmap1 += increment;
+
+ ret = TRUE;
+out:
+
+ return ret;
+}
+
+int
+write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
+{
+ struct page_desc pd_zero;
+ off_t offset_data=0;
+ struct disk_dump_header *dh = info->dump_header;
+ unsigned char buf[info->page_size];
+ unsigned long long pfn;
+
+ /*
+ * Reset counter for debug message.
+ */
+ pfn_zero = pfn_cache = pfn_cache_private = pfn_user = pfn_free = 0;
+ pfn_memhole = info->max_mapnr;
+
+ cd_header->offset
+ = (DISKDUMP_HEADER_BLOCKS + dh->sub_hdr_size + dh->bitmap_blocks)
+ * dh->block_size;
+ cd_page->offset = cd_header->offset + sizeof(page_desc_t)*info->num_dumpable;
+ offset_data = cd_page->offset;
+
+ /*
+ * Write the data of zero-filled page.
+ */
+ if (info->dump_level & DL_EXCLUDE_ZERO) {
+ pd_zero.size = info->page_size;
+ pd_zero.flags = 0;
+ pd_zero.offset = offset_data;
+ pd_zero.page_flags = 0;
+ memset(buf, 0, pd_zero.size);
+ if (!write_cache(cd_page, buf, pd_zero.size))
+ return FALSE;
+ offset_data += pd_zero.size;
+ }
+
+ /*
+ * Write pages and bitmap cyclically.
+ */
+ info->cyclic_start_pfn = 0;
+ info->cyclic_end_pfn = 0;
+ for (pfn = 0; pfn < info->max_mapnr; pfn++) {
+ if (is_cyclic_region(pfn))
+ continue;
+
+ if (!update_cyclic_region(pfn))
+ return FALSE;
+
+ if (!write_kdump_pages_cyclic(cd_header, cd_page, &pd_zero, &offset_data))
+ return FALSE;
+
+ if (!write_kdump_bitmap_cyclic())
+ return FALSE;
+ }
+
+ /*
+ * Write the remainder.
+ */
+ if (!write_cache_bufsz(cd_page))
+ return FALSE;
+ if (!write_cache_bufsz(cd_header))
+ return FALSE;
+
+ return TRUE;
+}
+
void
close_vmcoreinfo(void)
{
@@ -6107,6 +6350,13 @@ writeout_dumpfile(void)
goto out;
if (!write_elf_eraseinfo(&cd_header))
goto out;
+ } else if (info->flag_cyclic) {
+ if (!write_kdump_header())
+ goto out;
+ if (!write_kdump_pages_and_bitmap_cyclic(&cd_header, &cd_page))
+ goto out;
+ if (!write_kdump_eraseinfo(&cd_page))
+ goto out;
} else {
if (!write_kdump_header())
goto out;
--
1.7.9.2
* [RFC PATCH v2 7/10] Add function to get number of PT_LOAD for cyclic processing.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
` (5 preceding siblings ...)
2012-06-29 2:21 ` [RFC PATCH v2 6/10] Implement the main routine of cyclic processing for kdump-compressed format Atsushi Kumagai
@ 2012-06-29 2:22 ` Atsushi Kumagai
2012-06-29 2:23 ` [RFC PATCH v2 8/10] Implement the main routine of cyclic processing for ELF format Atsushi Kumagai
` (4 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:22 UTC (permalink / raw)
To: kexec
get_loads_dumpfile_cyclic() counts the final number of PT_LOAD segments using
cyclic processing. The count is needed to decide the offset of the PT_LOAD
data when writing the ELF format.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
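The splitting rule can be shown on a plain array. The sketch below counts how
many PT_LOAD segments one original segment turns into when a run of 256 or
more excluded pages splits it, mirroring the counting in this patch. It takes
one byte per page instead of a bitmap, and count_loads() is a hypothetical
name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PFN_EXCLUDED 256    /* run length of excluded pages that splits a segment */

/*
 * dumpable[i] is nonzero if page i of the segment is kept.
 * Returns the number of PT_LOAD segments after splitting.
 */
static int count_loads(const unsigned char *dumpable, size_t npages)
{
    size_t i, excluded = 0;
    int loads = 1;          /* the original segment itself */

    assert(dumpable != NULL);
    for (i = 0; i < npages; i++) {
        if (!dumpable[i]) {
            excluded++;
            continue;
        }
        /* A kept page after a long enough excluded run starts a new segment. */
        if (excluded >= PFN_EXCLUDED)
            loads++;
        excluded = 0;
    }
    return loads;
}
```

Shorter runs of excluded pages stay inside the segment, so they do not change
the count.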
makedumpfile.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
makedumpfile.h | 2 ++
2 files changed, 86 insertions(+)
diff --git a/makedumpfile.c b/makedumpfile.c
index 25d857a..420f103 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -4875,6 +4875,90 @@ read_pfn(unsigned long long pfn, unsigned char *buf)
}
int
+get_loads_dumpfile_cyclic(void)
+{
+ int i, phnum, num_new_load = 0;
+ long page_size = info->page_size;
+ unsigned char buf[info->page_size];
+ unsigned long long pfn, pfn_start, pfn_end, num_excluded;
+ unsigned long frac_head, frac_tail;
+ Elf64_Phdr load;
+
+ /*
+ * Initialize target region and bitmap.
+ */
+ info->cyclic_start_pfn = 0;
+ info->cyclic_end_pfn = PFN_CYCLIC;
+ if (!create_1st_bitmap_cyclic())
+ return FALSE;
+ if (!exclude_unnecessary_pages_cyclic())
+ return FALSE;
+
+ if (!(phnum = get_phnum_memory()))
+ return FALSE;
+
+ for (i = 0; i < phnum; i++) {
+ if (!get_phdr_memory(i, &load))
+ return FALSE;
+ if (load.p_type != PT_LOAD)
+ continue;
+
+ pfn_start = paddr_to_pfn(load.p_paddr);
+ pfn_end = paddr_to_pfn(load.p_paddr + load.p_memsz);
+ frac_head = page_size - (load.p_paddr % page_size);
+ frac_tail = (load.p_paddr + load.p_memsz) % page_size;
+
+ num_new_load++;
+ num_excluded = 0;
+
+ if (frac_head && (frac_head != page_size))
+ pfn_start++;
+ if (frac_tail)
+ pfn_end++;
+
+ for (pfn = pfn_start; pfn < pfn_end; pfn++) {
+ /*
+ * Update target region and bitmap
+ */
+ if (!is_cyclic_region(pfn)) {
+ if (!update_cyclic_region(pfn))
+ return FALSE;
+ }
+
+ if (!is_dumpable_cyclic(info->partial_bitmap2, pfn)) {
+ num_excluded++;
+ continue;
+ }
+
+ /*
+ * Exclude zero pages.
+ */
+ if (info->dump_level & DL_EXCLUDE_ZERO) {
+ if (!read_pfn(pfn, buf))
+ return FALSE;
+ if (is_zero_page(buf, page_size)) {
+ num_excluded++;
+ continue;
+ }
+ }
+
+ info->num_dumpable++;
+
+ /*
+ * If the number of the contiguous pages to be excluded
+ * is 256 or more, those pages are excluded really.
+ * And a new PT_LOAD segment is created.
+ */
+ if (num_excluded >= PFN_EXCLUDED) {
+ num_new_load++;
+ }
+ num_excluded = 0;
+ }
+ }
+ return num_new_load;
+}
+
+int
write_kdump_pages(struct cache_data *cd_header, struct cache_data *cd_page)
{
unsigned long long pfn, per, num_dumpable, num_dumped = 0;
diff --git a/makedumpfile.h b/makedumpfile.h
index e336814..77b824e 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -1516,5 +1516,7 @@ struct elf_prstatus {
* Function Prototype.
*/
unsigned long long get_num_dumpable_cyclic(void);
+int get_loads_dumpfile_cyclic(void);
+
#endif /* MAKEDUMPFILE_H */
--
1.7.9.2
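The counting rule used by get_loads_dumpfile_cyclic() can be looked at in isolation. The following is only a minimal sketch under stated assumptions, not the code from the patch: the dumpable state of one source PT_LOAD is assumed to be available as a plain byte array, and count_loads() and PFN_EXCLUDED_SKETCH are hypothetical names. The threshold of 256 pages is taken from the comment in the patch.

```c
#include <stddef.h>

/* Assumed threshold, taken from the patch comment ("256 or more"). */
#define PFN_EXCLUDED_SKETCH 256

/*
 * Count the PT_LOAD segments that one source PT_LOAD would produce.
 * dumpable[i] != 0 means that page i is kept in the dumpfile.
 * (Hypothetical helper; the patch reads the partial bitmap instead.)
 */
static int count_loads(const unsigned char *dumpable, size_t npages)
{
	int num_load = 1;		/* the source segment itself */
	size_t num_excluded = 0;	/* length of the current excluded run */
	size_t i;

	for (i = 0; i < npages; i++) {
		if (!dumpable[i]) {
			num_excluded++;
			continue;
		}
		/* A dumpable page ends the run; a long run starts a new segment. */
		if (num_excluded >= PFN_EXCLUDED_SKETCH)
			num_load++;
		num_excluded = 0;
	}
	return num_load;
}
```

As in the patch, a trailing run of excluded pages does not add a segment, because a new segment is only counted when a dumpable page follows the run.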
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [RFC PATCH v2 8/10] Implement the main routine of cyclic processing for ELF format.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
` (6 preceding siblings ...)
2012-06-29 2:22 ` [RFC PATCH v2 7/10] Add function to get number of PT_LOAD for cyclic processing Atsushi Kumagai
@ 2012-06-29 2:23 ` Atsushi Kumagai
2012-06-29 2:24 ` [RFC PATCH v2 9/10] Enabling --split option with cyclic processing Atsushi Kumagai
` (3 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:23 UTC (permalink / raw)
To: kexec
Implement the function which writes out the ELF dumpfile cyclically.
The basic idea is the same as in the routine for the kdump-compressed format.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
makedumpfile.c | 243 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 238 insertions(+), 5 deletions(-)
diff --git a/makedumpfile.c b/makedumpfile.c
index 420f103..5670bcf 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -4290,9 +4290,16 @@ write_elf_header(struct cache_data *cd_header)
/*
* Get the PT_LOAD number of the dumpfile.
*/
- if (!(num_loads_dumpfile = get_loads_dumpfile())) {
- ERRMSG("Can't get a number of PT_LOAD.\n");
- goto out;
+ if (info->flag_cyclic) {
+ if (!(num_loads_dumpfile = get_loads_dumpfile_cyclic())) {
+ ERRMSG("Can't get a number of PT_LOAD.\n");
+ goto out;
+ }
+ } else {
+ if (!(num_loads_dumpfile = get_loads_dumpfile())) {
+ ERRMSG("Can't get a number of PT_LOAD.\n");
+ goto out;
+ }
}
if (is_elf64_memory()) { /* ELF64 */
@@ -4959,6 +4966,227 @@ get_loads_dumpfile_cyclic(void)
}
int
+write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
+{
+ int i, phnum;
+ long page_size = info->page_size;
+ unsigned char buf[info->page_size];
+ unsigned long long pfn, pfn_start, pfn_end, paddr, num_excluded;
+ unsigned long long num_dumpable, per, num_dumped=0;
+ unsigned long long memsz, filesz;
+ unsigned long frac_head, frac_tail;
+ off_t off_seg_load, off_memory;
+ Elf64_Phdr load;
+ struct timeval tv_start;
+
+ if (!info->flag_elf_dumpfile)
+ return FALSE;
+
+ num_dumpable = info->num_dumpable;
+ per = num_dumpable / 100;
+
+ off_seg_load = info->offset_load_dumpfile;
+ cd_page->offset = info->offset_load_dumpfile;
+
+ /*
+ * Reset counter for debug message.
+ */
+ pfn_zero = pfn_cache = pfn_cache_private = pfn_user = pfn_free = 0;
+ pfn_memhole = info->max_mapnr;
+
+ info->cyclic_start_pfn = 0;
+ info->cyclic_end_pfn = 0;
+ if (!update_cyclic_region(0))
+ return FALSE;
+
+ if (!(phnum = get_phnum_memory()))
+ return FALSE;
+
+ gettimeofday(&tv_start, NULL);
+
+ for (i = 0; i < phnum; i++) {
+ if (!get_phdr_memory(i, &load))
+ return FALSE;
+
+ if (load.p_type != PT_LOAD)
+ continue;
+
+ off_memory= load.p_offset;
+ paddr = load.p_paddr;
+ pfn_start = paddr_to_pfn(load.p_paddr);
+ pfn_end = paddr_to_pfn(load.p_paddr + load.p_memsz);
+ frac_head = page_size - (load.p_paddr % page_size);
+ frac_tail = (load.p_paddr + load.p_memsz)%page_size;
+
+ num_excluded = 0;
+ memsz = 0;
+ filesz = 0;
+ if (frac_head && (frac_head != page_size)) {
+ memsz = frac_head;
+ filesz = frac_head;
+ pfn_start++;
+ }
+
+ if (frac_tail)
+ pfn_end++;
+
+ for (pfn = pfn_start; pfn < pfn_end; pfn++) {
+ /*
+ * Update target region and partial bitmap if necessary.
+ */
+ if (!update_cyclic_region(pfn))
+ return FALSE;
+
+ if (!is_dumpable_cyclic(info->partial_bitmap2, pfn)) {
+ num_excluded++;
+ if ((pfn == pfn_end - 1) && frac_tail)
+ memsz += frac_tail;
+ else
+ memsz += page_size;
+ continue;
+ }
+
+ /*
+ * Exclude zero pages.
+ */
+ if (info->dump_level & DL_EXCLUDE_ZERO) {
+ if (!read_pfn(pfn, buf))
+ return FALSE;
+ if (is_zero_page(buf, page_size)) {
+ pfn_zero++;
+ num_excluded++;
+ if ((pfn == pfn_end - 1) && frac_tail)
+ memsz += frac_tail;
+ else
+ memsz += page_size;
+ continue;
+ }
+ }
+
+ if ((num_dumped % per) == 0)
+ print_progress(PROGRESS_COPY, num_dumped, num_dumpable);
+
+ num_dumped++;
+
+ /*
+ * The dumpable pages are continuous.
+ */
+ if (!num_excluded) {
+ if ((pfn == pfn_end - 1) && frac_tail) {
+ memsz += frac_tail;
+ filesz += frac_tail;
+ } else {
+ memsz += page_size;
+ filesz += page_size;
+ }
+ continue;
+ /*
+ * If the number of the contiguous pages to be excluded
+ * is 255 or less, those pages are not excluded.
+ */
+ } else if (num_excluded < PFN_EXCLUDED) {
+ if ((pfn == pfn_end - 1) && frac_tail) {
+ memsz += frac_tail;
+ filesz += (page_size*num_excluded
+ + frac_tail);
+ }else {
+ memsz += page_size;
+ filesz += (page_size*num_excluded
+ + page_size);
+ }
+ num_excluded = 0;
+ continue;
+ }
+
+ /*
+ * If the number of the contiguous pages to be excluded
+ * is 256 or more, those pages are excluded really.
+ * And a new PT_LOAD segment is created.
+ */
+ load.p_memsz = memsz;
+ load.p_filesz = filesz;
+ if (load.p_filesz)
+ load.p_offset = off_seg_load;
+ else
+ /*
+ * If PT_LOAD segment does not have real data
+ * due to the all excluded pages, the file
+ * offset is not effective and it should be 0.
+ */
+ load.p_offset = 0;
+
+ /*
+ * Write a PT_LOAD header.
+ */
+ if (!write_elf_phdr(cd_header, &load))
+ return FALSE;
+
+ /*
+ * Write a PT_LOAD segment.
+ */
+ if (load.p_filesz)
+ if (!write_elf_load_segment(cd_page, paddr,
+ off_memory, load.p_filesz))
+ return FALSE;
+
+ load.p_paddr += load.p_memsz;
+#ifdef __x86__
+ /*
+ * FIXME:
+ * (x86) Fill PT_LOAD headers with appropriate
+ * virtual addresses.
+ */
+ if (load.p_paddr < MAXMEM)
+ load.p_vaddr += load.p_memsz;
+#else
+ load.p_vaddr += load.p_memsz;
+#endif /* x86 */
+ paddr = load.p_paddr;
+ off_seg_load += load.p_filesz;
+
+ num_excluded = 0;
+ memsz = page_size;
+ filesz = page_size;
+ }
+ /*
+ * Write the last PT_LOAD.
+ */
+ load.p_memsz = memsz;
+ load.p_filesz = filesz;
+ load.p_offset = off_seg_load;
+
+ /*
+ * Write a PT_LOAD header.
+ */
+ if (!write_elf_phdr(cd_header, &load))
+ return FALSE;
+
+ /*
+ * Write a PT_LOAD segment.
+ */
+ if (load.p_filesz)
+ if (!write_elf_load_segment(cd_page, paddr,
+ off_memory, load.p_filesz))
+ return FALSE;
+
+ off_seg_load += load.p_filesz;
+ }
+ if (!write_cache_bufsz(cd_header))
+ return FALSE;
+ if (!write_cache_bufsz(cd_page))
+ return FALSE;
+
+ /*
+ * print [100 %]
+ */
+ print_progress(PROGRESS_COPY, num_dumpable, num_dumpable);
+ print_execution_time(PROGRESS_COPY, &tv_start);
+ PROGRESS_MSG("\n");
+
+ return TRUE;
+}
+
+int
write_kdump_pages(struct cache_data *cd_header, struct cache_data *cd_page)
{
unsigned long long pfn, per, num_dumpable, num_dumped = 0;
@@ -6430,8 +6658,13 @@ writeout_dumpfile(void)
if (info->flag_elf_dumpfile) {
if (!write_elf_header(&cd_header))
goto out;
- if (!write_elf_pages(&cd_header, &cd_page))
- goto out;
+ if (info->flag_cyclic) {
+ if (!write_elf_pages_cyclic(&cd_header, &cd_page))
+ goto out;
+ } else {
+ if (!write_elf_pages(&cd_header, &cd_page))
+ goto out;
+ }
if (!write_elf_eraseinfo(&cd_header))
goto out;
} else if (info->flag_cyclic) {
--
1.7.9.2
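The handling of PT_LOAD segments that do not start or end on a page boundary may be easier to follow in isolation. This is a minimal sketch and not part of the patch: struct seg_range and seg_range_of() are hypothetical names, and paddr_to_pfn() is assumed here to be a plain division by the page size.

```c
/* Page range and partial-page sizes of one PT_LOAD (hypothetical type). */
struct seg_range {
	unsigned long long pfn_start;	/* first pfn visited by the per-page loop */
	unsigned long long pfn_end;	/* one past the last pfn visited */
	unsigned long frac_head;	/* bytes in the partial first page, page_size if aligned */
	unsigned long frac_tail;	/* bytes in the partial last page, 0 if aligned */
};

static struct seg_range seg_range_of(unsigned long long paddr,
				     unsigned long long memsz,
				     unsigned long page_size)
{
	struct seg_range r;
	unsigned long long end = paddr + memsz;

	r.pfn_start = paddr / page_size;	/* assumed stand-in for paddr_to_pfn() */
	r.pfn_end = end / page_size;
	r.frac_head = page_size - (unsigned long)(paddr % page_size);
	r.frac_tail = (unsigned long)(end % page_size);

	/* An unaligned head is accounted for separately, so the loop skips it. */
	if (r.frac_head != page_size)
		r.pfn_start++;
	/* An unaligned tail is visited by the loop as its last page. */
	if (r.frac_tail)
		r.pfn_end++;
	return r;
}
```

For example, with 4 KiB pages a segment at paddr 0x1800 of size 0x3000 gives frac_head = 0x800, frac_tail = 0x800 and a loop over pfn 2 to 4, which adds up to 0x800 + 2 * 0x1000 + 0x800 = 0x3000 bytes.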
* [RFC PATCH v2 9/10] Enabling --split option with cyclic processing.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
` (7 preceding siblings ...)
2012-06-29 2:23 ` [RFC PATCH v2 8/10] Implement the main routine of cyclic processing for ELF format Atsushi Kumagai
@ 2012-06-29 2:24 ` Atsushi Kumagai
2012-06-29 2:25 ` [RFC PATCH v2 10/10] Change num_dumped value to global for debug messages Atsushi Kumagai
` (2 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:24 UTC (permalink / raw)
To: kexec
This patch enables cyclic processing to split the dump data into multiple dumpfiles.
Note that a partial bitmap buffer is prepared for each child process,
so you need to take the total memory consumption into account.
This patch splits the dump data based only on max_mapnr and num_dumpfile, so the sizes
of the split dumpfiles will differ once unnecessary pages are excluded.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
makedumpfile.c | 42 +++++++++++++++++++++++++++++-------------
1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/makedumpfile.c b/makedumpfile.c
index 5670bcf..be0e15e 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -5432,6 +5432,13 @@ write_kdump_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_pag
start_pfn = info->cyclic_start_pfn;
end_pfn = info->cyclic_end_pfn;
+ if (info->flag_split) {
+ if (start_pfn < info->split_start_pfn)
+ start_pfn = info->split_start_pfn;
+ if (end_pfn > info->split_end_pfn)
+ end_pfn = info->split_end_pfn;
+ }
+
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
if ((num_dumped % per) == 0)
@@ -6714,22 +6721,31 @@ setup_splitting(void)
if (info->num_dumpfile <= 1)
return FALSE;
- initialize_2nd_bitmap(&bitmap2);
+ if (info->flag_cyclic) {
+ for (i = 0; i < info->num_dumpfile; i++) {
+ SPLITTING_START_PFN(i) = divideup(info->max_mapnr, info->num_dumpfile) * i;
+ SPLITTING_END_PFN(i) = divideup(info->max_mapnr, info->num_dumpfile) * (i + 1);
+ }
+ if (SPLITTING_END_PFN(i-1) > info->max_mapnr)
+ SPLITTING_END_PFN(i-1) = info->max_mapnr;
+ } else {
+ initialize_2nd_bitmap(&bitmap2);
- pfn_per_dumpfile = num_dumpable / info->num_dumpfile;
- start_pfn = end_pfn = 0;
- for (i = 0; i < info->num_dumpfile; i++) {
- start_pfn = end_pfn;
- if (i == (info->num_dumpfile - 1)) {
- end_pfn = info->max_mapnr;
- } else {
- for (j = 0; j < pfn_per_dumpfile; end_pfn++) {
- if (is_dumpable(&bitmap2, end_pfn))
- j++;
+ pfn_per_dumpfile = num_dumpable / info->num_dumpfile;
+ start_pfn = end_pfn = 0;
+ for (i = 0; i < info->num_dumpfile; i++) {
+ start_pfn = end_pfn;
+ if (i == (info->num_dumpfile - 1)) {
+ end_pfn = info->max_mapnr;
+ } else {
+ for (j = 0; j < pfn_per_dumpfile; end_pfn++) {
+ if (is_dumpable(&bitmap2, end_pfn))
+ j++;
+ }
}
+ SPLITTING_START_PFN(i) = start_pfn;
+ SPLITTING_END_PFN(i) = end_pfn;
}
- SPLITTING_START_PFN(i) = start_pfn;
- SPLITTING_END_PFN(i) = end_pfn;
}
return TRUE;
--
1.7.9.2
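The split by max_mapnr and num_dumpfile described in this patch can be sketched as below. This is an illustration under assumptions, not the patch code: split_range() is a hypothetical helper, divideup() is assumed to be a round-up division, and unlike the patch (which clamps only the last range) the sketch clamps both ends of every range to max_mapnr.

```c
/*
 * Compute the pfn range [*start, *end) handled by dumpfile number i.
 * (Hypothetical helper standing in for the SPLITTING_START_PFN/END_PFN setup.)
 */
static void split_range(unsigned long long max_mapnr,
			unsigned int num_dumpfile, unsigned int i,
			unsigned long long *start, unsigned long long *end)
{
	/* Assumed equivalent of divideup(max_mapnr, num_dumpfile). */
	unsigned long long per = (max_mapnr + num_dumpfile - 1) / num_dumpfile;

	*start = per * i;
	*end = per * (i + 1ULL);

	/* Keep both ends inside the valid pfn range. */
	if (*start > max_mapnr)
		*start = max_mapnr;
	if (*end > max_mapnr)
		*end = max_mapnr;
}
```

With max_mapnr = 10 and three dumpfiles this yields the ranges [0, 4), [4, 8) and [8, 10). The ranges cover equal numbers of page frames, not equal numbers of dumpable pages, which is why the resulting file sizes differ.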
* [RFC PATCH v2 10/10] Change num_dumped value to global for debug messages.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
` (8 preceding siblings ...)
2012-06-29 2:24 ` [RFC PATCH v2 9/10] Enabling --split option with cyclic processing Atsushi Kumagai
@ 2012-06-29 2:25 ` Atsushi Kumagai
2012-07-02 12:39 ` [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Vivek Goyal
2012-08-06 20:47 ` Vivek Goyal
11 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-06-29 2:25 UTC (permalink / raw)
To: kexec
num_dumped is used to count the number of dumped pages.
It is defined as a local variable, so it doesn't work correctly for cyclic
processing (because it is re-initialized in each cycle).
This patch fixes the issue by making num_dumped global.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
makedumpfile.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/makedumpfile.c b/makedumpfile.c
index be0e15e..67e3727 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -44,6 +44,8 @@ unsigned long long pfn_cache_private;
unsigned long long pfn_user;
unsigned long long pfn_free;
+unsigned long long num_dumped;
+
int retcd = FAILED; /* return code */
#define INITIALIZE_LONG_TABLE(table, value) \
@@ -4627,7 +4629,7 @@ write_elf_pages(struct cache_data *cd_header, struct cache_data *cd_page)
int i, phnum;
long page_size = info->page_size;
unsigned long long pfn, pfn_start, pfn_end, paddr, num_excluded;
- unsigned long long num_dumpable, num_dumped = 0, per;
+ unsigned long long num_dumpable, per;
unsigned long long memsz, filesz;
unsigned long frac_head, frac_tail;
off_t off_seg_load, off_memory;
@@ -4972,7 +4974,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
long page_size = info->page_size;
unsigned char buf[info->page_size];
unsigned long long pfn, pfn_start, pfn_end, paddr, num_excluded;
- unsigned long long num_dumpable, per, num_dumped=0;
+ unsigned long long num_dumpable, per;
unsigned long long memsz, filesz;
unsigned long frac_head, frac_tail;
off_t off_seg_load, off_memory;
@@ -5189,7 +5191,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
int
write_kdump_pages(struct cache_data *cd_header, struct cache_data *cd_page)
{
- unsigned long long pfn, per, num_dumpable, num_dumped = 0;
+ unsigned long long pfn, per, num_dumpable;
unsigned long long start_pfn, end_pfn;
unsigned long size_out;
struct page_desc pd, pd_zero;
@@ -5383,7 +5385,6 @@ write_kdump_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_pag
struct page_desc pd;
unsigned char buf[info->page_size], *buf_out = NULL;
unsigned long len_buf_out;
- unsigned long long num_dumped=0;
struct timeval tv_start;
const off_t failed = (off_t)-1;
@@ -7279,7 +7280,7 @@ reassemble_kdump_pages(void)
off_t offset_first_ph, offset_ph_org, offset_eraseinfo;
off_t offset_data_new, offset_zero_page = 0;
unsigned long long pfn, start_pfn, end_pfn;
- unsigned long long num_dumpable, num_dumped;
+ unsigned long long num_dumpable;
unsigned long size_eraseinfo;
struct dump_bitmap bitmap2;
struct disk_dump_header dh;
--
1.7.9.2
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
` (9 preceding siblings ...)
2012-06-29 2:25 ` [RFC PATCH v2 10/10] Change num_dumped value to global for debug messages Atsushi Kumagai
@ 2012-07-02 12:39 ` Vivek Goyal
2012-07-04 5:54 ` Atsushi Kumagai
2012-08-06 20:47 ` Vivek Goyal
11 siblings, 1 reply; 34+ messages in thread
From: Vivek Goyal @ 2012-07-02 12:39 UTC (permalink / raw)
To: Atsushi Kumagai; +Cc: kexec
On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
> Hello,
>
> I improved prototype of cyclic processing as version 2.
> If there is no objection to basic idea, I want to consider the things
> related to performance as next step. (Concretely, buffer size and the patch set
> HATAYAMA-san sent a short time ago.)
>
Hi Atsushi-san,
Good to see this work making progress. I have a few queries.
- Do you have any numbers for bigger machines, with 1TB of memory or more?
I am curious to know how bad the time penalty is.
- Will this work with the -F option (flattened format)? People often save a
filtered dump over ssh, so we need to make sure it works with -F
too.
> Version 1:
>
> http://lists.infradead.org/pipermail/kexec/2012-May/006363.html
- I have a few queries about the diagram in the link above.
- What are the 1st cycle, 2nd cycle and 3rd cycle? Are we cycling through
all the pages 3 times for everything?
- What are the 1st bitmap, 2nd bitmap and page_header? And why 3 cycles for
each?
- And why 3 cycles for page_data?
Thanks
Vivek
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-07-02 12:39 ` [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Vivek Goyal
@ 2012-07-04 5:54 ` Atsushi Kumagai
2012-07-04 8:52 ` HATAYAMA Daisuke
0 siblings, 1 reply; 34+ messages in thread
From: Atsushi Kumagai @ 2012-07-04 5:54 UTC (permalink / raw)
To: vgoyal; +Cc: kexec
Hello Vivek,
On Mon, 2 Jul 2012 08:39:05 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
> > Hello,
> >
> > I improved prototype of cyclic processing as version 2.
> > If there is no objection to basic idea, I want to consider the things
> > related to performance as next step. (Concretely, buffer size and the patch set
> > HATAYAMA-san sent a short time ago.)
> >
>
> Hi Atushi san,
>
> Good to see this work making progress. I have few queries.
>
> - Do you have some numbers for bigger machines like 1TB or higher memory.
> I am curious to know how bad is the time penalty.
I'm afraid I don't have such a large machine, so I need someone who can
measure the execution time on one.
> - Will this work with option -F (flattned format). Often people save
> filtered dump over ssh and we need to make sure it does work with -F
> option too.
Yes, cyclic processing supports the flattened format too.
> > Version 1:
> >
> > http://lists.infradead.org/pipermail/kexec/2012-May/006363.html
>
> - I have few queries about the diagram in the link above.
>
> - What is 1st cycle, 2nd cycle and 3rd cycle. Are we cycling thorough
> all the pages 3 times for everything?
First, "3 times" is only an example. In practice, the number of cycles is
determined by the system memory size and BUFSIZE_CYCLIC:
number of cycles = memory size / page size (4k) / bits per byte (8) / BUFSIZE_CYCLIC
To begin with, the issue we discussed is caused by holding the analysis
data (called the bitmap) for the whole of memory at once. The bitmap size increases
linearly with the system memory size, so the issue becomes apparent on large systems.
Therefore, we came up with cyclic processing to work in constant memory space.
In cyclic processing mode, makedumpfile reads a fixed-size region of memory,
analyzes it, and writes its pages to the dumpfile, repeating this from the start of memory to the end.
We call the processing of one fixed-size region "one cycle".
Each cycle creates a partial bitmap only for its own region, so the bitmap size
stays constant regardless of the system memory size.
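As a rough illustration of the formula above, the number of cycles can be computed as follows. This is only a sketch under assumptions: num_cycles() and div_round_up() are hypothetical helpers, the buffer size is passed in as a parameter because the actual value of BUFSIZE_CYCLIC is not stated in this mail, and each division is rounded up so that a partly filled last cycle is counted.

```c
/* Round-up division (hypothetical helper). */
static unsigned long long div_round_up(unsigned long long x,
				       unsigned long long y)
{
	return (x + y - 1) / y;
}

/*
 * number of cycles = memory size / page size / bits per byte / buffer size
 * bufsize_cyclic is the size in bytes of one partial bitmap buffer.
 */
static unsigned long long num_cycles(unsigned long long mem_bytes,
				     unsigned long long page_size,
				     unsigned long long bufsize_cyclic)
{
	unsigned long long pages = div_round_up(mem_bytes, page_size);
	unsigned long long bitmap_bytes = div_round_up(pages, 8); /* 1 bit per page */

	return div_round_up(bitmap_bytes, bufsize_cyclic);
}
```

For example, with 1 TiB of memory, 4 KiB pages and an assumed buffer of 1 MiB, the whole bitmap would be 32 MiB and the result is 32 cycles, while the buffer itself stays at 1 MiB regardless of the memory size.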
> - What is 1st bitmap and 2nd bitmap and page_header? And why 3 cycles for
> each.
>
> - And why 3 cycles for page_data.
The kdump-compressed format was described by Ohmichi-san; please see below.
http://www.redhat.com/archives/crash-utility/2008-August/msg00001.html
The 1st bitmap, 2nd bitmap, page_header and page_data each correspond to a fixed-size region,
so the parts for one region have to be written in the same cycle.
Thanks
Atsushi Kumagai
> Thanks
> Vivek
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-07-04 5:54 ` Atsushi Kumagai
@ 2012-07-04 8:52 ` HATAYAMA Daisuke
2012-07-11 5:23 ` HATAYAMA Daisuke
0 siblings, 1 reply; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-07-04 8:52 UTC (permalink / raw)
To: kumagai-atsushi; +Cc: kexec, vgoyal
From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Wed, 4 Jul 2012 14:54:06 +0900
> Hello Vivek,
>
> On Mon, 2 Jul 2012 08:39:05 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
>
>> On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
>> > Hello,
>> >
>> > I improved prototype of cyclic processing as version 2.
>> > If there is no objection to basic idea, I want to consider the things
>> > related to performance as next step. (Concretely, buffer size and the patch set
>> > HATAYAMA-san sent a short time ago.)
>> >
>>
>> Hi Atushi san,
>>
>> Good to see this work making progress. I have few queries.
>>
>> - Do you have some numbers for bigger machines like 1TB or higher memory.
>> I am curious to know how bad is the time penalty.
>
> I'm afraid that I don't have such a large machine, so I need someone who can
> measure execution time in large machine.
>
I can arrange such a machine, but I cannot yet say on exactly which day I
will be able to use it. I think it will be by the end of this month at
the latest. So, I would like to settle the content of the benchmark
first.
Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-07-04 8:52 ` HATAYAMA Daisuke
@ 2012-07-11 5:23 ` HATAYAMA Daisuke
2012-07-13 0:36 ` Atsushi Kumagai
0 siblings, 1 reply; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-07-11 5:23 UTC (permalink / raw)
To: kumagai-atsushi; +Cc: kexec, vgoyal
From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Wed, 4 Jul 2012 17:52:11 +0900
> From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
> Date: Wed, 4 Jul 2012 14:54:06 +0900
>
>> Hello Vivek,
>>
>> On Mon, 2 Jul 2012 08:39:05 -0400
>> Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>>> On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
>>> > Hello,
>>> >
>>> > I improved prototype of cyclic processing as version 2.
>>> > If there is no objection to basic idea, I want to consider the things
>>> > related to performance as next step. (Concretely, buffer size and the patch set
>>> > HATAYAMA-san sent a short time ago.)
>>> >
>>>
>>> Hi Atushi san,
>>>
>>> Good to see this work making progress. I have few queries.
>>>
>>> - Do you have some numbers for bigger machines like 1TB or higher memory.
>>> I am curious to know how bad is the time penalty.
>>
>> I'm afraid that I don't have such a large machine, so I need someone who can
>> measure execution time in large machine.
>>
>
> I can prepare such machine but I cannot say I can use the machine on
> this day precisely for example. But I think it must be until the end
> of this month at most. So, I would like to fix content of benchmark
> first.
>
Hello Kumagai-san,
I'm now trying to reserve a machine with large memory, but I'm not
sure exactly when I can use it; I expect it to be within this
month. Before then, I want to confirm in more detail what I should
measure in this benchmark.
I'm going to collect at least what you showed in your benchmark: RSS
size for no option, -cd31 and -Ed31. Is there anything else to collect?
Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-07-11 5:23 ` HATAYAMA Daisuke
@ 2012-07-13 0:36 ` Atsushi Kumagai
2012-07-13 5:18 ` HATAYAMA Daisuke
0 siblings, 1 reply; 34+ messages in thread
From: Atsushi Kumagai @ 2012-07-13 0:36 UTC (permalink / raw)
To: d.hatayama; +Cc: kexec, vgoyal
Hello HATAYAMA-san,
On Wed, 11 Jul 2012 14:23:59 +0900 (JST)
HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
[...]
> >>> Hi Atushi san,
> >>>
> >>> Good to see this work making progress. I have few queries.
> >>>
> >>> - Do you have some numbers for bigger machines like 1TB or higher memory.
> >>> I am curious to know how bad is the time penalty.
> >>
> >> I'm afraid that I don't have such a large machine, so I need someone who can
> >> measure execution time in large machine.
> >>
> >
> > I can prepare such machine but I cannot say I can use the machine on
> > this day precisely for example. But I think it must be until the end
> > of this month at most. So, I would like to fix content of benchmark
> > first.
> >
>
> Hello Kumagai-san,
>
> I'm now trying to reserve the machine with big memory, but I'm not
> accurately sure when I can use the machine; expecting within this
> month? In advance, I want to make sure what I measure in this
> benchmark in more detail.
OK, thank you for your help.
> I'm going to collect at least what you showed in your benchmark: RSS
> size for no option, -cd31 and -Ed31. Is there another to collect?
Would you also collect the RSS for --split?
I expect the RSS to increase with the number of child processes,
but I want to see measured data.
Thanks
Atsushi Kumagai
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-07-13 0:36 ` Atsushi Kumagai
@ 2012-07-13 5:18 ` HATAYAMA Daisuke
2012-07-13 8:10 ` Atsushi Kumagai
0 siblings, 1 reply; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-07-13 5:18 UTC (permalink / raw)
To: kumagai-atsushi; +Cc: kexec, vgoyal
From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Fri, 13 Jul 2012 09:36:11 +0900
> Hello HATAYAMA-san,
>
> On Wed, 11 Jul 2012 14:23:59 +0900 (JST)
> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>
> [...]
>> >>> Hi Atushi san,
>> >>>
>> >>> Good to see this work making progress. I have few queries.
>> >>>
>> >>> - Do you have some numbers for bigger machines like 1TB or higher memory.
>> >>> I am curious to know how bad is the time penalty.
>> >>
>> >> I'm afraid that I don't have such a large machine, so I need someone who can
>> >> measure execution time in large machine.
>> >>
>> >
>> > I can prepare such machine but I cannot say I can use the machine on
>> > this day precisely for example. But I think it must be until the end
>> > of this month at most. So, I would like to fix content of benchmark
>> > first.
>> >
>>
>> Hello Kumagai-san,
>>
>> I'm now trying to reserve the machine with big memory, but I'm not
>> accurately sure when I can use the machine; expecting within this
>> month? In advance, I want to make sure what I measure in this
>> benchmark in more detail.
>
> OK, I'm glad for your help.
>
>> I'm going to collect at least what you showed in your benchmark: RSS
>> size for no option, -cd31 and -Ed31. Is there another to collect?
>
> Would you collect RSS also for --split ?
> While I expect the RSS will be increase based on number of child processes,
> I want to see measured data.
>
Yes, I'll collect the RSS information. But due to COW, child processes
share the parent process's memory until they modify it, so
simply looking at the RSS value is not enough. We need to measure the size of
the shared part, e.g. by looking at /proc/<PID>/smaps.
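One possible way to separate the shared part from the private part is to sum the per-mapping fields of /proc/<PID>/smaps. The following is a minimal sketch under assumptions, not an agreed measurement method: struct smaps_totals and sum_smaps() are hypothetical names, and the sketch relies on the Rss, Shared_Clean, Shared_Dirty, Private_Clean and Private_Dirty fields that smaps reports in kB.

```c
#include <stdio.h>
#include <string.h>

/* Totals over all mappings of one process, in kB (hypothetical type). */
struct smaps_totals {
	unsigned long rss_kb;
	unsigned long shared_kb;	/* Shared_Clean + Shared_Dirty */
	unsigned long private_kb;	/* Private_Clean + Private_Dirty */
};

/* Sum the relevant fields from an already opened smaps stream. */
static void sum_smaps(FILE *fp, struct smaps_totals *t)
{
	char line[256];
	unsigned long kb;

	memset(t, 0, sizeof(*t));
	while (fgets(line, sizeof(line), fp)) {
		if (sscanf(line, "Rss: %lu kB", &kb) == 1)
			t->rss_kb += kb;
		else if (sscanf(line, "Shared_Clean: %lu kB", &kb) == 1 ||
			 sscanf(line, "Shared_Dirty: %lu kB", &kb) == 1)
			t->shared_kb += kb;
		else if (sscanf(line, "Private_Clean: %lu kB", &kb) == 1 ||
			 sscanf(line, "Private_Dirty: %lu kB", &kb) == 1)
			t->private_kb += kb;
	}
}
```

A caller would open /proc/<PID>/smaps with fopen() for each makedumpfile child while it is running and pass the stream to sum_smaps(). The private total then approximates the memory that each child adds on top of what it shares with the parent.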
Also, which version of makedumpfile should I use for this benchmark? If you
have additional changes locally, it might be better to use that version.
Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-07-13 5:18 ` HATAYAMA Daisuke
@ 2012-07-13 8:10 ` Atsushi Kumagai
2012-07-18 0:57 ` HATAYAMA Daisuke
0 siblings, 1 reply; 34+ messages in thread
From: Atsushi Kumagai @ 2012-07-13 8:10 UTC (permalink / raw)
To: d.hatayama; +Cc: kexec, vgoyal
Hello HATAYAMA-san,
On Fri, 13 Jul 2012 14:18:01 +0900 (JST)
HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
> Date: Fri, 13 Jul 2012 09:36:11 +0900
>
> > Hello HATAYAMA-san,
> >
> > On Wed, 11 Jul 2012 14:23:59 +0900 (JST)
> > HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> >
> > [...]
> >> >>> Hi Atushi san,
> >> >>>
> >> >>> Good to see this work making progress. I have few queries.
> >> >>>
> >> >>> - Do you have some numbers for bigger machines like 1TB or higher memory.
> >> >>> I am curious to know how bad is the time penalty.
> >> >>
> >> >> I'm afraid that I don't have such a large machine, so I need someone who can
> >> >> measure execution time in large machine.
> >> >>
> >> >
> >> > I can prepare such machine but I cannot say I can use the machine on
> >> > this day precisely for example. But I think it must be until the end
> >> > of this month at most. So, I would like to fix content of benchmark
> >> > first.
> >> >
> >>
> >> Hello Kumagai-san,
> >>
> >> I'm now trying to reserve the machine with big memory, but I'm not
> >> accurately sure when I can use the machine; expecting within this
> >> month? In advance, I want to make sure what I measure in this
> >> benchmark in more detail.
> >
> > OK, I'm glad for your help.
> >
> >> I'm going to collect at least what you showed in your benchmark: RSS
> >> size for no option, -cd31 and -Ed31. Is there another to collect?
> >
> > Would you collect RSS also for --split ?
> > While I expect the RSS will be increase based on number of child processes,
> > I want to see measured data.
> >
>
> Yes, I'll collect RSS information. But due to COW, child processes
> share parent process's memory until they modify their memory, so
> simply looking value of RSS is not enough. We need to measure size of
> shared part e.g. by looking at /proc/<PID>/smaps.
You're right. Would you choose a method that can collect the effective
memory consumption?
> And, what version of makedumpfile should I use for this bench? If you
> have additonal changes locally, it might be better to use the version.
Please use v1.4.4 + the v2 cyclic patch set + the performance improvement patch which
I sent today; that is my latest internal version.
Thanks
Atsushi Kumagai
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-07-13 8:10 ` Atsushi Kumagai
@ 2012-07-18 0:57 ` HATAYAMA Daisuke
0 siblings, 0 replies; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-07-18 0:57 UTC (permalink / raw)
To: kumagai-atsushi; +Cc: kexec, vgoyal
From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Fri, 13 Jul 2012 17:10:39 +0900
> Hello HATAYAMA-san,
>
> On Fri, 13 Jul 2012 14:18:01 +0900 (JST)
> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>
>> From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
>> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
>> Date: Fri, 13 Jul 2012 09:36:11 +0900
>>
>> > Hello HATAYAMA-san,
>> >
>> > On Wed, 11 Jul 2012 14:23:59 +0900 (JST)
>> > HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>> >
>> > [...]
>> >> >>> Hi Atushi san,
>> >> >>>
>> >> >>> Good to see this work making progress. I have few queries.
>> >> >>>
>> >> >>> - Do you have some numbers for bigger machines like 1TB or higher memory.
>> >> >>> I am curious to know how bad is the time penalty.
>> >> >>
>> >> >> I'm afraid that I don't have such a large machine, so I need someone who can
>> >> >> measure execution time in large machine.
>> >> >>
>> >> >
>> >> > I can prepare such machine but I cannot say I can use the machine on
>> >> > this day precisely for example. But I think it must be until the end
>> >> > of this month at most. So, I would like to fix content of benchmark
>> >> > first.
>> >> >
>> >>
>> >> Hello Kumagai-san,
>> >>
>> >> I'm now trying to reserve the machine with big memory, but I'm not
>> >> accurately sure when I can use the machine; expecting within this
>> >> month? In advance, I want to make sure what I measure in this
>> >> benchmark in more detail.
>> >
>> > OK, I'm glad for your help.
>> >
>> >> I'm going to collect at least what you showed in your benchmark: RSS
>> >> size for no option, -cd31 and -Ed31. Is there another to collect?
>> >
>> > Would you collect RSS also for --split ?
>> > While I expect the RSS will be increase based on number of child processes,
>> > I want to see measured data.
>> >
>>
>> Yes, I'll collect RSS information. But due to COW, child processes
>> share parent process's memory until they modify their memory, so
>> simply looking value of RSS is not enough. We need to measure size of
>> shared part e.g. by looking at /proc/<PID>/smaps.
>
> You're right. Would you choose the way which can correct effective
> memory consumption ?
>
OK. PSS in smaps seems appropriate here. /proc/<PID>/pagemap would also
allow us to evaluate the exact RSS, but it's very slow. Although it does
not measure RSS, valgrind seems useful for evaluating heap memory usage.
I'll consider how to evaluate this.
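To illustrate the PSS idea, here is a minimal sketch (not part of
makedumpfile) that sums the Rss and Pss fields of a smaps dump. The helper
name sum_smaps_field and the two-mapping sample are made up for
illustration; the field layout is assumed to follow the usual
/proc/<PID>/smaps format.

```python
import re


def sum_smaps_field(smaps_text, field):
    """Return the total of one per-mapping field (in kB) found in smaps text."""
    # Anchoring at line start keeps "Pss:" from matching "SwapPss:".
    pattern = re.compile(
        r"^%s:[ \t]+(\d+)[ \t]+kB[ \t]*$" % re.escape(field), re.MULTILINE
    )
    return sum(int(kb) for kb in pattern.findall(smaps_text))


# Hypothetical two-mapping excerpt: the first mapping is shared with one
# other process (Pss is half of Rss), the second one is private.
SAMPLE_SMAPS = """\
00400000-0040a000 r-xp 00000000 08:01 1234 /usr/bin/example
Size:                 40 kB
Rss:                  40 kB
Pss:                  20 kB
00c00000-00c02000 rw-p 00000000 00:00 0 [heap]
Size:                  8 kB
Rss:                   8 kB
Pss:                   8 kB
"""

if __name__ == "__main__":
    print("Rss total [kB]:", sum_smaps_field(SAMPLE_SMAPS, "Rss"))
    print("Pss total [kB]:", sum_smaps_field(SAMPLE_SMAPS, "Pss"))
```

For a live process the same helper could be fed the contents of
/proc/<PID>/smaps. Because PSS divides each shared page among the processes
mapping it, adding up the Pss totals of the parent and the --split children
should count every shared page only once, which plain RSS does not.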
>> And, what version of makedumpfile should I use for this bench? If you
>> have additional changes locally, it might be better to use that version.
>
> Please use v1.4.4 + v2 patch of cyclic + performance improvement patch which
> I sent today, it's the latest version internally.
>
I'll use this version.
Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
` (10 preceding siblings ...)
2012-07-02 12:39 ` [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Vivek Goyal
@ 2012-08-06 20:47 ` Vivek Goyal
2012-08-07 7:31 ` HATAYAMA Daisuke
2012-08-08 5:14 ` Atsushi Kumagai
11 siblings, 2 replies; 34+ messages in thread
From: Vivek Goyal @ 2012-08-06 20:47 UTC (permalink / raw)
To: Atsushi Kumagai; +Cc: kexec
On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
> Hello,
>
> I improved prototype of cyclic processing as version 2.
> If there is no objection to basic idea, I want to consider the things
> related to performance as next step. (Concretely, buffer size and the patch set
> HATAYAMA-san sent a short time ago.)
Hi Atsushi San,
Just checking on the state of these patches. Are they ready
to be included in makedumpfile?
I would love to see a new makedumpfile where memory usage does not grow
with the physical memory present in the system (assuming the computing
overhead of cycles is bearable).
Thanks
Vivek
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-06 20:47 ` Vivek Goyal
@ 2012-08-07 7:31 ` HATAYAMA Daisuke
2012-08-10 8:39 ` HATAYAMA Daisuke
2012-08-08 5:14 ` Atsushi Kumagai
1 sibling, 1 reply; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-08-07 7:31 UTC (permalink / raw)
To: vgoyal; +Cc: kexec, kumagai-atsushi
From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Mon, 6 Aug 2012 16:47:31 -0400
> On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
>> Hello,
>>
>> I improved prototype of cyclic processing as version 2.
>> If there is no objection to basic idea, I want to consider the things
>> related to performance as next step. (Concretely, buffer size and the patch set
>> HATAYAMA-san sent a short time ago.)
>
> Hi Atsushi San,
>
> Just checking that what's the state of these patches now. Are they ready
> to be included in makedumpfile?
>
> I would love to see new makedumpfile where memory usage does not grow
> by physical memory present in the system. (Assuming computing overhead
> of cycles is bearable).
>
> Thanks
> Vivek
>
Hello Vivek,
I'm benchmarking cyclic processing on our machine right now. Please wait
a little while.
Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-06 20:47 ` Vivek Goyal
2012-08-07 7:31 ` HATAYAMA Daisuke
@ 2012-08-08 5:14 ` Atsushi Kumagai
2012-08-08 13:25 ` Vivek Goyal
1 sibling, 1 reply; 34+ messages in thread
From: Atsushi Kumagai @ 2012-08-08 5:14 UTC (permalink / raw)
To: vgoyal; +Cc: kexec
Hello Vivek,
On Mon, 6 Aug 2012 16:47:31 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
> > Hello,
> >
> > I improved prototype of cyclic processing as version 2.
> > If there is no objection to basic idea, I want to consider the things
> > related to performance as next step. (Concretely, buffer size and the patch set
> > HATAYAMA-san sent a short time ago.)
>
> Hi Atsushi San,
>
> Just checking that what's the state of these patches now. Are they ready
> to be included in makedumpfile?
Yes, the v2 patches with some fixes are ready to be merged into mainline.
With them, makedumpfile can work in constant memory space.
(To confirm this, we need to see the results of HATAYAMA-san's benchmark.)
> I would love to see new makedumpfile where memory usage does not grow
> by physical memory present in the system. (Assuming computing overhead
> of cycles is bearable).
I planned to release the next version with the new method to exclude free pages
(based on HATAYAMA-san's RFC patches). This method looks up members of each page
instead of free_list; it's only for performance.
http://lists.infradead.org/pipermail/kexec/2012-June/006441.html
But this method needs more time for consideration, review and testing.
If you need the new makedumpfile which can work in constant memory space soon,
shall I release the new version as soon as possible?
(Note that this version would still look up free_list to exclude free pages, the same as v1.4.4.)
Thanks
Atsushi Kumagai
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-08 5:14 ` Atsushi Kumagai
@ 2012-08-08 13:25 ` Vivek Goyal
2012-08-09 6:44 ` Atsushi Kumagai
0 siblings, 1 reply; 34+ messages in thread
From: Vivek Goyal @ 2012-08-08 13:25 UTC (permalink / raw)
To: Atsushi Kumagai; +Cc: kexec
On Wed, Aug 08, 2012 at 02:14:00PM +0900, Atsushi Kumagai wrote:
> Hello Vivek,
>
> On Mon, 6 Aug 2012 16:47:31 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
>
> > On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
> > > Hello,
> > >
> > > I improved prototype of cyclic processing as version 2.
> > > If there is no objection to basic idea, I want to consider the things
> > > related to performance as next step. (Concretely, buffer size and the patch set
> > > HATAYAMA-san sent a short time ago.)
> >
> > Hi Atsushi San,
> >
> > Just checking that what's the state of these patches now. Are they ready
> > to be included in makedumpfile?
>
> Yes, v2 patches with some fixes are ready to be merged into mainline.
> At this time, makedumpfile can work in constant memory space.
> (To make sure it's correct, we need to see the result of HATAYAMA-san's benchmark.)
>
> > I would love to see new makedumpfile where memory usage does not grow
> > by physical memory present in the system. (Assuming computing overhead
> > of cycles is bearable).
>
> I planned to release the next version with the new method to exclude free pages
> (based on HATAYAMA-san's RFC patches). This method looks up members of each page
> instead of free_list, it's only for performance.
>
> http://lists.infradead.org/pipermail/kexec/2012-June/006441.html
>
> But, this method needs more time for consideration, review and test.
>
>
> If you need the new makedumpfile which can work in constant memory space soon,
> shall I release the new version as soon as possible ?
> (But, this version still looks up free_list to exclude free pages same as v1.4.4.)
>
I think once Hatayama's testing is done, it is a good idea to merge the cyclic
patches and make a release. Then, soon after review and testing, merge
Hatayama's other patches that walk through the mem_map array. My understanding
is that walking through the mem_map array will save us CPU cycles in fixed
memory usage mode.
Thanks
Vivek
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-08 13:25 ` Vivek Goyal
@ 2012-08-09 6:44 ` Atsushi Kumagai
0 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-08-09 6:44 UTC (permalink / raw)
To: vgoyal; +Cc: kexec
Hello Vivek,
On Wed, 8 Aug 2012 09:25:10 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Aug 08, 2012 at 02:14:00PM +0900, Atsushi Kumagai wrote:
> > Hello Vivek,
> >
> > On Mon, 6 Aug 2012 16:47:31 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > > On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
> > > > Hello,
> > > >
> > > > I improved prototype of cyclic processing as version 2.
> > > > If there is no objection to basic idea, I want to consider the things
> > > > related to performance as next step. (Concretely, buffer size and the patch set
> > > > HATAYAMA-san sent a short time ago.)
> > >
> > > Hi Atsushi San,
> > >
> > > Just checking that what's the state of these patches now. Are they ready
> > > to be included in makedumpfile?
> >
> > Yes, v2 patches with some fixes are ready to be merged into mainline.
> > At this time, makedumpfile can work in constant memory space.
> > (To make sure it's correct, we need to see the result of HATAYAMA-san's benchmark.)
> >
> > > I would love to see new makedumpfile where memory usage does not grow
> > > by physical memory present in the system. (Assuming computing overhead
> > > of cycles is bearable).
> >
> > I planned to release the next version with the new method to exclude free pages
> > (based on HATAYAMA-san's RFC patches). This method looks up members of each page
> > instead of free_list, it's only for performance.
> >
> > http://lists.infradead.org/pipermail/kexec/2012-June/006441.html
> >
> > But, this method needs more time for consideration, review and test.
> >
> >
> > If you need the new makedumpfile which can work in constant memory space soon,
> > shall I release the new version as soon as possible ?
> > (But, this version still looks up free_list to exclude free pages same as v1.4.4.)
> >
>
> I think once Hatayama's testing is done, it is a good idea to merge cyclic
> patches and make a release. And soon after review and testing, merge
> hatayama's other patches of walking through mem_map array. My understanding
> is that walking through mem_map array will save us cpu cycles in fixed
> memory usage mode.
OK, I'll start preparing the release of the next version with the cyclic patches.
Your understanding of walking through the mem_map array is correct; it's expected
to reduce wasted processing in fixed memory usage mode.
http://lists.infradead.org/pipermail/kexec/2012-July/006543.html
Thanks
Atsushi Kumagai
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-07 7:31 ` HATAYAMA Daisuke
@ 2012-08-10 8:39 ` HATAYAMA Daisuke
2012-08-10 14:36 ` Vivek Goyal
0 siblings, 1 reply; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-08-10 8:39 UTC (permalink / raw)
To: vgoyal, kumagai-atsushi; +Cc: kexec
From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Tue, 7 Aug 2012 16:31:20 +0900
> From: Vivek Goyal <vgoyal@redhat.com>
> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
> Date: Mon, 6 Aug 2012 16:47:31 -0400
>
>> On Fri, Jun 29, 2012 at 11:13:20AM +0900, Atsushi Kumagai wrote:
>>> Hello,
>>>
>>> I improved prototype of cyclic processing as version 2.
>>> If there is no objection to basic idea, I want to consider the things
>>> related to performance as next step. (Concretely, buffer size and the patch set
>>> HATAYAMA-san sent a short time ago.)
>>
>> Hi Atsushi San,
>>
>> Just checking that what's the state of these patches now. Are they ready
>> to be included in makedumpfile?
>>
>> I would love to see new makedumpfile where memory usage does not grow
>> by physical memory present in the system. (Assuming computing overhead
>> of cycles is bearable).
>>
>> Thanks
>> Vivek
>>
>
> Hello Vivek,
>
> I'm just now benchmarking cycle processing on our machine. Please wait
> for a while.
>
> Thanks.
> HATAYAMA, Daisuke
>
I finished benchmarking the filtering time; the results are below.
Unfortunately, I failed to collect the memory consumption figures due to a
mistake on my part. If they are necessary, I'll try to collect them again,
but we have a 9-day vacation starting tomorrow, so I'll do that after the
vacation.
The machine spec I used is as follows:
Memory: 2TB
CPU: Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz
(8 sockets, 10 cores, 2 threads)
In the first step, I chose a buffer size of 10KB, and it took about 3h 45m
57s. So next I changed the buffer size to 512KB and measured up to
8MB.
The result is as follows:
| buffer size | time |
|-------------+------------|
| 8 MB | 48.32 sec |
| 4 MB | 55.76 sec |
| 2 MB | 69.91 sec |
| 1 MB | 98.25 sec |
| 512 KB | 154.42 sec |
BTW, the existing free_list logic took about 48 sec per cycle for the same
vmcore, as shown below.
STEP [Excluding free pages ] : 49.846321 seconds
STEP [Excluding unnecessary pages] : 6.339228 seconds
STEP [Excluding free pages ] : 48.595884 seconds
STEP [Excluding unnecessary pages] : 6.530479 seconds
STEP [Excluding free pages ] : 48.598879 seconds
STEP [Excluding unnecessary pages] : 6.527133 seconds
STEP [Excluding free pages ] : 48.602401 seconds
STEP [Excluding unnecessary pages] : 6.502681 seconds
STEP [Excluding free pages ] : 48.602010 seconds
STEP [Excluding unnecessary pages] : 6.469853 seconds
STEP [Excluding free pages ] : 48.601637 seconds
STEP [Excluding unnecessary pages] : 6.431381 seconds
STEP [Excluding free pages ] : 48.601195 seconds
STEP [Excluding unnecessary pages] : 6.416676 seconds
STEP [Excluding free pages ] : 48.602221 seconds
STEP [Excluding unnecessary pages] : 6.387611 seconds
STEP [Excluding free pages ] : 48.589972 seconds
STEP [Excluding unnecessary pages] : 0.816955 seconds
Original pages : 0x0000000040049690
Excluded pages : 0x000000001f3c1564
Pages filled with zero : 0x0000000000000000
Cache pages : 0x000000000000467d
Cache pages + private : 0x000000000000103c
User process data pages : 0x00000000000015d6
Free pages : 0x000000001f3ba8d5
Remaining pages : 0x0000000020c8812c
(The number of pages is reduced to 51%.)
Memory Hole : 0xffffffffe0036970
--------------------------------------------------
Total pages : 0x0000000020080000
There are other log files. I can directly email them if necessary; the
reason why I didn't attach the log files in this mail is that sending
the mail with attachment to this ML requires authentication and it
would take some time.
Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-10 8:39 ` HATAYAMA Daisuke
@ 2012-08-10 14:36 ` Vivek Goyal
2012-08-14 11:55 ` HATAYAMA Daisuke
0 siblings, 1 reply; 34+ messages in thread
From: Vivek Goyal @ 2012-08-10 14:36 UTC (permalink / raw)
To: HATAYAMA Daisuke; +Cc: kexec, kumagai-atsushi
On Fri, Aug 10, 2012 at 05:39:38PM +0900, HATAYAMA Daisuke wrote:
[..]
>
> I finished benchmarking filtering time and demonstrate the result.
> But I failed to collect amount of memory consumption by my mistake. If
> they are necessary, I'll again try to collect them. But we have 9 days
> vacation starting tomorrow, so I'll do that after the vacation.
>
Thanks a lot for doing this benchmarking.
> The machine spec I used is as follows:
>
> Memory: 2TB
> CPU: Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz
> (8 sockets, 10 cores, 2 threads)
>
> In the first step, I chose buffer size 10KB and it took about 3h 45m
> 57s. So, next I changed the buffer size to 512KB and measured up to
> 8MB.
What is this buffer size? Is the user supposed to specify it? Is it some fixed
size buffer which makedumpfile can use to read in memory, so that once we cross
the buffer size we need to let some data go from the buffer?
>
> The result is as follows:
>
> | buffer size | time |
> |-------------+------------|
> | 8 MB | 48.32 sec |
> | 4 MB | 55.76 sec |
> | 2 MB | 69.91 sec |
> | 1 MB | 98.25 sec |
> | 512 KB | 154.42 sec |
So, on a 2TB system, with an 8MB buffer, we could filter and save the vmcore in
around 48 seconds? Or is it just the filtering time?
48 seconds for a 2TB system sounds pretty decent to me.
Are these results with the existing free_list implementation or with your
patches for walking through the mem_map array?
>
> BTW, the existing free_list logic took about 48 sec for the same
> vmcore as below.
I guess the above results were with your patches for walking the mem_map array.
>
> STEP [Excluding free pages ] : 49.846321 seconds
> STEP [Excluding unnecessary pages] : 6.339228 seconds
> STEP [Excluding free pages ] : 48.595884 seconds
> STEP [Excluding unnecessary pages] : 6.530479 seconds
> STEP [Excluding free pages ] : 48.598879 seconds
> STEP [Excluding unnecessary pages] : 6.527133 seconds
> STEP [Excluding free pages ] : 48.602401 seconds
> STEP [Excluding unnecessary pages] : 6.502681 seconds
> STEP [Excluding free pages ] : 48.602010 seconds
> STEP [Excluding unnecessary pages] : 6.469853 seconds
> STEP [Excluding free pages ] : 48.601637 seconds
> STEP [Excluding unnecessary pages] : 6.431381 seconds
> STEP [Excluding free pages ] : 48.601195 seconds
> STEP [Excluding unnecessary pages] : 6.416676 seconds
> STEP [Excluding free pages ] : 48.602221 seconds
> STEP [Excluding unnecessary pages] : 6.387611 seconds
> STEP [Excluding free pages ] : 48.589972 seconds
> STEP [Excluding unnecessary pages] : 0.816955 seconds
So what does the above represent? Is each step taking 48 seconds, or is the total
time taken to filter the vmcore 48 seconds? What's the buffer size used
here?
Does that mean that the filtering time for the mem_map array approach and the
free_list approach is the same?
Thanks
Vivek
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-10 14:36 ` Vivek Goyal
@ 2012-08-14 11:55 ` HATAYAMA Daisuke
2012-08-15 6:27 ` Atsushi Kumagai
0 siblings, 1 reply; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-08-14 11:55 UTC (permalink / raw)
To: vgoyal; +Cc: kexec, kumagai-atsushi
From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Fri, 10 Aug 2012 10:36:55 -0400
> On Fri, Aug 10, 2012 at 05:39:38PM +0900, HATAYAMA Daisuke wrote:
>
> [..]
>>
>> I finished benchmarking filtering time and demonstrate the result.
>> But I failed to collect amount of memory consumption by my mistake. If
>> they are necessary, I'll again try to collect them. But we have 9 days
>> vacation starting tomorrow, so I'll do that after the vacation.
>>
>
> Thanks a lot for doing this benchmarking.
>
>> The machine spec I used is as follows:
>>
>> Memory: 2TB
>> CPU: Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz
>> (8 sockets, 10 cores, 2 threads)
>>
>> In the first step, I chose buffer size 10KB and it took about 3h 45m
>> 57s. So, next I changed the buffer size to 512KB and measured up to
>> 8MB.
>
> What is this buffer size? Is user supposed to specify it? Is it some fixed
> size buffer which makedumpfile can use to read in memory and once we cross
> the buffer size we need to let some data from buffer go?
>
The buffer size is the size of a single bitmap. Two bitmaps of that size
are allocated, so passing 512kB as the buffer size allocates two 512kB
bitmaps, 1MB in total.
The following equation holds:
number_of_cycles == system_memory / ( bit_per_bytes * page_size * the_buf_size )
In this benchmark,
system_memory := 2TB
bit_per_bytes := 8
page_size := 4KB
The buffer size was fixed in Kumagai-san's version, but for this
benchmark I added a --bufsize command-line option for flexibility.
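As a worked example of the equation, the sketch below evaluates it for the
buffer sizes used in the benchmark and relates the cycle counts to the
measured filtering times reported in this thread. It is only an
illustration: binary units (2TB = 2^41 bytes) and rounding up to whole
cycles are assumptions, and the function names are invented here.

```python
KIB = 1 << 10
MIB = 1 << 20
TIB = 1 << 40

BITS_PER_BYTE = 8
PAGE_SIZE = 4 * KIB
SYSTEM_MEMORY = 2 * TIB


def number_of_cycles(buf_size, system_memory=SYSTEM_MEMORY, page_size=PAGE_SIZE):
    """Cycles needed when one bitmap of buf_size bytes holds one bit per page."""
    memory_per_cycle = BITS_PER_BYTE * page_size * buf_size
    # Round up (assumption): a partially filled last bitmap still costs a cycle.
    return -(-system_memory // memory_per_cycle)


# (buffer size in bytes, measured filtering time in seconds) from the benchmark.
MEASURED = [
    (8 * MIB, 48.32),
    (4 * MIB, 55.76),
    (2 * MIB, 69.91),
    (1 * MIB, 98.25),
    (512 * KIB, 154.42),
]


def seconds_per_extra_cycle(rows):
    """Extra time divided by extra cycles between neighbouring table rows."""
    costs = []
    for (big, t_big), (small, t_small) in zip(rows, rows[1:]):
        extra_cycles = number_of_cycles(small) - number_of_cycles(big)
        costs.append((t_small - t_big) / extra_cycles)
    return costs


if __name__ == "__main__":
    for buf_size, seconds in MEASURED:
        print("%5d KiB -> %4d cycles, %7.2f sec"
              % (buf_size // KIB, number_of_cycles(buf_size), seconds))
    print("   10 KiB -> %4d cycles" % number_of_cycles(10 * KIB))
    print(["%.2f" % cost for cost in seconds_per_extra_cycle(MEASURED)])
```

Under these assumptions the five measured sizes correspond to 8, 16, 32, 64
and 128 cycles, and each additional cycle costs a little under one second,
while the 10KB run corresponds to several thousand cycles.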
>>
>> The result is as follows:
>>
>> | buffer size | time |
>> |-------------+------------|
>> | 8 MB | 48.32 sec |
>> | 4 MB | 55.76 sec |
>> | 2 MB | 69.91 sec |
>> | 1 MB | 98.25 sec |
>> | 512 KB | 154.42 sec |
>
> So, on a 2TB system, with 8MB buffer, we could filter and save vmcore in
> around 48 seconds? Or is it just filtering time.
>
Just the filtering time. I first tried writing the memory as well, but it took
a very long time, more than a few hours. Since there are several test cases for
several buffer sizes, I needed to reduce the time required.
> 48 seconds for 2TB system, sounds pretty decent to me.
>
I think so too, because the time for writing data is far larger
than this time.
> Are these results with existing free_list implementation or with your
> patches of walking through mem_map array?
>
>>
>> BTW, the existing free_list logic took about 48 sec for the same
>> vmcore as below.
>
> I guess above results were with your patches of walking mem_map array.
>
Yes, the above results are for the mem_map array case.
>>
>> STEP [Excluding free pages ] : 49.846321 seconds
>> STEP [Excluding unnecessary pages] : 6.339228 seconds
>> STEP [Excluding free pages ] : 48.595884 seconds
>> STEP [Excluding unnecessary pages] : 6.530479 seconds
>> STEP [Excluding free pages ] : 48.598879 seconds
>> STEP [Excluding unnecessary pages] : 6.527133 seconds
>> STEP [Excluding free pages ] : 48.602401 seconds
>> STEP [Excluding unnecessary pages] : 6.502681 seconds
>> STEP [Excluding free pages ] : 48.602010 seconds
>> STEP [Excluding unnecessary pages] : 6.469853 seconds
>> STEP [Excluding free pages ] : 48.601637 seconds
>> STEP [Excluding unnecessary pages] : 6.431381 seconds
>> STEP [Excluding free pages ] : 48.601195 seconds
>> STEP [Excluding unnecessary pages] : 6.416676 seconds
>> STEP [Excluding free pages ] : 48.602221 seconds
>> STEP [Excluding unnecessary pages] : 6.387611 seconds
>> STEP [Excluding free pages ] : 48.589972 seconds
>> STEP [Excluding unnecessary pages] : 0.816955 seconds
>
> So what does above represent. Each step is taking 48 seconds or total
> time taken to filter vmcore is 48 seconds? What's the buffer size used
> here.
>
The free_list logic always filters the whole memory range even if the
range we need to filter is only a certain part of it, so it took about 48
seconds in each cycle.
> Does that mean that filtering time for both mem_map array approach and
> free_list approach are same?
>
No.
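The two approaches differ because the free_list scan is repeated in every
cycle. As a rough illustration (the parsing helper is invented here and is
not part of makedumpfile), summing the STEP timings from the free_list log
quoted above gives the total time spent per step type:

```python
import re

# STEP lines copied from the free_list log quoted above.
FREE_LIST_LOG = """\
STEP [Excluding free pages ] : 49.846321 seconds
STEP [Excluding unnecessary pages] : 6.339228 seconds
STEP [Excluding free pages ] : 48.595884 seconds
STEP [Excluding unnecessary pages] : 6.530479 seconds
STEP [Excluding free pages ] : 48.598879 seconds
STEP [Excluding unnecessary pages] : 6.527133 seconds
STEP [Excluding free pages ] : 48.602401 seconds
STEP [Excluding unnecessary pages] : 6.502681 seconds
STEP [Excluding free pages ] : 48.602010 seconds
STEP [Excluding unnecessary pages] : 6.469853 seconds
STEP [Excluding free pages ] : 48.601637 seconds
STEP [Excluding unnecessary pages] : 6.431381 seconds
STEP [Excluding free pages ] : 48.601195 seconds
STEP [Excluding unnecessary pages] : 6.416676 seconds
STEP [Excluding free pages ] : 48.602221 seconds
STEP [Excluding unnecessary pages] : 6.387611 seconds
STEP [Excluding free pages ] : 48.589972 seconds
STEP [Excluding unnecessary pages] : 0.816955 seconds
"""

STEP_RE = re.compile(r"^STEP \[(.+?)\s*\] : ([0-9.]+) seconds$", re.MULTILINE)


def step_totals(log_text):
    """Return {step name: (number of occurrences, total seconds)}."""
    totals = {}
    for name, seconds in STEP_RE.findall(log_text):
        count, total = totals.get(name, (0, 0.0))
        totals[name] = (count + 1, total + float(seconds))
    return totals


if __name__ == "__main__":
    for name, (count, total) in step_totals(FREE_LIST_LOG).items():
        print("%-28s %d cycles, %.2f sec in total" % (name, count, total))
```

The log contains nine cycles at roughly 48.6 sec each, so the free_list run
spends about 439 sec in total on excluding free pages, compared with 48.32
sec of filtering for the mem_map run with an 8MB buffer.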
Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-14 11:55 ` HATAYAMA Daisuke
@ 2012-08-15 6:27 ` Atsushi Kumagai
2012-08-15 13:31 ` Vivek Goyal
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-08-15 6:27 UTC (permalink / raw)
To: d.hatayama; +Cc: kexec, vgoyal
Hello HATAYAMA-san,
On Tue, 14 Aug 2012 20:55:32 +0900 (JST)
HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
> Date: Fri, 10 Aug 2012 10:36:55 -0400
>
> > On Fri, Aug 10, 2012 at 05:39:38PM +0900, HATAYAMA Daisuke wrote:
> >
> > [..]
> >>
> >> I finished benchmarking filtering time and demonstrate the result.
> >> But I failed to collect amount of memory consumption by my mistake. If
> >> they are necessary, I'll again try to collect them. But we have 9 days
> >> vacation starting tomorrow, so I'll do that after the vacation.
> >>
Thank you for your help.
Could you go on to measure the amount of memory consumption?
> >
> > Thanks a lot for doing this benchmarking.
> >
> >> The machine spec I used is as follows:
> >>
> >> Memory: 2TB
> >> CPU: Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz
> >> (8 sockets, 10 cores, 2 threads)
> >>
> >> In the first step, I chose a buffer size of 10KB and it took about 3h 45m
> >> 57s. So, next I changed the buffer size to 512KB and measured up to
> >> 8MB.
> >
> > What is this buffer size? Is user supposed to specify it? Is it some fixed
> > size buffer which makedumpfile can use to read in memory and once we cross
> > the buffer size we need to let some data from buffer go?
> >
>
> The buffer size is just the size for a single bitmap. A single bitmap
> has the given buffer size length. If passing 512kB as buffer size, two
> 512kB bitmaps, so 1MB in total, are allocated.
>
> The next equation holds:
>
> number_of_cycles == system_memory / ( bit_per_bytes * page_size * the_buf_size )
>
> On this benchmarking,
>
> system_memory := 2TB
> bit_per_bytes := 8
> page_size := 4KB
>
> The buffer size was fixed in Kumagai-san's version, but in this
> benchmarking I added --bufsize command-line option for flexibility.
I'll add the --cyclic-buffer option to specify the buffer size in the release version.
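As a rough check of the equation quoted above, here is a minimal sketch
that computes the number of cycles for the buffer sizes used in this
benchmark. The function name and the round-up behaviour are assumptions
made for illustration, not makedumpfile code; it assumes 4KB pages and
one bitmap bit per page.

```python
# Hypothetical helper, not part of makedumpfile. It evaluates
#   number_of_cycles == system_memory / (bit_per_bytes * page_size * the_buf_size)
# and rounds up so that a partial final cycle is still counted (an assumption).

BITS_PER_BYTE = 8
KB = 1024
MB = 1024 * KB
TB = 1024 * 1024 * MB


def number_of_cycles(system_memory, buf_size, page_size=4 * KB):
    """Return how many cycles are needed to cover system_memory bytes."""
    # One bitmap byte describes 8 pages, so one cycle covers this many bytes.
    bytes_per_cycle = BITS_PER_BYTE * page_size * buf_size
    return -(-system_memory // bytes_per_cycle)  # ceiling division


if __name__ == "__main__":
    for buf in (512 * KB, 1 * MB, 2 * MB, 4 * MB, 8 * MB):
        cycles = number_of_cycles(2 * TB, buf)
        print(f"{buf // KB:5d} KB buffer -> {cycles} cycles")
```

Under these assumptions every halving of the buffer doubles the number
of cycles. The real count presumably depends on the highest page frame
number rather than on the installed memory size, so it may be somewhat
higher than this estimate.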
> >>
> >> The result is as follows:
> >>
> >> | buffer size | time |
> >> |-------------+------------|
> >> | 8 MB | 48.32 sec |
> >> | 4 MB | 55.76 sec |
> >> | 2 MB | 69.91 sec |
> >> | 1 MB | 98.25 sec |
> >> | 512 KB | 154.42 sec |
> >
> > So, on a 2TB system, with 8MB buffer, we could filter and save vmcore in
> > around 48 seconds? Or is it just filtering time.
> >
>
> Just filtering time. I first tried writing the memory out as well, but
> it took a very long time, over a few hours. Since there are several test
> cases for several buffer sizes, I needed to reduce the time required.
>
> > 48 seconds for 2TB system, sounds pretty decent to me.
> >
>
> I think so too, because the time for writing data is by far larger
> than this time.
>
> > Are these results with existing free_list implementation or with your
> > patches of walking through mem_map array?
> >
> >>
> >> BTW, the existing free_list logic took about 48 sec for the same
> >> vmcore as below.
> >
> > I guess above results were with your patches of walking mem_map array.
> >
>
> Yes, the above results are mem_map array case.
>
> >>
> >> STEP [Excluding free pages ] : 49.846321 seconds
> >> STEP [Excluding unnecessary pages] : 6.339228 seconds
> >> STEP [Excluding free pages ] : 48.595884 seconds
> >> STEP [Excluding unnecessary pages] : 6.530479 seconds
> >> STEP [Excluding free pages ] : 48.598879 seconds
> >> STEP [Excluding unnecessary pages] : 6.527133 seconds
> >> STEP [Excluding free pages ] : 48.602401 seconds
> >> STEP [Excluding unnecessary pages] : 6.502681 seconds
> >> STEP [Excluding free pages ] : 48.602010 seconds
> >> STEP [Excluding unnecessary pages] : 6.469853 seconds
> >> STEP [Excluding free pages ] : 48.601637 seconds
> >> STEP [Excluding unnecessary pages] : 6.431381 seconds
> >> STEP [Excluding free pages ] : 48.601195 seconds
> >> STEP [Excluding unnecessary pages] : 6.416676 seconds
> >> STEP [Excluding free pages ] : 48.602221 seconds
> >> STEP [Excluding unnecessary pages] : 6.387611 seconds
> >> STEP [Excluding free pages ] : 48.589972 seconds
> >> STEP [Excluding unnecessary pages] : 0.816955 seconds
> >
> > So what does above represent. Each step is taking 48 seconds or total
> > time taken to filter vmcore is 48 seconds? What's the buffer size used
> > here.
> >
>
> The free_list logic always filters the whole memory range, even if the
> range we need to filter is only a certain part of it, so it took about
> 48 seconds in each cycle.
It seems that the mem_map array logic is especially effective on large machines.
I'll review your mem_map array patchset after the next version is released.
Thanks
Atsushi Kumagai
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-15 6:27 ` Atsushi Kumagai
@ 2012-08-15 13:31 ` Vivek Goyal
2012-08-20 0:12 ` HATAYAMA Daisuke
2012-08-29 2:50 ` HATAYAMA Daisuke
2 siblings, 0 replies; 34+ messages in thread
From: Vivek Goyal @ 2012-08-15 13:31 UTC (permalink / raw)
To: Atsushi Kumagai; +Cc: d.hatayama, kexec
On Wed, Aug 15, 2012 at 03:27:10PM +0900, Atsushi Kumagai wrote:
[..]
> > >>
> > >> STEP [Excluding free pages ] : 49.846321 seconds
> > >> STEP [Excluding unnecessary pages] : 6.339228 seconds
> > >> STEP [Excluding free pages ] : 48.595884 seconds
> > >> STEP [Excluding unnecessary pages] : 6.530479 seconds
> > >> STEP [Excluding free pages ] : 48.598879 seconds
> > >> STEP [Excluding unnecessary pages] : 6.527133 seconds
> > >> STEP [Excluding free pages ] : 48.602401 seconds
> > >> STEP [Excluding unnecessary pages] : 6.502681 seconds
> > >> STEP [Excluding free pages ] : 48.602010 seconds
> > >> STEP [Excluding unnecessary pages] : 6.469853 seconds
> > >> STEP [Excluding free pages ] : 48.601637 seconds
> > >> STEP [Excluding unnecessary pages] : 6.431381 seconds
> > >> STEP [Excluding free pages ] : 48.601195 seconds
> > >> STEP [Excluding unnecessary pages] : 6.416676 seconds
> > >> STEP [Excluding free pages ] : 48.602221 seconds
> > >> STEP [Excluding unnecessary pages] : 6.387611 seconds
> > >> STEP [Excluding free pages ] : 48.589972 seconds
> > >> STEP [Excluding unnecessary pages] : 0.816955 seconds
> > >
> > > So what does above represent. Each step is taking 48 seconds or total
> > > time taken to filter vmcore is 48 seconds? What's the buffer size used
> > > here.
> > >
> >
> > The free_list logic always filters the whole memory range, even if the
> > range we need to filter is only a certain part of it, so it took about
> > 48 seconds in each cycle.
>
> It seems that the mem_map array logic is especially effective on large machines.
> I'll review your mem_map array patchset after the next version is released.
Yes, please review and merge the mem_map array changes as well. This seems
to result in significant savings. According to the numbers above, it looks
like it cuts filtering time by a factor of about 9.
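That factor can be checked against the numbers quoted in this thread.
The sketch below sums the nine "Excluding free pages" timings from the
free_list log and compares the total with the 48.32 seconds reported for
the mem_map array approach with an 8MB buffer. It assumes the two runs
are directly comparable (the buffer size of the free_list run was not
stated), so the ratio is only indicative.

```python
# Timings copied from the "STEP [Excluding free pages ]" lines quoted above.
free_list_steps = [
    49.846321, 48.595884, 48.598879, 48.602401, 48.602010,
    48.601637, 48.601195, 48.602221, 48.589972,
]

# Filtering time reported for the mem_map array approach with an 8MB buffer.
mem_map_total = 48.32

# free_list re-scans the whole range in every cycle, so its cost adds up.
free_list_total = sum(free_list_steps)
ratio = free_list_total / mem_map_total

if __name__ == "__main__":
    cycles = len(free_list_steps)
    print(f"free_list total: {free_list_total:.2f} s over {cycles} cycles")
    print(f"mem_map total:   {mem_map_total:.2f} s")
    print(f"ratio:           {ratio:.1f}x")
```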
Thanks
Vivek
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-15 6:27 ` Atsushi Kumagai
2012-08-15 13:31 ` Vivek Goyal
@ 2012-08-20 0:12 ` HATAYAMA Daisuke
2012-08-29 2:50 ` HATAYAMA Daisuke
2 siblings, 0 replies; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-08-20 0:12 UTC (permalink / raw)
To: kumagai-atsushi; +Cc: kexec, vgoyal
From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Wed, 15 Aug 2012 15:27:10 +0900
> Hello HATAYAMA-san,
>
> On Tue, 14 Aug 2012 20:55:32 +0900 (JST)
> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>
>> From: Vivek Goyal <vgoyal@redhat.com>
>> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
>> Date: Fri, 10 Aug 2012 10:36:55 -0400
>>
>> > On Fri, Aug 10, 2012 at 05:39:38PM +0900, HATAYAMA Daisuke wrote:
>> >
>> > [..]
>> >>
>> >> I finished benchmarking filtering time and demonstrate the result.
>> >> But I failed to collect amount of memory consumption by my mistake. If
>> >> they are necessary, I'll again try to collect them. But we have 9 days
>> >> vacation starting tomorrow, so I'll do that after the vacation.
>> >>
>
> Thank you for your help.
> Could you continue to measure amount of memory consumption ?
>
OK, Kumagai-san, please wait for a while.
Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-15 6:27 ` Atsushi Kumagai
2012-08-15 13:31 ` Vivek Goyal
2012-08-20 0:12 ` HATAYAMA Daisuke
@ 2012-08-29 2:50 ` HATAYAMA Daisuke
2012-08-29 12:35 ` Vivek Goyal
2 siblings, 1 reply; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-08-29 2:50 UTC (permalink / raw)
To: kumagai-atsushi; +Cc: kexec, vgoyal
From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Wed, 15 Aug 2012 15:27:10 +0900
> Hello HATAYAMA-san,
>
> On Tue, 14 Aug 2012 20:55:32 +0900 (JST)
> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>
>> From: Vivek Goyal <vgoyal@redhat.com>
>> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
>> Date: Fri, 10 Aug 2012 10:36:55 -0400
>>
>> > On Fri, Aug 10, 2012 at 05:39:38PM +0900, HATAYAMA Daisuke wrote:
>> >
>> > [..]
>> >>
>> >> I finished benchmarking filtering time and demonstrate the result.
>> >> But I failed to collect amount of memory consumption by my mistake. If
>> >> they are necessary, I'll again try to collect them. But we have 9 days
>> >> vacation starting tomorrow, so I'll do that after the vacation.
>> >>
>
> Thank you for your help.
> Could you continue to measure amount of memory consumption ?
>
Hello Kumagai-san, here is the benchmark result for the amount of
actual memory consumption when there are multiple child processes
using --split option.
I measured the data in the following way:
* For the buffer size, 1MB, 2MB, 4MB, 8MB and 16MB were chosen.
* For the number of split dumpfiles, 2, 4, 8 and 16 were chosen.
* I collected RSS, SHA(RED) and PRI(VATE) memory for each process
  from /proc/self/smaps; RSS is the value of the field "Rss:", SHA
  is the value of the field "Shared_Clean" plus the value of the
  field "Shared_Dirty", and PRI is the value of the field
  "Private_Clean" plus the value of the field "Private_Dirty".
  Note that RSS is the amount of memory actually mapped to physical
  memory, and RSS = SHA + PRI always holds.
  Note also that the reason for collecting SHA is that
  makedumpfile --split is implemented using fork, and the child
  processes share the parent's memory by COW.
  Note that I show the data of only one child process below, because
  every child's result is exactly the same and there is no room to
  put the data of all children here.
* In the following table, I'm using [BEGIN] => [END] notation, such
as 3.16 => 0.57. This indicates at what point the data is
measured.
For the parent process, the left-hand side, [BEGIN] part, is
before waitpid() in writeout_multiple_dumpfiles(), while the
right-hand side, [END] part, after waitpid().
For child process, the left-hand side, [BEGIN] part, is the
beginning of write_kdump_pages_and_bitmap_cyclic(), while the
right-hand side, [END] part, the end of
write_kdump_pages_and_bitmap_cyclic().
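For reference, the RSS/SHA/PRI bookkeeping described above can be
sketched as follows. This is not the instrumentation actually used in
makedumpfile for this benchmark; the function name is hypothetical and
the sample text contains made-up values in the /proc/<pid>/smaps
layout, used only to exercise the parser.

```python
import re

# Matches smaps lines such as "Rss:                 100 kB".
_FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarize_smaps(text):
    """Sum RSS, SHA and PRI (in kB) over all mappings in smaps-style text."""
    totals = {"RSS": 0, "SHA": 0, "PRI": 0}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if not match:
            continue  # mapping header lines and non-kB fields are skipped
        name, kb = match.group(1), int(match.group(2))
        if name == "Rss":
            totals["RSS"] += kb
        elif name in ("Shared_Clean", "Shared_Dirty"):
            totals["SHA"] += kb
        elif name in ("Private_Clean", "Private_Dirty"):
            totals["PRI"] += kb
    return totals


# Illustrative input only; these numbers are not measurements.
SAMPLE = """\
00400000-0048a000 r-xp 00000000 08:01 123 /usr/bin/example
Rss:                 100 kB
Shared_Clean:         60 kB
Shared_Dirty:          0 kB
Private_Clean:        30 kB
Private_Dirty:        10 kB
7f0000000000-7f0000100000 rw-p 00000000 00:00 0
Rss:                  24 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:        20 kB
"""

if __name__ == "__main__":
    print(summarize_smaps(SAMPLE))
```

On a Linux system the same function can be fed the contents of
/proc/self/smaps to obtain the figures for the running process.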
buffer size 1MB (values in MB, [BEGIN] => [END])

     split  parent          child
------------------------------------------
RSS         3.42 => 3.42    2.98 => 3.27
SHA    2    3.16 => 0.57    2.85 => 0.89
PRI         0.25 => 2.84    0.13 => 2.38
------------------------------------------
RSS         3.42 => 3.42    2.98 => 3.27
SHA    4    3.18 => 0.57    2.85 => 0.89
PRI         0.24 => 2.84    0.13 => 2.38
------------------------------------------
RSS         3.42 => 3.42    2.98 => 3.27
SHA    8    3.18 => 0.57    2.85 => 0.89
PRI         0.23 => 2.84    0.13 => 2.37
------------------------------------------
RSS         3.41 => 3.41    2.98 => 3.26
SHA   16    3.18 => 0.57    2.84 => 0.89
PRI         0.23 => 2.84    0.13 => 2.37

buffer size 2MB (values in MB, [BEGIN] => [END])

     split  parent          child
------------------------------------------
RSS         5.42 => 5.42    4.98 => 5.27
SHA    2    5.17 => 0.57    4.85 => 0.89
PRI         0.25 => 4.84    0.13 => 4.37
------------------------------------------
RSS         5.42 => 5.42    4.98 => 5.27
SHA    4    5.17 => 0.57    4.85 => 0.89
PRI         0.25 => 4.84    0.13 => 4.37
------------------------------------------
RSS         5.41 => 5.41    4.98 => 5.26
SHA    8    5.18 => 0.57    4.84 => 0.89
PRI         0.23 => 4.84    0.13 => 4.37
------------------------------------------
RSS         5.42 => 5.42    4.98 => 5.27
SHA   16    5.19 => 0.57    4.85 => 0.89
PRI         0.23 => 4.84    0.13 => 4.37

buffer size 4MB (values in MB, [BEGIN] => [END])

     split  parent          child
------------------------------------------
RSS         9.42 => 9.42    8.98 => 9.27
SHA    2    9.18 => 0.57    8.85 => 0.89
PRI         0.24 => 8.84    0.13 => 8.38
------------------------------------------
RSS         9.41 => 9.41    8.98 => 9.26
SHA    4    9.18 => 0.57    8.84 => 0.89
PRI         0.24 => 8.84    0.13 => 8.37
------------------------------------------
RSS         9.41 => 9.41    8.98 => 9.26
SHA    8    9.18 => 0.57    8.84 => 0.89
PRI         0.24 => 8.84    0.13 => 8.37
------------------------------------------
RSS         9.42 => 9.42    8.98 => 9.27
SHA   16    9.18 => 0.57    8.85 => 0.89
PRI         0.23 => 8.84    0.13 => 8.38

buffer size 8MB (values in MB, [BEGIN] => [END])

     split  parent            child
---------------------------------------------
RSS         17.41 => 17.41    16.98 => 17.26
SHA    2    17.17 =>  0.57    16.84 =>  0.89
PRI          0.25 => 16.84     0.13 => 16.37
---------------------------------------------
RSS         17.41 => 17.41    16.98 => 17.27
SHA    4    17.18 =>  0.57    16.84 =>  0.89
PRI          0.24 => 16.84     0.13 => 16.38
---------------------------------------------
RSS         17.42 => 17.42    16.98 => 17.27
SHA    8    17.17 =>  0.57    16.85 =>  0.89
PRI          0.25 => 16.84     0.13 => 16.37
---------------------------------------------
RSS         17.42 => 17.42    16.98 => 17.27
SHA   16    17.18 =>  0.57    16.85 =>  0.89
PRI          0.23 => 16.84     0.13 => 16.38

buffer size 16MB (values in MB, [BEGIN] => [END])

     split  parent            child
---------------------------------------------
RSS         33.41 => 33.41    32.98 => 33.27
SHA    2    33.16 =>  0.57    32.84 =>  0.89
PRI          0.25 => 32.84     0.13 => 32.38
---------------------------------------------
RSS         33.42 => 33.42    32.98 => 33.27
SHA    4    33.18 =>  0.57    32.85 =>  0.89
PRI          0.24 => 32.84     0.13 => 32.38
---------------------------------------------
RSS         33.42 => 33.42    32.98 => 33.27
SHA    8    33.18 =>  0.57    32.85 =>  0.89
PRI          0.23 => 32.84     0.13 => 32.37
---------------------------------------------
RSS         33.41 => 33.41    32.98 => 33.26
SHA   16    33.18 =>  0.57    32.84 =>  0.89
PRI          0.24 => 32.84     0.13 => 32.37

Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-29 2:50 ` HATAYAMA Daisuke
@ 2012-08-29 12:35 ` Vivek Goyal
2012-08-30 0:55 ` HATAYAMA Daisuke
0 siblings, 1 reply; 34+ messages in thread
From: Vivek Goyal @ 2012-08-29 12:35 UTC (permalink / raw)
To: HATAYAMA Daisuke; +Cc: kexec, kumagai-atsushi
On Wed, Aug 29, 2012 at 11:50:31AM +0900, HATAYAMA Daisuke wrote:
> From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
> Date: Wed, 15 Aug 2012 15:27:10 +0900
>
> > Hello HATAYAMA-san,
> >
> > On Tue, 14 Aug 2012 20:55:32 +0900 (JST)
> > HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> >
> >> From: Vivek Goyal <vgoyal@redhat.com>
> >> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
> >> Date: Fri, 10 Aug 2012 10:36:55 -0400
> >>
> >> > On Fri, Aug 10, 2012 at 05:39:38PM +0900, HATAYAMA Daisuke wrote:
> >> >
> >> > [..]
> >> >>
> >> >> I finished benchmarking filtering time and demonstrate the result.
> >> >> But I failed to collect amount of memory consumption by my mistake. If
> >> >> they are necessary, I'll again try to collect them. But we have 9 days
> >> >> vacation starting tomorrow, so I'll do that after the vacation.
> >> >>
> >
> > Thank you for your help.
> > Could you continue to measure amount of memory consumption ?
> >
>
> Hello Kumagai-san, here is the benchmark result for the amount of
> actual memory consumption when there are multiple child processes
> using --split option.
>
Hi,
Thanks for the test results. I have a few questions.
- What's the objective of this testing? Are we just trying to figure out
memory footprint of makedumpfile in cyclic mode using struct page
filtering and compare with free list filtering?
- Why are we testing with the --split option? How does that help? In the
kdump kernel we boot only 1 cpu. And if all the dump files are being
saved to the same disk, it might not give much of a performance boost.
Even if it does give a performance boost, this seems to be orthogonal to
the idea of using struct page for filtering. Would single-threaded
dumping not give a good idea of the memory footprint?
- What is the conclusion from the measurements and numbers below?
Thanks
Vivek
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-29 12:35 ` Vivek Goyal
@ 2012-08-30 0:55 ` HATAYAMA Daisuke
2012-08-30 6:29 ` Atsushi Kumagai
0 siblings, 1 reply; 34+ messages in thread
From: HATAYAMA Daisuke @ 2012-08-30 0:55 UTC (permalink / raw)
To: vgoyal; +Cc: kumagai-atsushi, kexec
From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
Date: Wed, 29 Aug 2012 08:35:39 -0400
> On Wed, Aug 29, 2012 at 11:50:31AM +0900, HATAYAMA Daisuke wrote:
>> From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
>> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
>> Date: Wed, 15 Aug 2012 15:27:10 +0900
>>
>> > Hello HATAYAMA-san,
>> >
>> > On Tue, 14 Aug 2012 20:55:32 +0900 (JST)
>> > HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>> >
>> >> From: Vivek Goyal <vgoyal@redhat.com>
>> >> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
>> >> Date: Fri, 10 Aug 2012 10:36:55 -0400
>> >>
>> >> > On Fri, Aug 10, 2012 at 05:39:38PM +0900, HATAYAMA Daisuke wrote:
>> >> >
>> >> > [..]
>> >> >>
>> >> >> I finished benchmarking filtering time and demonstrate the result.
>> >> >> But I failed to collect amount of memory consumption by my mistake. If
>> >> >> they are necessary, I'll again try to collect them. But we have 9 days
>> >> >> vacation starting tomorrow, so I'll do that after the vacation.
>> >> >>
>> >
>> > Thank you for your help.
>> > Could you continue to measure amount of memory consumption ?
>> >
>>
>> Hello Kumagai-san, here is the benchmark result for the amount of
>> actual memory consumption when there are multiple child processes
>> using --split option.
>>
>
> Hi,
>
> Thanks for testing results. I had few questions.
>
> - What's the objective of this testing? Are we just trying to figure out
> memory footprint of makedumpfile in cyclic mode using struct page
> filtering and compare with free list filtering?
>
Basically yes. In addition, I wanted to see the footprint when using
--split, because the makedumpfile process then forks and memory usage
increases.
> - Why are we testing using --split option. How does that help. In kdump
> kernel we boot only 1 cpu. And if all the dump files are being saved
> to same disk, it might not give lot of performance boost.
>
> Even if does give performance boost, this seems to be orthogonal to
> the idea of going using struct page for filtering. Will single thread
> dumping not give a good idea about memory footprint?
>
In this benchmark I was not trying to measure any performance gain, only
the memory footprint, so I used only one disk and one cpu for writing,
without any special configuration.
In cyclic mode, each child process refers to a different part of
physical memory, and the bitmaps each process handles also differ from
each other; in normal mode, all child processes share the same two
bitmaps. This benchmark aims at evaluating the memory footprint of the
former case precisely.
>
> - What's the conclusion of below measurements and numbers.
>
These are the expected results, I think. Please focus on the buffer size
and the value of the child's PRI. Each child process has two bitmaps, and
each bitmap is as long as the buffer size. So, for a 1MB buffer size, PRI
finally amounts to 2.38 MB, of which 2.00 MB is the two bitmaps. The same
holds for the other buffer sizes, and the part other than the bitmaps
is 0.37 or 0.38 MB in every case.
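The decomposition can be reproduced from the tables with a few lines of
arithmetic. The sketch below uses the child's PRI at [END] from the
split=2 rows; the helper name is hypothetical and the rounding to two
decimals simply matches the precision of the tables.

```python
# Child PRI at [END] for the split=2 rows of the tables above, in MB,
# keyed by buffer size in MB.
child_pri_end = {1: 2.38, 2: 4.37, 4: 8.38, 8: 16.37, 16: 32.38}

BITMAPS_PER_PROCESS = 2  # each child holds two bitmaps of buffer-size length


def non_bitmap_part(buf_mb, pri_mb):
    """Private memory left after subtracting the two bitmaps, in MB."""
    return round(pri_mb - BITMAPS_PER_PROCESS * buf_mb, 2)


residuals = {buf: non_bitmap_part(buf, pri) for buf, pri in child_pri_end.items()}

if __name__ == "__main__":
    for buf, rest in residuals.items():
        bitmaps = BITMAPS_PER_PROCESS * buf
        print(f"buffer {buf:2d} MB: bitmaps {bitmaps:2d} MB + other {rest:.2f} MB")
```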
During the measurement I briefly thought that a virtual memory usage
profiler, such as valgrind massif, might be enough, but I now think this
kind of benchmarking is needed, because the problem at hand is the
severely constrained 2nd kernel environment, which requires us to
determine actual memory usage precisely.
Thanks.
HATAYAMA, Daisuke
* Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
2012-08-30 0:55 ` HATAYAMA Daisuke
@ 2012-08-30 6:29 ` Atsushi Kumagai
0 siblings, 0 replies; 34+ messages in thread
From: Atsushi Kumagai @ 2012-08-30 6:29 UTC (permalink / raw)
To: d.hatayama; +Cc: kexec, vgoyal
Hello HATAYAMA-san,
On Thu, 30 Aug 2012 09:55:06 +0900 (JST)
HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
> Date: Wed, 29 Aug 2012 08:35:39 -0400
>
> > On Wed, Aug 29, 2012 at 11:50:31AM +0900, HATAYAMA Daisuke wrote:
> >> From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
> >> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
> >> Date: Wed, 15 Aug 2012 15:27:10 +0900
> >>
> >> > Hello HATAYAMA-san,
> >> >
> >> > On Tue, 14 Aug 2012 20:55:32 +0900 (JST)
> >> > HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> >> >
> >> >> From: Vivek Goyal <vgoyal@redhat.com>
> >> >> Subject: Re: [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption.
> >> >> Date: Fri, 10 Aug 2012 10:36:55 -0400
> >> >>
> >> >> > On Fri, Aug 10, 2012 at 05:39:38PM +0900, HATAYAMA Daisuke wrote:
> >> >> >
> >> >> > [..]
> >> >> >>
> >> >> >> I finished benchmarking filtering time and demonstrate the result.
> >> >> >> But I failed to collect amount of memory consumption by my mistake. If
> >> >> >> they are necessary, I'll again try to collect them. But we have 9 days
> >> >> >> vacation starting tomorrow, so I'll do that after the vacation.
> >> >> >>
> >> >
> >> > Thank you for your help.
> >> > Could you continue to measure amount of memory consumption ?
> >> >
> >>
> >> Hello Kumagai-san, here is the benchmark result for the amount of
> >> actual memory consumption when there are multiple child processes
> >> using --split option.
Thank you for all your help. I was able to confirm that cyclic mode is effective.
> >
> > Hi,
> >
> > Thanks for testing results. I had few questions.
> >
> > - What's the objective of this testing? Are we just trying to figure out
> > memory footprint of makedumpfile in cyclic mode using struct page
> > filtering and compare with free list filtering?
> >
>
> Basically yes, and in addition to that, I wanted to see when using
> --split because then makedumpfile process forks and memory usage
> increases.
>
> > - Why are we testing using --split option. How does that help. In kdump
> > kernel we boot only 1 cpu. And if all the dump files are being saved
> > to same disk, it might not give lot of performance boost.
> >
> > Even if does give performance boost, this seems to be orthogonal to
> > the idea of going using struct page for filtering. Will single thread
> > dumping not give a good idea about memory footprint?
> >
>
> On this benchmark, I didn't aim at seeing performance gain here, only
> seeing memory footprint, so I used only one disk and one cpu to write
> without any special configuration.
>
> In cyclic mode, each child process refers to different part of
> physical memory, and the bitmaps each process handles are also
> different each other; in normal mode, each child process shares the
> unique two bitmaps. This bench aims at evaluating amount of memory
> footprint for the former case precisely.
>
> >
> > - What's the conclusion of below measurements and numbers.
> >
>
> These are the expected results, I think. Please focus on the buffer size
> and the value of the child's PRI. Each child process has two bitmaps, and
> each bitmap is as long as the buffer size. So, for a 1MB buffer size, PRI
> finally amounts to 2.38 MB, of which 2.00 MB is the two bitmaps. The same
> holds for the other buffer sizes, and the part other than the bitmaps
> is 0.37 or 0.38 MB in every case.
Yes, I think so. According to the results, we can estimate the memory
footprint from the buffer size even when using the --split option. This
is what I expected.
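Such an estimate could look like the sketch below. It is only a rough
model under stated assumptions: the constants are read off HATAYAMA-san's
tables (parent RSS is about 2 * buffer + 1.42 MB, child PRI at [END] is
about 2 * buffer + 0.38 MB), the children's remaining shared pages are
assumed to be already counted in the parent's RSS, and the function name
is hypothetical.

```python
BITMAPS_PER_PROCESS = 2
PARENT_FIXED_MB = 1.42  # parent RSS minus its two bitmaps, from the tables
CHILD_FIXED_MB = 0.38   # child PRI at [END] minus its two bitmaps


def estimate_split_footprint_mb(buf_mb, children):
    """Rough physical memory estimate (MB) for one parent plus its children."""
    parent_rss = BITMAPS_PER_PROCESS * buf_mb + PARENT_FIXED_MB
    child_pri = BITMAPS_PER_PROCESS * buf_mb + CHILD_FIXED_MB
    # Each child ends up with private copies of both bitmaps after COW.
    return parent_rss + children * child_pri


if __name__ == "__main__":
    for children in (2, 4, 8, 16):
        total = estimate_split_footprint_mb(1, children)
        print(f"1 MB buffer, {children:2d} children: about {total:.2f} MB")
```

The point of the model is that the footprint grows linearly with both
the buffer size and the number of split dumpfiles, so either knob can be
used to stay within the memory available in the 2nd kernel.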
BTW, I'll post the patchset as v1.5.0-rc soon.
Thanks
Atsushi Kumagai
Thread overview: 34+ messages
2012-06-29 2:13 [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Atsushi Kumagai
2012-06-29 2:16 ` [RFC PATCH v2 1/10] Add flag to enable cyclic processing Atsushi Kumagai
2012-06-29 2:17 ` [RFC PATCH v2 2/10] Prepare partial bitmap for " Atsushi Kumagai
2012-06-29 2:17 ` [RFC PATCH v2 3/10] Change the function related to excluding unnecessary pages Atsushi Kumagai
2012-06-29 2:18 ` [RFC PATCH v2 4/10] Add function to update target region Atsushi Kumagai
2012-06-29 2:19 ` [RFC PATCH v2 5/10] Add function to get num_dumpable for cyclic processing Atsushi Kumagai
2012-06-29 2:21 ` [RFC PATCH v2 6/10] Implement the main routine of cyclic processing for kdump-compressed format Atsushi Kumagai
2012-06-29 2:22 ` [RFC PATCH v2 7/10] Add function to get number of PT_LOAD for cyclic processing Atsushi Kumagai
2012-06-29 2:23 ` [RFC PATCH v2 8/10] Implement the main routine of cyclic processing for ELF format Atsushi Kumagai
2012-06-29 2:24 ` [RFC PATCH v2 9/10] Enabling --split option with cyclic processing Atsushi Kumagai
2012-06-29 2:25 ` [RFC PATCH v2 10/10] Change num_dumped value to global for debug messages Atsushi Kumagai
2012-07-02 12:39 ` [RFC PATCH v2 0/10] makedumpfile: cyclic processing to keep memory consumption Vivek Goyal
2012-07-04 5:54 ` Atsushi Kumagai
2012-07-04 8:52 ` HATAYAMA Daisuke
2012-07-11 5:23 ` HATAYAMA Daisuke
2012-07-13 0:36 ` Atsushi Kumagai
2012-07-13 5:18 ` HATAYAMA Daisuke
2012-07-13 8:10 ` Atsushi Kumagai
2012-07-18 0:57 ` HATAYAMA Daisuke
2012-08-06 20:47 ` Vivek Goyal
2012-08-07 7:31 ` HATAYAMA Daisuke
2012-08-10 8:39 ` HATAYAMA Daisuke
2012-08-10 14:36 ` Vivek Goyal
2012-08-14 11:55 ` HATAYAMA Daisuke
2012-08-15 6:27 ` Atsushi Kumagai
2012-08-15 13:31 ` Vivek Goyal
2012-08-20 0:12 ` HATAYAMA Daisuke
2012-08-29 2:50 ` HATAYAMA Daisuke
2012-08-29 12:35 ` Vivek Goyal
2012-08-30 0:55 ` HATAYAMA Daisuke
2012-08-30 6:29 ` Atsushi Kumagai
2012-08-08 5:14 ` Atsushi Kumagai
2012-08-08 13:25 ` Vivek Goyal
2012-08-09 6:44 ` Atsushi Kumagai