From: "\"Zhou, Wenjian/周文剑\"" <zhouwj-fnst@cn.fujitsu.com>
To: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Cc: "kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [PATCH v2 00/10] makedumpfile: parallel processing
Date: Mon, 6 Jul 2015 21:19:13 +0800 [thread overview]
Message-ID: <559A8051.5090905@cn.fujitsu.com> (raw)
In-Reply-To: <0910DD04CBD6DE4193FCF86B9C00BE9701DCAB4E@BPXM01GP.gisp.nec.co.jp>
[-- Attachment #1: Type: text/plain, Size: 3208 bytes --]
Hello Atsushi Kumagai,
I have tried a lot, and I think the big performance degradation only
occurs in special CPU.
I thought about two reasons, and I need your help to confirm which is
the real one.
The following tests will also be OK by using dumpfile instead of /proc/vmcore
Test 1: distinguish whether it is resulted by multi-threads.
apply patch: test1
command1: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
command2: ./makedumpfile -c /proc/vmcore vmcore --num-threads 8
better to do some test in -l too.
command1: ./makedumpfile -l /proc/vmcore vmcore
command2: ./makedumpfile -l /proc/vmcore vmcore --num-threads 1
command3: ./makedumpfile -l /proc/vmcore vmcore --num-threads 8
Test 2: distinguish whether it is resulted by doing compress in thread
2.1:
apply patch: test2.1
command: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
2.2:
apply patch: test2.2
command: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
Thanks a lot.
BTW, could you show me the cpu name, zlib version and glibc version ?
--
Thanks
Zhou Wenjian
On 06/30/2015 05:06 PM, Atsushi Kumagai wrote:
>> On 06/26/2015 03:49 PM, Atsushi Kumagai wrote:
>>> I attached 5 processors to the VM and I confirmed that all threads
>>> consumed full cpu time by top(1) on the host:
>>>
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>> 17614 qemu 20 0 5792m 4.9g 5652 R 435.1 72.8 29:02.17 qemu-kvm
>>>
>>> So I think the performance must be improved...
>>
>> Since I can't get that result in all machines here, could you test it with the patch:time
>> and show me the output?
>> Using "./makedumpfile -c --num-threads 4 /proc/vmcore dumpfile" is OK.
>>
>> The attachment is the patch time.
>
> Here is the result:
>
> / # makedumpfile -c --num-threads 4 /proc/vmcore /mnt/dumpfile
> Copying data : [100.0 %] |
> Copying data : [100.0 %] \
>
> The dumpfile is saved to /mnt/dumpfile.
>
> makedumpfile Completed.
> lock time: 310s935500us
> write time: 3s970037us
> hit time: 6s316043us
> find time: 317s926654us
> loop_time: 37s321800us
> thread consume_time: 0s0us
> thread timea: 0s0us
> thread timeb: 0s0us
> read_time[0]: 8s637011us
> lock_current_time[0]: 0s284428us
> found_time[0]: 60s366795us
> lock_consumed_time[0]: 2s782596us
> compress_time[0]: 301s427073us
> read_time[1]: 8s435914us
> lock_current_time[1]: 0s271680us
> found_time[1]: 60s329026us
> lock_consumed_time[1]: 2s849061us
> compress_time[1]: 302s98620us
> read_time[2]: 8s380550us
> lock_current_time[2]: 0s270388us
> found_time[2]: 60s209376us
> lock_consumed_time[2]: 3s297574us
> compress_time[2]: 301s486768us
> read_time[3]: 8s550662us
> lock_current_time[3]: 0s278997us
> found_time[3]: 60s476702us
> lock_consumed_time[3]: 3s49184us
> compress_time[3]: 301s718390us
> count1: 172
> count2: 70921401
> count3: 0
> count4: 0
> count5: 0
> count6: 0
> count7: 0
> exec time: 380s125494us
>
>
> BTW, I fixed a small mistake before testing like:
>
> - write_time = (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);
> + write_time += (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);
>
[-- Attachment #2: test1 --]
[-- Type: text/plain, Size: 7560 bytes --]
--- makedumpfile.ori 2015-07-06 11:56:11.300000000 -0400
+++ makedumpfile.c 2015-07-06 11:56:28.127000000 -0400
@@ -26,6 +26,42 @@
#include <limits.h>
#include <assert.h>
+unsigned long write_kdump_pages_parallel_cyclic_time = 0;
+struct timeval write_kdump_pages_parallel_cyclic1, write_kdump_pages_parallel_cyclic2;
+unsigned long loop_time = 0;
+unsigned long consume_time = 0;
+struct timeval loop1, loop2;
+unsigned long check_time = 0;
+struct timeval check1, check2;
+struct timeval write1, write2;
+unsigned long write_time;
+struct timeval lock1, lock2;
+unsigned long lock_time;
+struct timeval hit1, hit2;
+unsigned long hit_time;
+struct timeval find1, find2;
+unsigned long find_time;
+struct timeval timea1, timea2;
+unsigned long timea;
+struct timeval timeb1, timeb2;
+unsigned long filter_time[20];
+struct timeval filter1[20], filter2[20];
+unsigned long cp_time[20];
+struct timeval cp1[20], cp2[20];
+unsigned long compress_time[20];
+struct timeval compress_time1[20], compress_time2[20];
+unsigned long timeb;
+unsigned long count1 = 0;
+unsigned long count2 = 0;
+unsigned long count3 = 0;
+unsigned long count4 = 0;
+unsigned long count5 = 0;
+unsigned long count6 = 0;
+unsigned long count7 = 0;
+unsigned long count8 = 0;
+unsigned long count9 = 0;
+
+
struct symbol_table symbol_table;
struct size_table size_table;
struct offset_table offset_table;
@@ -6919,6 +6955,17 @@
return TRUE;
}
+int compress2015(unsigned char *buf_out, unsigned long *size_out, unsigned char *buf, int page_size, int a, int thread_num)
+{
+int ret;
+gettimeofday(&compress_time1[thread_num], NULL);
+ret = compress2(buf_out, size_out, buf, info->page_size, Z_BEST_SPEED);
+gettimeofday(&compress_time2[thread_num], NULL);
+compress_time[thread_num] += (compress_time2[thread_num].tv_sec - compress_time1[thread_num].tv_sec) * 1000000 + (compress_time2[thread_num].tv_usec - compress_time1[thread_num].tv_usec);
+
+return ret;
+}
+
void *
kdump_thread_function_cyclic(void *arg) {
void *retval = PTHREAD_FAIL;
@@ -6944,6 +6991,7 @@
unsigned long len_buf_out_snappy =
snappy_max_compressed_length(info->page_size);
#endif
+int thread_num = kdump_thread_args->thread_num;
buf = BUF_PARALLEL(kdump_thread_args->thread_num);
buf_out = BUF_OUT_PARALLEL(kdump_thread_args->thread_num);
@@ -7030,9 +7078,13 @@
mmap_cache))
goto fail;
+
+gettimeofday(&filter1[thread_num], NULL);
filter_data_buffer_parallel(buf, pfn_to_paddr(pfn),
info->page_size,
&info->filter_mutex);
+gettimeofday(&filter2[thread_num], NULL);
+filter_time[thread_num] += (filter2[thread_num].tv_sec - filter1[thread_num].tv_sec) * 1000000 + (filter2[thread_num].tv_usec - filter1[thread_num].tv_usec);
if ((info->dump_level & DL_EXCLUDE_ZERO)
&& is_zero_page(buf, info->page_size)) {
@@ -7048,14 +7100,17 @@
size_out = kdump_thread_args->len_buf_out;
if ((info->flag_compress & DUMP_DH_COMPRESSED_ZLIB)
&& ((size_out = kdump_thread_args->len_buf_out),
- compress2(buf_out, &size_out, buf,
+ compress2015(buf_out, &size_out, buf,
info->page_size,
- Z_BEST_SPEED) == Z_OK)
+ Z_BEST_SPEED, thread_num) == Z_OK)
&& (size_out < info->page_size)) {
page_data_buf[index].flags =
DUMP_DH_COMPRESSED_ZLIB;
page_data_buf[index].size = size_out;
+gettimeofday(&cp1[thread_num], NULL);
memcpy(page_data_buf[index].buf, buf_out, size_out);
+gettimeofday(&cp2[thread_num], NULL);
+cp_time[thread_num] += (cp2[thread_num].tv_sec - cp1[thread_num].tv_sec) * 1000000 + (cp2[thread_num].tv_usec - cp1[thread_num].tv_usec);
#ifdef USELZO
} else if (info->flag_lzo_support
&& (info->flag_compress
@@ -7220,6 +7275,7 @@
gettimeofday(&last, NULL);
while (consuming_pfn < end_pfn) {
+gettimeofday(&loop1, NULL);
index = consuming_pfn % page_data_num;
gettimeofday(&new, NULL);
@@ -7232,17 +7288,26 @@
* check pfn first without mutex locked to reduce the time
* trying to lock the mutex
*/
- if (page_data_buf[index].pfn != consuming_pfn)
+ if (page_data_buf[index].pfn != consuming_pfn){
+gettimeofday(&loop2, NULL);
+loop_time += (loop2.tv_sec - loop1.tv_sec) * 1000000 + (loop2.tv_usec - loop1.tv_usec);
continue;
+}
+gettimeofday(&find1, NULL);
+gettimeofday(&lock1, NULL);
pthread_mutex_lock(&page_data_buf[index].mutex);
+gettimeofday(&lock2, NULL);
+lock_time += (lock2.tv_sec - lock1.tv_sec) * 1000000 + (lock2.tv_usec - lock1.tv_usec);
+
/* check whether the found one is ready to be consumed */
if (page_data_buf[index].pfn != consuming_pfn ||
page_data_buf[index].ready != 1) {
goto unlock;
}
+gettimeofday(&hit1, NULL);
if ((num_dumped % per) == 0)
print_progress(PROGRESS_COPY, num_dumped, info->num_dumpable);
@@ -7256,6 +7321,7 @@
num_dumped++;
+gettimeofday(&write1, NULL);
if (page_data_buf[index].zero == TRUE) {
if (!write_cache(cd_header, pd_zero, sizeof(page_desc_t)))
goto out;
@@ -7278,8 +7344,17 @@
goto out;
}
+
+gettimeofday(&write2, NULL);
+write_time += (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);
+
unlock:
pthread_mutex_unlock(&page_data_buf[index].mutex);
+gettimeofday(&hit2, NULL);
+gettimeofday(&find2, NULL);
+hit_time += (hit2.tv_sec - hit1.tv_sec) * 1000000 + (hit2.tv_usec - hit1.tv_usec);
+find_time += (find2.tv_sec - find1.tv_sec) * 1000000 + (find2.tv_usec - find1.tv_usec);
+
}
ret = TRUE;
@@ -7438,8 +7513,8 @@
size_out = len_buf_out;
if ((info->flag_compress & DUMP_DH_COMPRESSED_ZLIB)
&& ((size_out = len_buf_out),
- compress2(buf_out, &size_out, buf, info->page_size,
- Z_BEST_SPEED) == Z_OK)
+ compress2015(buf_out, &size_out, buf, info->page_size,
+ Z_BEST_SPEED, 0) == Z_OK)
&& (size_out < info->page_size)) {
pd.flags = DUMP_DH_COMPRESSED_ZLIB;
pd.size = size_out;
@@ -10314,6 +10389,11 @@
int
main(int argc, char *argv[])
{
+ unsigned long total_time = 0;
+ struct timeval start_time, end_time;
+ gettimeofday(&start_time, NULL);
+
+
int i, opt, flag_debug = FALSE;
if ((info = calloc(1, sizeof(struct DumpInfo))) == NULL) {
@@ -10328,7 +10408,6 @@
goto out;
}
initialize_tables();
-
/*
* By default, makedumpfile assumes that multi-cycle processing is
* necessary to work in constant memory space.
@@ -10642,5 +10721,21 @@
}
free_elf_info();
+ gettimeofday(&end_time, NULL);
+ total_time = (end_time.tv_sec - start_time.tv_sec) * 1000000 + (end_time.tv_usec - start_time.tv_usec);
+ MSG("lock time: %lds%ldus\n", lock_time / 1000000, lock_time % 1000000);
+ MSG("write time: %lds%ldus\n", write_time / 1000000, write_time % 1000000);
+ MSG("hit time: %lds%ldus\n", hit_time / 1000000, hit_time % 1000000);
+ MSG("find time: %lds%ldus\n", find_time / 1000000, find_time % 1000000);
+ MSG("loop_time: %lds%ldus\n", loop_time / 1000000, loop_time % 1000000);
+ MSG("thread consume_time: %lds%ldus\n", consume_time / 1000000, consume_time % 1000000);
+for (i = 0; i < info->num_threads; i++){
+ MSG("filter_time[%d]: %lds%ldus\n", i, filter_time[i] / 1000000, filter_time[i] % 1000000);
+ MSG("cp_time[%d]: %lds%ldus\n", i, cp_time[i] / 1000000, cp_time[i] % 1000000);
+ MSG("compress_time[%d]: %lds%ldus\n", i, compress_time[i] / 1000000, compress_time[i] % 1000000);
+}
+ MSG("exec time: %lds%ldus\n", total_time / 1000000, total_time % 1000000);
+
+
return retcd;
}
[-- Attachment #3: test2.1 --]
[-- Type: application/x-troff-man, Size: 8159 bytes --]
[-- Attachment #4: test2.2 --]
[-- Type: application/x-troff-man, Size: 8501 bytes --]
[-- Attachment #5: Type: text/plain, Size: 143 bytes --]
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
next prev parent reply other threads:[~2015-07-06 13:23 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-19 8:56 [PATCH v2 00/10] makedumpfile: parallel processing Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 01/10] Add readpage_kdump_compressed_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 02/10] Add mappage_elf_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 03/10] Add readpage_elf_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 04/10] Add read_pfn_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 05/10] Add function to initial bitmap for parallel use Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 06/10] Add filter_data_buffer_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 07/10] Add write_kdump_pages_parallel to allow parallel process Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 08/10] Initial and free data used for " Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 09/10] Make makedumpfile available to read and compress pages parallelly Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 10/10] Add usage and manual about multiple threads process Zhou Wenjian
2015-06-25 2:25 ` [PATCH v2 00/10] makedumpfile: parallel processing "Zhou, Wenjian/周文剑"
2015-06-26 7:07 ` Atsushi Kumagai
2015-06-26 7:27 ` "Zhou, Wenjian/周文剑"
2015-06-26 7:49 ` Atsushi Kumagai
2015-06-29 6:19 ` "Zhou, Wenjian/周文剑"
2015-06-30 9:06 ` Atsushi Kumagai
2015-07-06 13:19 ` "Zhou, Wenjian/周文剑" [this message]
2015-07-08 8:10 ` Atsushi Kumagai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=559A8051.5090905@cn.fujitsu.com \
--to=zhouwj-fnst@cn.fujitsu.com \
--cc=ats-kumagai@wm.jp.nec.com \
--cc=kexec@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.