From: "\"Zhou, Wenjian/周文剑\"" <zhouwj-fnst@cn.fujitsu.com>
To: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Cc: "kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [PATCH v2 00/10] makedumpfile: parallel processing
Date: Mon, 6 Jul 2015 21:19:13 +0800 [thread overview]
Message-ID: <559A8051.5090905@cn.fujitsu.com> (raw)
In-Reply-To: <0910DD04CBD6DE4193FCF86B9C00BE9701DCAB4E@BPXM01GP.gisp.nec.co.jp>
[-- Attachment #1: Type: text/plain, Size: 3208 bytes --]
Hello Atsushi Kumagai,
I have tried a lot, and I think the big performance degradation only
occurs in special CPU.
I thought about two reasons, and I need your help to confirm which is
the real one.
The following tests will also be OK by using dumpfile instead of /proc/vmcore
Test 1: distinguish whether it is resulted by multi-threads.
apply patch: test1
command1: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
command2: ./makedumpfile -c /proc/vmcore vmcore --num-threads 8
better to do some test in -l too.
command1: ./makedumpfile -l /proc/vmcore vmcore
command2: ./makedumpfile -l /proc/vmcore vmcore --num-threads 1
command3: ./makedumpfile -l /proc/vmcore vmcore --num-threads 8
Test 2: distinguish whether it is resulted by doing compress in thread
2.1:
apply patch: test2.1
command: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
2.2:
apply patch: test2.2
command: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
Thanks a lot.
BTW, could you show me the cpu name, zlib version and glibc version ?
--
Thanks
Zhou Wenjian
On 06/30/2015 05:06 PM, Atsushi Kumagai wrote:
>> On 06/26/2015 03:49 PM, Atsushi Kumagai wrote:
>>> I attached 5 processors to the VM and I confirmed that all threads
>>> consumed full cpu time by top(1) on the host:
>>>
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>> 17614 qemu 20 0 5792m 4.9g 5652 R 435.1 72.8 29:02.17 qemu-kvm
>>>
>>> So I think the performance must be improved...
>>
>> Since I can't get that result in all machines here, could you test it with the patch:time
>> and show me the output?
>> Using "./makedumpfile -c --num-threads 4 /proc/vmcore dumpfile" is OK.
>>
>> The attachment is the patch time.
>
> Here is the result:
>
> / # makedumpfile -c --num-threads 4 /proc/vmcore /mnt/dumpfile
> Copying data : [100.0 %] |
> Copying data : [100.0 %] \
>
> The dumpfile is saved to /mnt/dumpfile.
>
> makedumpfile Completed.
> lock time: 310s935500us
> write time: 3s970037us
> hit time: 6s316043us
> find time: 317s926654us
> loop_time: 37s321800us
> thread consume_time: 0s0us
> thread timea: 0s0us
> thread timeb: 0s0us
> read_time[0]: 8s637011us
> lock_current_time[0]: 0s284428us
> found_time[0]: 60s366795us
> lock_consumed_time[0]: 2s782596us
> compress_time[0]: 301s427073us
> read_time[1]: 8s435914us
> lock_current_time[1]: 0s271680us
> found_time[1]: 60s329026us
> lock_consumed_time[1]: 2s849061us
> compress_time[1]: 302s98620us
> read_time[2]: 8s380550us
> lock_current_time[2]: 0s270388us
> found_time[2]: 60s209376us
> lock_consumed_time[2]: 3s297574us
> compress_time[2]: 301s486768us
> read_time[3]: 8s550662us
> lock_current_time[3]: 0s278997us
> found_time[3]: 60s476702us
> lock_consumed_time[3]: 3s49184us
> compress_time[3]: 301s718390us
> count1: 172
> count2: 70921401
> count3: 0
> count4: 0
> count5: 0
> count6: 0
> count7: 0
> exec time: 380s125494us
>
>
> BTW, I fixed a small mistake before testing like:
>
> - write_time = (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);
> + write_time += (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);
>
[-- Attachment #2: test1 --]
[-- Type: text/plain, Size: 7560 bytes --]
--- makedumpfile.ori 2015-07-06 11:56:11.300000000 -0400
+++ makedumpfile.c 2015-07-06 11:56:28.127000000 -0400
@@ -26,6 +26,42 @@
#include <limits.h>
#include <assert.h>
+unsigned long write_kdump_pages_parallel_cyclic_time = 0;
+struct timeval write_kdump_pages_parallel_cyclic1, write_kdump_pages_parallel_cyclic2;
+unsigned long loop_time = 0;
+unsigned long consume_time = 0;
+struct timeval loop1, loop2;
+unsigned long check_time = 0;
+struct timeval check1, check2;
+struct timeval write1, write2;
+unsigned long write_time;
+struct timeval lock1, lock2;
+unsigned long lock_time;
+struct timeval hit1, hit2;
+unsigned long hit_time;
+struct timeval find1, find2;
+unsigned long find_time;
+struct timeval timea1, timea2;
+unsigned long timea;
+struct timeval timeb1, timeb2;
+unsigned long filter_time[20];
+struct timeval filter1[20], filter2[20];
+unsigned long cp_time[20];
+struct timeval cp1[20], cp2[20];
+unsigned long compress_time[20];
+struct timeval compress_time1[20], compress_time2[20];
+unsigned long timeb;
+unsigned long count1 = 0;
+unsigned long count2 = 0;
+unsigned long count3 = 0;
+unsigned long count4 = 0;
+unsigned long count5 = 0;
+unsigned long count6 = 0;
+unsigned long count7 = 0;
+unsigned long count8 = 0;
+unsigned long count9 = 0;
+
+
struct symbol_table symbol_table;
struct size_table size_table;
struct offset_table offset_table;
@@ -6919,6 +6955,17 @@
return TRUE;
}
+int compress2015(unsigned char *buf_out, unsigned long *size_out, unsigned char *buf, int page_size, int a, int thread_num)
+{
+int ret;
+gettimeofday(&compress_time1[thread_num], NULL);
+ret = compress2(buf_out, size_out, buf, info->page_size, Z_BEST_SPEED);
+gettimeofday(&compress_time2[thread_num], NULL);
+compress_time[thread_num] += (compress_time2[thread_num].tv_sec - compress_time1[thread_num].tv_sec) * 1000000 + (compress_time2[thread_num].tv_usec - compress_time1[thread_num].tv_usec);
+
+return ret;
+}
+
void *
kdump_thread_function_cyclic(void *arg) {
void *retval = PTHREAD_FAIL;
@@ -6944,6 +6991,7 @@
unsigned long len_buf_out_snappy =
snappy_max_compressed_length(info->page_size);
#endif
+int thread_num = kdump_thread_args->thread_num;
buf = BUF_PARALLEL(kdump_thread_args->thread_num);
buf_out = BUF_OUT_PARALLEL(kdump_thread_args->thread_num);
@@ -7030,9 +7078,13 @@
mmap_cache))
goto fail;
+
+gettimeofday(&filter1[thread_num], NULL);
filter_data_buffer_parallel(buf, pfn_to_paddr(pfn),
info->page_size,
&info->filter_mutex);
+gettimeofday(&filter2[thread_num], NULL);
+filter_time[thread_num] += (filter2[thread_num].tv_sec - filter1[thread_num].tv_sec) * 1000000 + (filter2[thread_num].tv_usec - filter1[thread_num].tv_usec);
if ((info->dump_level & DL_EXCLUDE_ZERO)
&& is_zero_page(buf, info->page_size)) {
@@ -7048,14 +7100,17 @@
size_out = kdump_thread_args->len_buf_out;
if ((info->flag_compress & DUMP_DH_COMPRESSED_ZLIB)
&& ((size_out = kdump_thread_args->len_buf_out),
- compress2(buf_out, &size_out, buf,
+ compress2015(buf_out, &size_out, buf,
info->page_size,
- Z_BEST_SPEED) == Z_OK)
+ Z_BEST_SPEED, thread_num) == Z_OK)
&& (size_out < info->page_size)) {
page_data_buf[index].flags =
DUMP_DH_COMPRESSED_ZLIB;
page_data_buf[index].size = size_out;
+gettimeofday(&cp1[thread_num], NULL);
memcpy(page_data_buf[index].buf, buf_out, size_out);
+gettimeofday(&cp2[thread_num], NULL);
+cp_time[thread_num] += (cp2[thread_num].tv_sec - cp1[thread_num].tv_sec) * 1000000 + (cp2[thread_num].tv_usec - cp1[thread_num].tv_usec);
#ifdef USELZO
} else if (info->flag_lzo_support
&& (info->flag_compress
@@ -7220,6 +7275,7 @@
gettimeofday(&last, NULL);
while (consuming_pfn < end_pfn) {
+gettimeofday(&loop1, NULL);
index = consuming_pfn % page_data_num;
gettimeofday(&new, NULL);
@@ -7232,17 +7288,26 @@
* check pfn first without mutex locked to reduce the time
* trying to lock the mutex
*/
- if (page_data_buf[index].pfn != consuming_pfn)
+ if (page_data_buf[index].pfn != consuming_pfn){
+gettimeofday(&loop2, NULL);
+loop_time += (loop2.tv_sec - loop1.tv_sec) * 1000000 + (loop2.tv_usec - loop1.tv_usec);
continue;
+}
+gettimeofday(&find1, NULL);
+gettimeofday(&lock1, NULL);
pthread_mutex_lock(&page_data_buf[index].mutex);
+gettimeofday(&lock2, NULL);
+lock_time += (lock2.tv_sec - lock1.tv_sec) * 1000000 + (lock2.tv_usec - lock1.tv_usec);
+
/* check whether the found one is ready to be consumed */
if (page_data_buf[index].pfn != consuming_pfn ||
page_data_buf[index].ready != 1) {
goto unlock;
}
+gettimeofday(&hit1, NULL);
if ((num_dumped % per) == 0)
print_progress(PROGRESS_COPY, num_dumped, info->num_dumpable);
@@ -7256,6 +7321,7 @@
num_dumped++;
+gettimeofday(&write1, NULL);
if (page_data_buf[index].zero == TRUE) {
if (!write_cache(cd_header, pd_zero, sizeof(page_desc_t)))
goto out;
@@ -7278,8 +7344,17 @@
goto out;
}
+
+gettimeofday(&write2, NULL);
+write_time += (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);
+
unlock:
pthread_mutex_unlock(&page_data_buf[index].mutex);
+gettimeofday(&hit2, NULL);
+gettimeofday(&find2, NULL);
+hit_time += (hit2.tv_sec - hit1.tv_sec) * 1000000 + (hit2.tv_usec - hit1.tv_usec);
+find_time += (find2.tv_sec - find1.tv_sec) * 1000000 + (find2.tv_usec - find1.tv_usec);
+
}
ret = TRUE;
@@ -7438,8 +7513,8 @@
size_out = len_buf_out;
if ((info->flag_compress & DUMP_DH_COMPRESSED_ZLIB)
&& ((size_out = len_buf_out),
- compress2(buf_out, &size_out, buf, info->page_size,
- Z_BEST_SPEED) == Z_OK)
+ compress2015(buf_out, &size_out, buf, info->page_size,
+ Z_BEST_SPEED, 0) == Z_OK)
&& (size_out < info->page_size)) {
pd.flags = DUMP_DH_COMPRESSED_ZLIB;
pd.size = size_out;
@@ -10314,6 +10389,11 @@
int
main(int argc, char *argv[])
{
+ unsigned long total_time = 0;
+ struct timeval start_time, end_time;
+ gettimeofday(&start_time, NULL);
+
+
int i, opt, flag_debug = FALSE;
if ((info = calloc(1, sizeof(struct DumpInfo))) == NULL) {
@@ -10328,7 +10408,6 @@
goto out;
}
initialize_tables();
-
/*
* By default, makedumpfile assumes that multi-cycle processing is
* necessary to work in constant memory space.
@@ -10642,5 +10721,21 @@
}
free_elf_info();
+ gettimeofday(&end_time, NULL);
+ total_time = (end_time.tv_sec - start_time.tv_sec) * 1000000 + (end_time.tv_usec - start_time.tv_usec);
+ MSG("lock time: %lds%ldus\n", lock_time / 1000000, lock_time % 1000000);
+ MSG("write time: %lds%ldus\n", write_time / 1000000, write_time % 1000000);
+ MSG("hit time: %lds%ldus\n", hit_time / 1000000, hit_time % 1000000);
+ MSG("find time: %lds%ldus\n", find_time / 1000000, find_time % 1000000);
+ MSG("loop_time: %lds%ldus\n", loop_time / 1000000, loop_time % 1000000);
+ MSG("thread consume_time: %lds%ldus\n", consume_time / 1000000, consume_time % 1000000);
+for (i = 0; i < info->num_threads; i++){
+ MSG("filter_time[%d]: %lds%ldus\n", i, filter_time[i] / 1000000, filter_time[i] % 1000000);
+ MSG("cp_time[%d]: %lds%ldus\n", i, cp_time[i] / 1000000, cp_time[i] % 1000000);
+ MSG("compress_time[%d]: %lds%ldus\n", i, compress_time[i] / 1000000, compress_time[i] % 1000000);
+}
+ MSG("exec time: %lds%ldus\n", total_time / 1000000, total_time % 1000000);
+
+
return retcd;
}
[-- Attachment #3: test2.1 --]
[-- Type: application/x-troff-man, Size: 8159 bytes --]
[-- Attachment #4: test2.2 --]
[-- Type: application/x-troff-man, Size: 8501 bytes --]
[-- Attachment #5: Type: text/plain, Size: 143 bytes --]
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
next prev parent reply other threads:[~2015-07-06 13:23 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-19 8:56 [PATCH v2 00/10] makedumpfile: parallel processing Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 01/10] Add readpage_kdump_compressed_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 02/10] Add mappage_elf_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 03/10] Add readpage_elf_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 04/10] Add read_pfn_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 05/10] Add function to initial bitmap for parallel use Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 06/10] Add filter_data_buffer_parallel Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 07/10] Add write_kdump_pages_parallel to allow parallel process Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 08/10] Initial and free data used for " Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 09/10] Make makedumpfile available to read and compress pages parallelly Zhou Wenjian
2015-06-19 8:56 ` [PATCH v2 10/10] Add usage and manual about multiple threads process Zhou Wenjian
2015-06-25 2:25 ` [PATCH v2 00/10] makedumpfile: parallel processing "Zhou, Wenjian/周文剑"
2015-06-26 7:07 ` Atsushi Kumagai
2015-06-26 7:27 ` "Zhou, Wenjian/周文剑"
2015-06-26 7:49 ` Atsushi Kumagai
2015-06-29 6:19 ` "Zhou, Wenjian/周文剑"
2015-06-30 9:06 ` Atsushi Kumagai
2015-07-06 13:19 ` "Zhou, Wenjian/周文剑" [this message]
2015-07-08 8:10 ` Atsushi Kumagai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=559A8051.5090905@cn.fujitsu.com \
--to=zhouwj-fnst@cn.fujitsu.com \
--cc=ats-kumagai@wm.jp.nec.com \
--cc=kexec@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox