From: "Zhou, Wenjian/周文剑" <zhouwj-fnst@cn.fujitsu.com>
To: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Cc: "kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [PATCH RFC 00/11] makedumpfile: parallel processing
Date: Mon, 14 Dec 2015 16:59:10 +0800
Message-ID: <566E84DE.2020804@cn.fujitsu.com>
In-Reply-To: <0910DD04CBD6DE4193FCF86B9C00BE9701E0D0DA@BPXM01GP.gisp.nec.co.jp>
On 12/14/2015 04:26 PM, Atsushi Kumagai wrote:
>>>> Think about this: in a huge memory, most of the pages will be filtered, and
>>>> we have 5 buffers.
>>>>
>>>> page1       page2     page3     page4     page5     page6       page7 .....
>>>> [buffer1]   [2]       [3]       [4]       [5]
>>>> unfiltered  filtered  filtered  filtered  filtered  unfiltered  filtered
>>>>
>>>> Since a filtered page also takes a buffer, page6 can't be compressed at
>>>> the same time as page1.
>>>> That's why it prevents parallel compression.
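The buffer fairness issue described above can be modeled in a few lines of C. This is a minimal, hypothetical sketch, not code from the patch set: it assumes that page i always occupies buffer slot i % nbuf (filtered or not) and that at most nbuf consecutive pages are in flight, so the achievable parallelism is the largest number of unfiltered pages in any window of nbuf pages. The function name max_parallel_fixed() is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the current scheme: every page, filtered or not,
 * occupies one buffer slot, and at most nbuf consecutive pages are in
 * flight. Returns the largest number of unfiltered pages (pages that
 * really need compression) found in any window of nbuf consecutive pages.
 */
size_t max_parallel_fixed(const bool *filtered, size_t npages, size_t nbuf)
{
	assert(nbuf > 0);
	size_t best = 0;

	for (size_t start = 0; start < npages; start++) {
		size_t busy = 0;

		/* Count pages in this window that actually need compressing. */
		for (size_t i = start; i < npages && i < start + nbuf; i++)
			if (!filtered[i])
				busy++;
		if (busy > best)
			best = busy;
	}
	return best;
}
```

With the page layout above (only page1 and page6 unfiltered, 5 buffers), max_parallel_fixed() returns 1, i.e. compression is effectively serial; with no filtered pages at all it returns 5.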
>>>
>>> Thanks for your explanation, I understand.
>>> This is just an issue of the current implementation; there is no
>>> reason to keep this restriction.
>>>
>>>>> Further, according to Chao's benchmark, there is a big performance
>>>>> degradation even if the number of threads is 1 (58s vs 240s).
>>>>> The current implementation seems to have some problems; we should
>>>>> solve them.
>>>>>
>>>>
>>>> If "-d 31" is specified, on the one hand we can't save time by compressing
>>>> in parallel, and on the other hand "--num-threads" introduces some extra
>>>> work. So a performance degradation is to be expected.
>>>
>>> Sure, there must be some overhead due to "some extra work" (e.g. exclusive locks),
>>> but "--num-threads=1 is 4 times slower than --num-threads=0" still sounds
>>> too slow; the degradation is too big to be explained by "some extra work".
>>>
>>> Both --num-threads=0 and --num-threads=1 are serial processing, so
>>> the above "buffer fairness issue" is not related to this degradation.
>>> What do you think causes this degradation?
>>>
>>
>> I can't reproduce such a result at the moment, so I can't investigate it further
>> right now. I guess it may be caused by the underlying pthread implementation.
>> I reviewed the test results of patch v2 and found that the results differ
>> considerably between machines.
>
> Unfortunately, I also can't reproduce such a big degradation.
> According to Chao's verification, this issue seems different from
> the "too many page faults" issue that we solved.
> I have no idea, but at least I want to confirm whether this issue
> is avoidable or not.
>
>> It seems that I can get almost the same results as Chao on the "PRIMEQUEST 1800E".
>>
>> ###################################
>> - System: PRIMERGY RX300 S6
>> - CPU: Intel(R) Xeon(R) CPU x5660
>> - memory: 16GB
>> ###################################
>> ************ makedumpfile -d 7 ******************
>> -l, threads-num | core-data 0 | core-data 256
>> ----------------+-------------+--------------
>>               0 |          10 |           144
>>               4 |           5 |           110
>>               8 |           5 |           111
>>              12 |           6 |           111
>>
>> ************ makedumpfile -d 31 ******************
>> -l, threads-num | core-data 0 | core-data 256
>> ----------------+-------------+--------------
>>               0 |           0 |             0
>>               4 |           2 |             2
>>               8 |           2 |             3
>>              12 |           2 |             3
>>
>> ###################################
>> - System: PRIMEQUEST 1800E
>> - CPU: Intel(R) Xeon(R) CPU E7540
>> - memory: 32GB
>> ###################################
>> ************ makedumpfile -d 7 ******************
>> -l, threads-num | core-data 0 | core-data 256
>> ----------------+-------------+--------------
>>               0 |          34 |           270
>>               4 |          63 |           154
>>               8 |          64 |           131
>>              12 |          65 |           159
>>
>> ************ makedumpfile -d 31 ******************
>> -l, threads-num | core-data 0 | core-data 256
>> ----------------+-------------+--------------
>>               0 |           2 |             1
>>               4 |          48 |            48
>>               8 |          48 |            49
>>              12 |          49 |            50
>>
>>>> I'm not sure whether such a big performance degradation is a problem.
>>>> But if it works as expected in other cases, I don't think this is a
>>>> problem (or one that needs to be fixed), since the degradation is
>>>> expected in theory.
>>>>
>>>> Alternatively, the current implementation could be replaced by a new algorithm.
>>>> For example:
>>>> We can add an array to record whether each page is filtered or not,
>>>> and only the unfiltered pages will take a buffer.
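The proposed alternative can be sketched in the same way. This is again a hypothetical illustration, assuming a simple ring of nbuf buffers and using an invented function name, assign_buffers(): a per-page array records whether each page is filtered, and only unfiltered pages are handed a buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the proposed scheme: filtered pages are skipped
 * and consume no buffer; the k-th unfiltered page gets buffer (k % nbuf).
 * slot[i] receives the buffer index for page i, or -1 if page i is
 * filtered. Returns the number of unfiltered pages.
 */
size_t assign_buffers(const bool *filtered, size_t npages, size_t nbuf,
		      long *slot)
{
	assert(nbuf > 0);
	size_t next = 0;	/* unfiltered pages seen so far */

	for (size_t i = 0; i < npages; i++) {
		if (filtered[i]) {
			slot[i] = -1;	/* no buffer consumed */
		} else {
			slot[i] = (long)(next % nbuf);
			next++;
		}
	}
	return next;
}
```

For the earlier layout (page1 and page6 unfiltered, 5 buffers), page1 gets buffer 0 and page6 gets buffer 1, so the two pages no longer block each other, and up to min(nbuf, number of unfiltered pages) compressions can be in flight.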
>>>
>>> We should discuss how to implement the new mechanism; I'll come back to this later.
>>>
>>>> But I'm not sure if it is worth it.
>>>> Since "-l -d 31" is already fast enough, the new algorithm can't help much.
>>>
>>> Basically the faster, the better. There is no obvious target time.
>>> If there is room for improvement, we should do it.
>>>
>>
>> Maybe we can improve the performance of "-c -d 31" in some cases.
>
> Yes, the buffer is used for -c, -l and -p, not only for -l.
> It would be useful to improve that.
>
>> BTW, we can easily get the theoretical performance by using "--split".
>
> Are you sure? You persuaded me in the thread below:
>
> http://lists.infradead.org/pipermail/kexec/2015-June/013881.html
>
> --num-threads is orthogonal to --split; it's better to use both
> options since they try to solve different bottlenecks.
> That's why I decided to merge your multi-thread feature.
>
> However, what you said sounds as if --split is a superset of --num-threads.
> Don't you need the multi-thread feature?
>
I just meant the performance.
There is no doubt that we will use multiple threads with --split in the future.
But as we all know, threads and processes have some characteristics in common.
In makedumpfile, "--split core1 core2 core3 core4" and "--num-threads 4",
each used on its own, should not differ much in elapsed time.
Since the logic of "--split" is simpler, if we can't improve the performance
of "-l -d 31" with "--split", we also don't have much chance to do it with "--num-threads".
That's all I meant.
Of course, --split is not a superset of --num-threads.
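To make the "--split" vs "--num-threads" comparison concrete: both come down to dividing the pages among N workers. Below is a hypothetical sketch of one plausible way to do that, cutting the pages into contiguous ranges that hold roughly equal numbers of unfiltered pages. split_by_unfiltered() is an invented name, and this is not makedumpfile's actual --split code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: divide pages [0, npages) into nparts contiguous
 * ranges with roughly equal numbers of unfiltered pages. end[k] receives
 * the exclusive end of part k; part k starts at end[k - 1] (0 for k == 0).
 */
void split_by_unfiltered(const bool *filtered, size_t npages, size_t nparts,
			 size_t *end)
{
	assert(nparts > 0);

	/* First pass: how much real work (unfiltered pages) there is. */
	size_t total = 0;
	for (size_t i = 0; i < npages; i++)
		if (!filtered[i])
			total++;

	/* Second pass: close part k once it holds about total/nparts pages. */
	size_t part = 0, seen = 0;
	for (size_t i = 0; i < npages && part + 1 < nparts; i++) {
		if (!filtered[i])
			seen++;
		while (part + 1 < nparts && seen >= (part + 1) * total / nparts)
			end[part++] = i + 1;
	}

	/* Remaining parts, including the last one, run to the end. */
	while (part < nparts)
		end[part++] = npages;
}
```

For the 7-page example earlier in this thread, split into 2 parts, the ranges are pages [0, 1) and [1, 7) (0-based), each holding exactly one unfiltered page. Whether each range is then handled by a process or a thread, the amount of compression work per worker is the same.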
--
Thanks
Zhou
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec