From: Chao Fan <cfan@redhat.com>
To: "Wenjian Zhou/周文剑" <zhouwj-fnst@cn.fujitsu.com>
Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>, kexec@lists.infradead.org
Subject: Re: [PATCH RFC 00/11] makedumpfile: parallel processing
Date: Thu, 10 Dec 2015 04:58:41 -0500 (EST)
Message-ID: <595559040.32425969.1449741521365.JavaMail.zimbra@redhat.com>
In-Reply-To: <566947AF.5010800@cn.fujitsu.com>
----- Original Message -----
> From: "Wenjian Zhou/周文剑" <zhouwj-fnst@cn.fujitsu.com>
> To: "Atsushi Kumagai" <ats-kumagai@wm.jp.nec.com>
> Cc: kexec@lists.infradead.org
> Sent: Thursday, December 10, 2015 5:36:47 PM
> Subject: Re: [PATCH RFC 00/11] makedumpfile: parallel processing
>
> On 12/10/2015 04:14 PM, Atsushi Kumagai wrote:
> >> Hello Kumagai,
> >>
> >> On 12/04/2015 10:30 AM, Atsushi Kumagai wrote:
> >>> Hello, Zhou
> >>>
> >>>> On 12/02/2015 03:24 PM, Dave Young wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On 12/02/15 at 01:29pm, "Zhou, Wenjian/周文剑" wrote:
> >>>>>> I think there is no problem if other test results are as expected.
> >>>>>>
> >>>>>> --num-threads mainly reduces the time of compressing.
> >>>>>> So for lzo, it can't do much help at most of time.
> >>>>>
> >>>>> Seems the help of --num-threads does not say it exactly:
> >>>>>
> >>>>> [--num-threads THREADNUM]:
> >>>>> Using multiple threads to read and compress data of each page
> >>>>> in parallel.
> >>>>> And it will reduces time for saving DUMPFILE.
> >>>>> This feature only supports creating DUMPFILE in kdump-comressed
> >>>>> format from VMCORE in kdump-compressed format or elf format.
> >>>>>
> >>>>> Lzo is also a compress method, it should be mentioned that
> >>>>> --num-threads only supports zlib compressed vmcore.
> >>>>>
> >>>>
> >>>> Sorry, it seems that something I said is not so clear.
> >>>> lzo is also supported. Since lzo compresses data at a high speed, the
> >>>> improving of the performance is not so obvious at most of time.
> >>>>
> >>>>> Also worth to mention about the recommended -d value for this feature.
> >>>>>
> >>>>
> >>>> Yes, I think it's worth. I forgot it.
> >>>
> >>> I saw your patch, but I think I should confirm what is the problem first.
> >>>
> >>>> However, when "-d 31" is specified, it will be worse.
> >>>> Less than 50 buffers are used to cache the compressed page.
> >>>> And even the page has been filtered, it will also take a buffer.
> >>>> So if "-d 31" is specified, the filtered page will use a lot
> >>>> of buffers. Then the page which needs to be compressed can't
> >>>> be compressed parallel.
> >>>
> >>> Could you explain why compression will not be parallel in more detail ?
> >>> Actually the buffers are used also for filtered pages, it sounds
> >>> inefficient.
> >>> However, I don't understand why it prevents parallel compression.
> >>>
> >>
> >> Think about this, in a huge memory, most of the page will be filtered, and
> >> we have 5 buffers.
> >>
> >> page1 page2 page3 page4 page5 page6 page7
> >> .....
> >> [buffer1] [2] [3] [4] [5]
> >> unfiltered filtered filtered filtered filtered unfiltered filtered
> >>
> >> Since filtered page will take a buffer, when compressing page1,
> >> page6 can't be compressed at the same time.
> >> That why it will prevent parallel compression.
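The five-buffer example above can be sketched in a few lines of Python. This is only a model of the behaviour as described in the mail, not makedumpfile code; the function name and the list-of-flags representation are assumptions made for illustration.

```python
def pages_to_compress(filtered, num_buffers):
    """Hypothetical model: every page takes a buffer in page order,
    whether it is filtered or not.

    filtered[i] is True when page i+1 is excluded by the dump level.
    Returns the 1-based numbers of the pages that currently hold a
    buffer and still need compression, i.e. the work that can run
    in parallel right now.
    """
    window = list(enumerate(filtered, start=1))[:num_buffers]
    return [page for page, is_filtered in window if not is_filtered]


# Layout from the example: page1 and page6 are unfiltered, the rest are filtered.
layout = [False, True, True, True, True, False, True]
print(pages_to_compress(layout, num_buffers=5))
```

With five buffers the only page found is page1, because page2 to page5 occupy the other four buffers without needing any compression, so page6 has to wait.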
> >
> > Thanks for your explanation, I understand.
> > This is just an issue of the current implementation, there is no
> > reason to stand this restriction.
> >
> >>> Further, according to Chao's benchmark, there is a big performance
> >>> degradation even if the number of thread is 1. (58s vs 240s)
> >>> The current implementation seems to have some problems, we should
> >>> solve them.
> >>>
> >>
> >> If "-d 31" is specified, on the one hand we can't save time by compressing
> >> parallel, on the other hand we will introduce some extra work by adding
> >> "--num-threads". So it is obvious that it will have a performance
> >> degradation.
> >
> > Sure, there must be some overhead due to "some extra work" (e.g. exclusive lock),
> > but "--num-threads=1 is 4 times slower than --num-threads=0" still sounds
> > too slow, the degradation is too big to be called "some extra work".
> >
> > Both --num-threads=0 and --num-threads=1 are serial processing,
> > the above "buffer fairness issue" will not be related to this degradation.
> > What do you think what make this degradation ?
> >
>
> I can't get such result at this moment, so I can't do some further
> investigation right now. I guess it may be caused by the underlying
> implementation of pthread.
> I reviewed the test result of the patch v2 and found in different machines,
> the results are quite different.
Hi Zhou Wenjian,

I have run more tests on another machine with 128G of memory and got the
following results.

With "-d 31", the size of the vmcore is 300M:

makedumpfile -l --message-level 1 -d 31:
    time: 8.6s    page-faults: 2272
makedumpfile -l --num-threads 1 --message-level 1 -d 31:
    time: 28.1s   page-faults: 2359

With "-d 0", the size of the vmcore is 2.6G.
On this machine, I get the same result as yours:

makedumpfile -c --message-level 1 -d 0:
    time: 597s    page-faults: 2287
makedumpfile -c --num-threads 1 --message-level 1 -d 0:
    time: 602s    page-faults: 2361
makedumpfile -c --num-threads 2 --message-level 1 -d 0:
    time: 337s    page-faults: 2397
makedumpfile -c --num-threads 4 --message-level 1 -d 0:
    time: 175s    page-faults: 2461
makedumpfile -c --num-threads 8 --message-level 1 -d 0:
    time: 103s    page-faults: 2611
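
As a rough way to read the "-c -d 0" numbers above, the following Python sketch computes the speedup and the per-thread efficiency relative to --num-threads 1. The times are copied from the results above; the function name and the choice of --num-threads 1 as the baseline are assumptions made for illustration.

```python
# Wall-clock times in seconds for "makedumpfile -c --message-level 1 -d 0",
# keyed by the --num-threads value, copied from the results above.
times = {1: 602, 2: 337, 4: 175, 8: 103}


def speedup_and_efficiency(times, baseline_threads=1):
    """Return {threads: (speedup, efficiency)} relative to the baseline run.

    speedup    = baseline time / time with n threads
    efficiency = speedup / n (1.0 would be perfect scaling)
    """
    base = times[baseline_threads]
    return {n: (base / t, base / t / n) for n, t in times.items()}


for n, (speedup, efficiency) in sorted(speedup_and_efficiency(times).items()):
    print(f"--num-threads {n}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The same division applied to the "-l -d 31" pair (28.1s against 8.6s) gives a slowdown of a little more than 3 times for --num-threads 1.
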

The machine I used for my first test is not under my control, though.
Should I wait until I can use that machine again to run more tests?
If there are still problems with my tests, please tell me.

Thanks,
Chao Fan
>
> It seems that I can get almost the same result of Chao from "PRIMEQUEST 1800E".
>
> ###################################
> - System: PRIMERGY RX300 S6
> - CPU: Intel(R) Xeon(R) CPU x5660
> - memory: 16GB
> ###################################
> ************ makedumpfile -d 7 ******************
>                 core-data       0       256
>         threads-num
> -l
>         0                       10      144
>         4                       5       110
>         8                       5       111
>         12                      6       111
>
> ************ makedumpfile -d 31 ******************
>                 core-data       0       256
>         threads-num
> -l
>         0                       0       0
>         4                       2       2
>         8                       2       3
>         12                      2       3
>
> ###################################
> - System: PRIMEQUEST 1800E
> - CPU: Intel(R) Xeon(R) CPU E7540
> - memory: 32GB
> ###################################
> ************ makedumpfile -d 7 ******************
>                 core-data       0       256
>         threads-num
> -l
>         0                       34      270
>         4                       63      154
>         8                       64      131
>         12                      65      159
>
> ************ makedumpfile -d 31 ******************
>                 core-data       0       256
>         threads-num
> -l
>         0                       2       1
>         4                       48      48
>         8                       48      49
>         12                      49      50
>
> >> I'm not so sure if it is a problem that the performance degradation is so big.
> >> But I think if in other cases, it works as expected, this won't be a problem
> >> (or a problem needs to be fixed), for the performance degradation existing
> >> in theory.
> >>
> >> Or the current implementation should be replaced by a new arithmetic.
> >> For example:
> >> We can add an array to record whether the page is filtered or not.
> >> And only the unfiltered page will take the buffer.
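A minimal Python sketch of this idea, under the assumption that the array is simply one flag per page: only unfiltered pages are handed a buffer, so filtered pages no longer block the ones behind them. This is not makedumpfile code, and the function name is hypothetical.

```python
def buffered_pages_skipping_filtered(filtered, num_buffers):
    """Hypothetical model of the proposal: consult a per-page 'filtered'
    array first and give a buffer only to unfiltered pages.

    filtered[i] is True when page i+1 is excluded by the dump level.
    Returns the 1-based numbers of the pages that get a buffer, all of
    which need compression and can be worked on in parallel.
    """
    in_buffers = []
    for page, is_filtered in enumerate(filtered, start=1):
        if len(in_buffers) == num_buffers:
            break  # all buffers are in use
        if not is_filtered:
            in_buffers.append(page)
    return in_buffers


# Same layout as the earlier example: page1 and page6 are unfiltered.
layout = [False, True, True, True, True, False, True]
print(buffered_pages_skipping_filtered(layout, num_buffers=5))
```

For the seven-page example with five buffers, both page1 and page6 get a buffer in this model, so they could be compressed at the same time.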
> >
> > We should discuss how to implement new mechanism, I'll mention this later.
> >
> >> But I'm not sure if it is worth.
> >> For "-l -d 31" is fast enough, the new arithmetic also can't do much help.
> >
> > Basically the faster, the better. There is no obvious target time.
> > If there is room for improvement, we should do it.
> >
>
> Maybe we can improve the performance of "-c -d 31" in some case.
>
> BTW, we can easily get the theoretical performance by using the "--split".
>
> --
> Thanks
> Zhou
>
>
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
>