Re: [PATCH RFC 00/11] makedumpfile: parallel processing

From: "Zhou, Wenjian/周文剑" <zhouwj-fnst@cn.fujitsu.com>
To: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Cc: "kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [PATCH RFC 00/11] makedumpfile: parallel processing
Date: Fri, 4 Dec 2015 11:33:36 +0800
Message-ID: <56610990.2090607@cn.fujitsu.com>
In-Reply-To: <0910DD04CBD6DE4193FCF86B9C00BE9701E09F28@BPXM01GP.gisp.nec.co.jp>

Hello Kumagai,

On 12/04/2015 10:30 AM, Atsushi Kumagai wrote:
> Hello, Zhou
>
>> On 12/02/2015 03:24 PM, Dave Young wrote:
>>> Hi,
>>>
>>> On 12/02/15 at 01:29pm, "Zhou, Wenjian/周文剑" wrote:
>>>> I think there is no problem if other test results are as expected.
>>>>
>>>> --num-threads mainly reduces the time of compressing.
>>>> So for lzo, it can't do much help at most of time.
>>>
>>> Seems the help of --num-threads does not say it exactly:
>>>
>>>     [--num-threads THREADNUM]:
>>>         Using multiple threads to read and compress data of each page in parallel.
>>>         And it will reduces time for saving DUMPFILE.
>>>         This feature only supports creating DUMPFILE in kdump-comressed format from
>>>         VMCORE in kdump-compressed format or elf format.
>>>
>>> Lzo is also a compress method, it should be mentioned that --num-threads only
>>> supports zlib compressed vmcore.
>>>
>>
>> Sorry, it seems that something I said is not so clear.
>> lzo is also supported. Since lzo compresses data at a high speed, the
>> improving of the performance is not so obvious at most of time.
>>
>>> Also worth to mention about the recommended -d value for this feature.
>>>
>>
>> Yes, I think it's worth. I forgot it.
>
> I saw your patch, but I think I should confirm what is the problem first.
>
>> However, when "-d 31" is specified, it will be worse.
>> Fewer than 50 buffers are used to cache the compressed pages.
>> And even if a page has been filtered, it will still take a buffer.
>> So if "-d 31" is specified, the filtered pages will use a lot
>> of buffers. Then the pages which need to be compressed can't
>> be compressed in parallel.
>
> Could you explain why compression will not be parallel in more detail ?
> Actually the buffers are used also for filtered pages, it sounds inefficient.
> However, I don't understand why it prevents parallel compression.
>

Consider this: on a machine with huge memory, most of the pages will be
filtered. Suppose we have 5 buffers.

page1       page2      page3     page4     page5      page6       page7 .....
[buffer1]   [2]        [3]       [4]       [5]
unfiltered  filtered   filtered  filtered  filtered   unfiltered  filtered

Since a filtered page also takes a buffer, page1 to page5 occupy all 5
buffers. So while page1 is being compressed, page6 has no buffer and
can't be compressed at the same time.
That is why it prevents parallel compression.
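
As a rough illustration of the diagram above, here is a small Python
model. It is only a sketch, not the makedumpfile code: the function name
max_parallel_pages and the list of booleans are assumptions made for this
example, and it simply assumes that buffers are handed out to consecutive
pages in page order.

```python
def max_parallel_pages(filtered, num_buffers):
    """Return the largest number of unfiltered pages that can hold a
    buffer at the same time, when every page (filtered or not) takes
    one buffer and buffers go to consecutive pages."""
    best = 0
    for start in range(len(filtered)):
        # The pages that hold the buffers when 'start' is the oldest one.
        window = filtered[start:start + num_buffers]
        unfiltered = sum(1 for is_filtered in window if not is_filtered)
        best = max(best, unfiltered)
    return best


# page1..page7 from the diagram: True means the page is filtered.
pages = [False, True, True, True, True, False, True]
print(max_parallel_pages(pages, 5))  # 1
```

With 5 buffers the model never sees more than one unfiltered page holding
a buffer, so there is nothing to compress in parallel. With 6 buffers
page1 and page6 would fit together and the result would be 2.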

> Further, according to Chao's benchmark, there is a big performance
> degradation even if the number of thread is 1. (58s vs 240s)
> The current implementation seems to have some problems, we should
> solve them.
>

If "-d 31" is specified, on the one hand we can't save time by compressing
in parallel, and on the other hand we introduce some extra work by adding
"--num-threads". So a performance degradation is to be expected.

I'm not sure whether it is a problem that the performance degradation is
so big. But I think that if it works as expected in the other cases, this
is not a problem (or at least not one that needs to be fixed), because the
degradation is expected in theory.

Otherwise, the current implementation should be replaced by a new algorithm.
For example:
We can add an array to record whether each page is filtered or not,
so that only the unfiltered pages take a buffer.
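
A minimal sketch of that idea in Python, to make it concrete. This is
not a patch and not how makedumpfile is written: assign_buffers, the
boolean array and the returned dict are all hypothetical names chosen for
the example.

```python
def assign_buffers(filtered, num_buffers):
    """Give buffers only to unfiltered pages.

    filtered[i] is True if page i (0-based) is filtered. Returns a
    dict mapping each page that received a buffer to its buffer number.
    """
    assignment = {}
    next_buffer = 0
    for page, is_filtered in enumerate(filtered):
        if is_filtered:
            continue  # recorded in the array only, takes no buffer
        if next_buffer == num_buffers:
            break  # all buffers are in use
        assignment[page] = next_buffer
        next_buffer += 1
    return assignment


# Same pages as in the diagram above; index 0 is page1, index 5 is page6.
pages = [False, True, True, True, True, False, True]
print(assign_buffers(pages, 5))  # {0: 0, 5: 1}
```

Here page1 and page6 both get a buffer, so they could be compressed at the
same time, and three buffers are still free for later unfiltered pages.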

But I'm not sure if it is worth it.
Since "-l -d 31" is already fast enough, the new algorithm also can't help much.

-- 
Thanks
Zhou

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec