Kexec Archive on lore.kernel.org
From: "Zhou, Wenjian/周文剑" <zhouwj-fnst@cn.fujitsu.com>
To: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Cc: "kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [PATCH v3 00/10] makedumpfile: parallel processing
Date: Wed, 5 Aug 2015 10:46:10 +0800	[thread overview]
Message-ID: <55C178F2.80406@cn.fujitsu.com> (raw)
In-Reply-To: <55BB4178.3040602@cn.fujitsu.com>

ping...

-- 
Thanks
Zhou Wenjian

On 07/31/2015 05:35 PM, "Zhou, Wenjian/周文剑" wrote:
> On 07/31/2015 04:27 PM, Atsushi Kumagai wrote:
>>> On 07/23/2015 02:20 PM, Atsushi Kumagai wrote:
>>>>> Hello Kumagai,
>>>>>
>>>>> PATCH v3 has improved the performance.
>>>>> The performance degradation in PATCH v2 was mainly caused by the page
>>>>> faults produced by compress2().
>>>>>
>>>>> I wrote some code to test the performance of compress2(). Standalone,
>>>>> it takes almost the same time and produces the same number of page
>>>>> faults as executing compress2() in a thread.
>>>>>
>>>>> To reduce the page faults, I had to add the following to
>>>>> kdump_thread_function_cyclic():
>>>>>
>>>>> +    /*
>>>>> +     * lock memory to reduce page_faults by compress2()
>>>>> +     */
>>>>> +    void *temp = malloc(1);
>>>>> +    memset(temp, 0, 1);
>>>>> +    mlockall(MCL_CURRENT);
>>>>> +    free(temp);
>>>>> +
>>>>>
>>>>> With this change, the threaded and non-threaded runs have almost the
>>>>> same performance.
>>>>
>>>> Hmm... I can't get good results with this patch; many page faults still
>>>> occur. I guess mlock changes when the page faults occur, but not the
>>>> total number of page faults.
>>>> Could you explain why compress2() causes many page faults only in a
>>>> thread? Then I may understand why this patch is meaningful.
>>>>
>>>
>>> Actually, it also causes many page faults even outside a thread, if
>>> info->bitmap2 is not freed in makedumpfile.
>>>
>>> I wrote some code to test the performance of compress2().
>>>
>>> <cut>
>>> buf = malloc(PAGE_SIZE);
>>> bufout = malloc(SIZE_OUT);
>>> memset(buf, 1, PAGE_SIZE / 2);
>>> while (1)
>>>      compress2(bufout, &size_out, buf, PAGE_SIZE, Z_BEST_SPEED);
>>> <cut>
>>>
>>> The code looks roughly like this.
>>> It causes many page faults.
>>>
>>> But if the code is changed to the following, it behaves much better.
>>>
>>> <cut>
>>> temp = malloc(TEMP_SIZE);
>>> memset(temp, 0, TEMP_SIZE);
>>> free(temp);
>>>
>>> buf = malloc(PAGE_SIZE);
>>> bufout = malloc(SIZE_OUT);
>>> memset(buf, 1, PAGE_SIZE / 2);
>>> while (1)
>>>      compress2(bufout, &size_out, buf, PAGE_SIZE, Z_BEST_SPEED);
>>> <cut>
>>>
>>> TEMP_SIZE must be large enough
>>> (larger than 135097 works on my machine).
>>>
>>>
>>> In a thread, the following code reduces the page faults.
>>>
>>> <cut>
>>> temp = malloc(1);
>>> memset(temp, 0, 1);
>>> mlockall(MCL_CURRENT);
>>> free(temp);
>>>
>>> buf = malloc(PAGE_SIZE);
>>> bufout = malloc(SIZE_OUT);
>>> memset(buf, 1, PAGE_SIZE / 2);
>>> while (1)
>>>      compress2(bufout, &size_out, buf, PAGE_SIZE, Z_BEST_SPEED);
>>> <cut>
>>>
>>> I don't yet know why.
>>
>> I assume that we are facing the known issue of glibc:
>>
>>    https://sourceware.org/ml/libc-alpha/2015-03/msg00270.html
>>
>> According to the thread above, a per-thread arena is grown and trimmed
>> more readily than the main arena.
>> Actually, compress2() calls malloc() and free() internally each time it
>> is called, so every compression pass causes page faults.
>> Moreover, I confirmed that many madvise(MADV_DONTNEED) are invoked only
>> when compress2() is called in thread.
>>
>> OTOH, in the lzo case, a temporary working buffer is allocated on the
>> caller side, which reduces the number of malloc()/free() pairs.
>> (But I'm not sure why snappy doesn't hit this issue; its compression
>> buffer may be smaller than the trim threshold.)
>>
>> Anyway, it's basically hard for zlib to avoid this issue on the
>> application side, so it seems we have to accept the performance
>> degradation it causes. Unfortunately, the main target of this
>> multi-thread feature is zlib, as you measured, so we should resolve
>> this issue somehow.
>>
>> Nevertheless, even now we can get some benefit from parallel processing,
>> so let's start discussing the implementation of the parallel processing
>> feature to get this patch accepted. I have some comments:
>>
>>    - read_pfn_parallel() doesn't use the cache feature (cache.c); is
>>      that intentional?
>>
>
> Yes. Since each page is read only once here, the cache feature seems
> unnecessary.
>
>>    - --num-buffers is now tunable, but neither the man page nor your
>>      benchmark mentions the benefit of this parameter.
>>
>
> The default value of num-buffers is 50. Originally the value had a great
> influence on performance, but since we changed the logic in the 2nd
> version of the patch set, more buffers bring little improvement (1000
> buffers may give about 1%).
> I'm considering whether the option should be removed; what do you think?
>
> BTW, the mlockall() code added in the 3rd version works well on several
> machines here. Should I keep it?
> With it, madvise(MADV_DONTNEED) fails inside compress2() and the
> performance is as expected on these machines.
>
>

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Thread overview: 18+ messages
2015-07-21  6:29 [PATCH v3 00/10] makedumpfile: parallel processing Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 01/10] Add readpage_kdump_compressed_parallel Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 02/10] Add mappage_elf_parallel Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 03/10] Add readpage_elf_parallel Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 04/10] Add read_pfn_parallel Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 05/10] Add function to initial bitmap for parallel use Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 06/10] Add filter_data_buffer_parallel Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 07/10] Add write_kdump_pages_parallel to allow parallel process Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 08/10] Initial and free data used for " Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 09/10] Make makedumpfile available to read and compress pages parallelly Zhou Wenjian
2015-07-21  6:29 ` [PATCH v3 10/10] Add usage and manual about multiple threads process Zhou Wenjian
2015-07-21  7:10 ` [PATCH v3 00/10] makedumpfile: parallel processing "Zhou, Wenjian/周文剑"
2015-07-23  6:20   ` Atsushi Kumagai
2015-07-23  6:39     ` "Zhou, Wenjian/周文剑"
2015-07-31  8:27       ` Atsushi Kumagai
2015-07-31  9:35         ` "Zhou, Wenjian/周文剑"
2015-08-05  2:46           ` "Zhou, Wenjian/周文剑" [this message]
2015-08-06  2:46             ` Atsushi Kumagai
