Re: [PATCH] makedumpfile: --split: assign fair I/O workloads for each process

From: "Hatayama, Daisuke/畑山 大輔" <d.hatayama@jp.fujitsu.com>
To: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: "kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [PATCH] makedumpfile: --split: assign fair I/O workloads for each process
Date: Tue, 25 Mar 2014 14:52:36 +0900
Message-ID: <533119A4.8040900@jp.fujitsu.com>
In-Reply-To: <0910DD04CBD6DE4193FCF86B9C00BE971F90A2@BPXM01GP.gisp.nec.co.jp>

(2014/03/25 10:14), Atsushi Kumagai wrote:
>> From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
>>
>> When --split option is specified, fair I/O workloads should be
>> assigned for each process to maximize amount of performance
>> optimization by parallel processing.
>>
>> However, the current implementation of setup_splitting() in cyclic
>> mode doesn't care about filtering at all; I/O workloads for each
>> process could be biased easily.
>>
>> This patch deals with the issue by implementing the fair I/O workload
>> assignment as setup_splitting_cyclic().
>>
>> Note: If --split is specified in cyclic mode, we do filtering three
>> times: in get_dumpable_pages_cyclic(), in setup_splitting_cyclic() and
>> in writeout_dumpfile(). Filtering takes about 10 minutes on system
>> with huge memory according to the benchmark on the past, so it might
>> be necessary to optimize filtering or setup_filtering_cyclic().
> 
> Sorry, I lost the result of that benchmark, could you give me the URL?
> I'd like to confirm that the advantage of fair I/O will exceed the
> 10 minutes disadvantage.
> 

Here are two benchmarks by Jingbai Ma and myself.

http://lists.infradead.org/pipermail/kexec/2013-March/008515.html
http://lists.infradead.org/pipermail/kexec/2013-March/008517.html

Note that Jingbai Ma's results are the sum of get_dumpable_cyclic() and writeout_dumpfile(), so at first glance they look about twice as large as mine, but they actually show almost the same performance.

In summary, each result shows about 40 seconds per 1TiB, so most systems are not affected very much. On 12TiB of memory, which is the current maximum memory size of a Fujitsu system, we need 480 seconds == 8 minutes more. But this cost is stable in the sense that the time never suddenly becomes long in some rare worst case, so it seems to me that we can be optimistic in this sense.
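
As a rough back-of-the-envelope helper, the estimate above can be written as the small C function below. It is only a sketch: the function name is hypothetical, and it assumes that the filtering cost scales linearly with memory size at the roughly 40 seconds per 1TiB seen in the two benchmarks.

```c
/*
 * Hypothetical helper, not part of makedumpfile.
 * Assumption: one extra filtering pass costs about 40 seconds per TiB
 * and scales linearly with the amount of memory.
 */
static unsigned long extra_filtering_seconds(unsigned long mem_tib)
{
	const unsigned long secs_per_tib = 40;	/* from the benchmarks above */

	return mem_tib * secs_per_tib;
}
```

For example, extra_filtering_seconds(12) gives the 480 seconds mentioned above.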

The other ideas to deal with the issue are:

- Parallelize the counting-up process. But it might be difficult to parallelize the 2nd pass, which seems to be inherently serial processing.

- Instead of doing the 2nd pass, make a terminating process join a still-running process. But it might be cumbersome to implement this without using pthread.
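
To show why the 2nd pass looks serial, here is a minimal sketch in C of a two-pass fair assignment. It is not the code in the patch: the dumpable[] array is a hypothetical stand-in for the filtering result, and the names count_dumpable() and assign_fair_splits() are made up for illustration. The first pass only counts pages, so it could be divided among processes, but in the second pass each boundary depends on the running count of all earlier pages.

```c
#include <assert.h>

/* Pass 1: count the dumpable pages in [0, max_pfn). */
static unsigned long long count_dumpable(const unsigned char *dumpable,
					 unsigned long long max_pfn)
{
	unsigned long long pfn, n = 0;

	for (pfn = 0; pfn < max_pfn; pfn++)
		if (dumpable[pfn])
			n++;
	return n;
}

/*
 * Pass 2: walk the pages once more and close split i as soon as it holds
 * its share of the dumpable pages. start_pfn[] and end_pfn[] must each
 * have nr_splits entries; split i covers [start_pfn[i], end_pfn[i]).
 */
static void assign_fair_splits(const unsigned char *dumpable,
			       unsigned long long max_pfn, int nr_splits,
			       unsigned long long *start_pfn,
			       unsigned long long *end_pfn)
{
	unsigned long long total, pfn = 0;
	int i;

	assert(nr_splits > 0);
	total = count_dumpable(dumpable, max_pfn);

	for (i = 0; i < nr_splits; i++) {
		/* Share of split i; the difference form spreads the remainder. */
		unsigned long long target =
			total * (i + 1) / nr_splits - total * i / nr_splits;
		unsigned long long seen = 0;

		start_pfn[i] = pfn;
		if (i == nr_splits - 1) {
			pfn = max_pfn;	/* the last split takes the rest */
		} else {
			/* Serial part: depends on where the previous split ended. */
			while (pfn < max_pfn && seen < target) {
				if (dumpable[pfn])
					seen++;
				pfn++;
			}
		}
		end_pfn[i] = pfn;
	}
}
```

With dumpable pages at pfn 0, 1, 6 and 7 out of 8 pages and two splits, this yields [0, 2) and [2, 8), so each process gets two dumpable pages even though the pfn ranges have very different sizes.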

Thanks.
HATAYAMA, Daisuke
