Re: [PATCH v1 0/5] makedumpfile: --split: assign fair I/O workloads in appropriate time

From: "Zhou, Wenjian/周文剑" <zhouwj-fnst@cn.fujitsu.com>
To: kexec@lists.infradead.org
Subject: Re: [PATCH v1 0/5] makedumpfile: --split: assign fair I/O workloads in appropriate time
Date: Fri, 10 Oct 2014 12:12:01 +0800
Message-ID: <54375C91.5040707@cn.fujitsu.com>
In-Reply-To: <1411974387-10839-1-git-send-email-zhouwj-fnst@cn.fujitsu.com>

Maybe I should give more information about the issue.

When the --split option is specified, each process should be assigned a fair share of the I/O
workload so that parallel processing gives the largest possible performance gain.

However, the current implementation of setup_splitting() in cyclic mode does not take filtering
into account at all, so the resulting dumpfiles can differ greatly in size.

To solve the problem, we should count dumpable pfns rather than all pfns. That is, the start
and end pfn of each dumpfile must be calculated with filtering taken into account.
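
To make the idea concrete, here is a minimal sketch (not makedumpfile code). It assumes a
hypothetical array dumpable[] in which a nonzero entry means the page survives filtering, and
a hypothetical helper end_pfn_for_quota() that finds where a dumpfile must end so that it holds
a given number of dumpable pages:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.
 * dumpable[pfn] is nonzero when the page at pfn survives filtering.
 * Returns the exclusive end pfn of the range that starts at start_pfn and
 * contains `quota` dumpable pages, or max_pfn if fewer dumpable pages remain.
 */
static size_t end_pfn_for_quota(const unsigned char *dumpable, size_t max_pfn,
                                size_t start_pfn, size_t quota)
{
    size_t pfn = start_pfn;
    size_t seen = 0;    /* dumpable pages found so far */

    assert(start_pfn <= max_pfn);

    while (pfn < max_pfn && seen < quota) {
        if (dumpable[pfn])
            seen++;
        pfn++;
    }
    return pfn;
}
```

For example, with 8 pages of which only the first 4 are dumpable, splitting by raw pfn gives
[0,4) and [4,8), i.e. 4 dumpable pages versus 0. Splitting with a quota of 2 dumpable pages
ends the first dumpfile at pfn 2 instead, and the last dumpfile simply runs to max_pfn.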

So HATAYAMA Daisuke proposed the 3-pass algorithm, which deals with the issue by doing the
complete filtering in setup_splitting_cyclic().
(For the implementation of the 3-pass algorithm, see
http://lists.infradead.org/pipermail/kexec/2014-March/011339.html)

However, with the 3-pass algorithm, if --split is specified in cyclic mode, filtering is done three times:
in get_dumpable_pages_cyclic(), in setup_splitting_cyclic() and in writeout_dumpfile().
According to past benchmarks, filtering takes a long time on systems with huge memory,
so it needs to be optimized.

This led to the 2-pass algorithm, which removes the filtering in setup_splitting_cyclic(). Since we
only need to count dumpable pfns, we can record the number of dumpable pfns during the first filtering
pass and calculate the start and end pfn from those counts.

We divide memory into several parts, which we call blocks (the default block size is 1GB). The number
of dumpable pages in each block is recorded during the first filtering pass. When calculating the
start and end pfn, these per-block counts let us avoid filtering the whole of memory again.
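
A rough sketch of such a per-block table is shown below. It is only an illustration: the
struct block_table layout and the block_table_*() helpers are hypothetical names, not the
ones used in the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table: one counter of dumpable pages per block of memory. */
struct block_table {
    uint64_t pfns_per_block;  /* block size expressed in pages */
    uint64_t nr_blocks;       /* number of blocks covering [0, max_pfn) */
    uint64_t *dumpable;       /* dumpable[i] = dumpable pages in block i */
};

/* Allocate a zeroed table; returns 0 on success, -1 on allocation failure. */
static int block_table_init(struct block_table *t, uint64_t max_pfn,
                            uint64_t block_size, uint64_t page_size)
{
    assert(page_size > 0 && block_size >= page_size);

    t->pfns_per_block = block_size / page_size;
    /* Round up so a partial block at the end of memory gets its own entry. */
    t->nr_blocks = (max_pfn + t->pfns_per_block - 1) / t->pfns_per_block;
    t->dumpable = calloc(t->nr_blocks ? t->nr_blocks : 1, sizeof(*t->dumpable));
    return t->dumpable ? 0 : -1;
}

/* Call once for every pfn that the first filtering pass decides to dump. */
static void block_table_count(struct block_table *t, uint64_t pfn)
{
    uint64_t idx = pfn / t->pfns_per_block;

    assert(idx < t->nr_blocks);
    t->dumpable[idx]++;
}

static void block_table_free(struct block_table *t)
{
    free(t->dumpable);
    t->dumpable = NULL;
}
```

With 4KB pages (an assumption here) and the default 1GB block, each block covers 262144 pages,
so the table needs only one counter per GB of memory.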

These algorithms can be described as follows:

	current:
		get_dumpable_pages_cyclic():
						do filtering
						count all dumpable pages
		setup_splitting():
						calculate start-end pfn without counting dumpable pages

		writeout_dumpfile():
						do filtering
						write data

	3-pass:
		get_dumpable_pages_cyclic():
						do filtering
						count all dumpable pages
		setup_splitting_cyclic():
						do filtering
						count dumpable pages of each dumpfile
						calculate start-end pfn of each dumpfile
		writeout_dumpfile():
						do filtering
						write data

	2-pass:
		get_dumpable_pages_cyclic():
						do filtering
						count dumpable pages of each block
						count all dumpable pages
		setup_splitting_cyclic():
						calculate start-end pfn of each dumpfile with the help of block

		writeout_dumpfile():
						do filtering
						write data
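
The setup_splitting_cyclic() step of the 2-pass variant could look roughly like the sketch
below. This is an assumption about the approach, not the code in the patches: whole blocks are
skipped using their recorded counts, and only the block that contains the boundary is examined
page by page. The dumpable[] array lookup stands in for re-running the filter on that one
block, and end_pfn_by_block() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only.
 * block_count[i] is the number of dumpable pages in block i, as recorded
 * during the first filtering pass. Returns the exclusive end pfn of the
 * range starting at start_pfn that holds `quota` dumpable pages, or max_pfn
 * if fewer dumpable pages remain.
 */
static uint64_t end_pfn_by_block(const uint64_t *block_count,
                                 uint64_t pfns_per_block, uint64_t max_pfn,
                                 const unsigned char *dumpable,
                                 uint64_t start_pfn, uint64_t quota)
{
    uint64_t pfn = start_pfn;
    uint64_t seen = 0;    /* dumpable pages assigned so far */

    assert(pfns_per_block > 0 && start_pfn <= max_pfn);

    while (pfn < max_pfn && seen < quota) {
        uint64_t idx = pfn / pfns_per_block;
        uint64_t block_end = (idx + 1) * pfns_per_block;

        if (block_end > max_pfn)
            block_end = max_pfn;

        /* Block-aligned and the whole block fits in the remaining quota:
         * take it in one step using the recorded count, no filtering. */
        if (pfn % pfns_per_block == 0 && block_count[idx] <= quota - seen) {
            seen += block_count[idx];
            pfn = block_end;
            continue;
        }

        /* The boundary lies inside this block: examine only this block. */
        while (pfn < block_end && seen < quota) {
            if (dumpable[pfn])
                seen++;
            pfn++;
        }
    }
    return pfn;
}
```

In this sketch the amount of page-by-page work per dumpfile is bounded by roughly one block
at each end of its range, which is why a smaller block keeps the boundary search cheap.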

The performance of the two algorithms (2-pass and 3-pass) was tested. The results can be found in
the previous mail, quoted below.

On 09/29/2014 03:06 PM, Zhou Wenjian wrote:
> The issue is discussed at http://lists.infradead.org/pipermail/kexec/2014-March/011289.html
>
> This patch implements the idea of the 2-pass algorithm, using less memory to manage the block table.
> Strictly speaking, the algorithm is still 3-pass, but the second pass takes much less time.
> The tables below show the performance with different sizes of cyclic buffer and block.
> The test was executed on a machine with 128G of memory.
>
> Each value is the total time (including the first pass and the second pass).
> The value in brackets is the time of the second pass.
> 															      sec
> 	cyclic-buffer	1		2		4		8		16		32		64
> block-size
> 1M			4.74(0.00)	4.22(0.01)	3.94(0.01)	3.78(0.02)	3.71(0.03)	3.73(0.07)	3.74(0.10)	
> 2M			4.74(0.00)	4.19(0.00)	3.94(0.01)	3.80(0.03)	3.71(0.03)	3.72(0.07)	3.72(0.09)	
> 4M			4.73(0.00)	4.21(0.01)	3.95(0.01)	3.78(0.02)	3.70(0.02)	3.73(0.08)	3.73(0.10)	
> 8M			4.73(0.00)	4.19(0.00)	3.94(0.01)	3.83(0.02)	3.73(0.03)	3.72(0.07)	3.74(0.10)	
> 16M			4.74(0.01)	4.21(0.00)	3.94(0.01)	3.76(0.01)	3.73(0.03)	3.73(0.08)	3.74(0.10)	
> 32M			4.72(0.00)	4.20(0.02)	3.92(0.01)	3.77(0.02)	3.71(0.02)	3.70(0.06)	3.74(0.10)	
> 64M			4.74(0.01)	4.20(0.00)	3.95(0.01)	3.78(0.02)	3.70(0.02)	3.71(0.07)	3.72(0.09)	
> 128M			4.73(0.01)	4.20(0.00)	3.94(0.01)	3.78(0.02)	3.76(0.03)	3.72(0.08)	3.74(0.09)	
> 256M			4.75(0.02)	4.22(0.02)	3.96(0.03)	3.78(0.02)	3.70(0.03)	3.70(0.07)	3.74(0.11)	
> 512M			4.77(0.04)	4.21(0.03)	3.97(0.04)	3.79(0.03)	3.73(0.04)	3.75(0.09)	3.82(0.13)	
> 1G			4.82(0.09)	4.26(0.07)	4.00(0.08)	3.83(0.07)	3.76(0.08)	3.73(0.08)	3.76(0.12)	
> 2G			8.26(3.54)	7.34(3.14)	6.86(2.93)	6.56(2.80)	6.44(2.76)	6.45(2.79)	6.42(2.80)
>
> the performance of the 3-pass algorithm
> origin			8.25(3.54)	7.26(3.11)	6.80(2.91)	6.52(2.80)	6.39(2.76)	6.40(2.78)	6.45(2.85)
>
> 															       sec
> 	cyclic-buffer	128		256		512		1024		2048		4096		8192	
> block-size
> 1M			3.83(0.21)	3.94(0.33)	4.16(0.54)	4.61(0.99)	7.03(3.41)	8.73(5.11)	8.69(5.08)
> 2M			3.86(0.21)	3.92(0.32)	4.16(0.54)	4.64(0.98)	7.02(3.41)	8.71(5.09)	8.72(5.09)
> 4M			3.82(0.21)	3.95(0.32)	4.18(0.55)	4.62(0.99)	7.05(3.44)	8.70(5.09)	8.68(5.07)
> 8M			3.82(0.21)	3.95(0.33)	4.17(0.54)	4.58(0.97)	7.03(3.41)	8.79(5.16)	8.71(5.09)
> 16M			3.83(0.21)	3.93(0.31)	4.15(0.54)	4.60(0.98)	7.06(3.43)	8.76(5.13)	8.73(5.10)
> 32M			3.84(0.22)	3.93(0.32)	4.15(0.54)	4.61(0.98)	7.00(3.40)	8.69(5.08)	8.75(5.13)
> 64M			3.84(0.21)	3.94(0.33)	4.15(0.54)	4.60(0.98)	7.04(3.42)	8.74(5.10)	8.80(5.16)
> 128M			3.85(0.22)	3.97(0.33)	4.16(0.54)	4.60(0.98)	7.07(3.44)	8.68(5.07)	8.69(5.07)
> 256M			3.84(0.21)	3.94(0.33)	4.16(0.55)	4.64(1.00)	7.02(3.41)	8.74(5.11)	8.73(5.11)
> 512M			3.85(0.24)	3.97(0.34)	4.17(0.56)	4.61(0.99)	7.05(3.44)	8.73(5.11)	8.75(5.13)
> 1G			3.85(0.22)	3.96(0.35)	4.18(0.56)	4.65(1.00)	7.06(3.44)	8.76(5.12)	8.72(5.11)
> 2G			6.53(2.91)	6.86(3.25)	7.54(3.92)	8.95(5.31)	10.60(6.97)	14.08(10.47)	14.32(10.60)
>
> the performance of the 3-pass algorithm
> origin			6.64(3.05)	6.81(3.24)	7.51(3.93)	8.86(5.30)	10.51(6.94)	13.92(10.36)	14.11(10.55)
>
> Zhou Wenjian (5):
>    Add support for block
>    Add tools for reading and writing from block table
>    Add module of generating table
>    Add module of calculating start_pfn and end_pfn in each dumpfile
>    Add support for --block-size
>
>   makedumpfile.8 |   16 ++++
>   makedumpfile.c |  245 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   makedumpfile.h |   15 ++++
>   3 files changed, 271 insertions(+), 5 deletions(-)
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
