Kexec Archive on lore.kernel.org
From: "Zhou, Wenjian/周文剑" <zhouwj-fnst@cn.fujitsu.com>
To: kexec@lists.infradead.org
Subject: Re: [PATCH v1 0/5] makedumpfile: --split: assign fair I/O workloads in appropriate time
Date: Fri, 10 Oct 2014 12:12:01 +0800
Message-ID: <54375C91.5040707@cn.fujitsu.com>
In-Reply-To: <1411974387-10839-1-git-send-email-zhouwj-fnst@cn.fujitsu.com>

Maybe I should give more information about the issue.

When the --split option is specified, the I/O workload should be distributed fairly among
the processes to get the most out of parallel processing.

However, the current implementation of setup_splitting() in cyclic mode does not take
filtering into account at all, so the resulting dumpfiles can differ greatly in size.

To solve the problem, we should count only dumpable pfns instead of every pfn. This means
that the start and end pfn of each dumpfile must be calculated with filtering taken into account.
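The difference between the two ways of choosing boundaries can be sketched in C. This is a minimal illustration, not makedumpfile code: is_dumpable(), count_dumpable(), split_by_pfn() and split_by_dumpable() are hypothetical names, and is_dumpable() stands in for the real filtering with an assumed memory layout in which low pfns are mostly dumpable and high pfns are mostly excluded. Dumpfile i covers pfns [start[i], start[i + 1]). Note that split_by_dumpable() must know which pages are dumpable, which is why it depends on filtering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for makedumpfile's filtering: low pfns are mostly
 * dumpable, high pfns are mostly excluded (assumed layout for the demo). */
static bool is_dumpable(unsigned long pfn)
{
    return pfn < 600 ? (pfn % 4 != 0) : (pfn % 10 == 0);
}

/* Number of dumpable pages in [start, end). */
static unsigned long count_dumpable(unsigned long start, unsigned long end)
{
    unsigned long n = 0;

    for (unsigned long pfn = start; pfn < end; pfn++)
        if (is_dumpable(pfn))
            n++;
    return n;
}

/* Naive split: every dumpfile gets the same number of pfns,
 * regardless of how many of them survive filtering. */
static void split_by_pfn(unsigned long max_pfn, int nr_files,
                         unsigned long *start)
{
    assert(nr_files > 0);
    for (int i = 0; i < nr_files; i++)
        start[i] = max_pfn / nr_files * i;
    start[nr_files] = max_pfn;  /* last file absorbs the remainder */
}

/* Fair split: every dumpfile gets total / nr_files dumpable pages;
 * the last one also takes the remainder. */
static void split_by_dumpable(unsigned long max_pfn, int nr_files,
                              unsigned long *start)
{
    unsigned long per_file, seen = 0;
    int file = 1;

    assert(nr_files > 0);
    per_file = count_dumpable(0, max_pfn) / nr_files;
    start[0] = 0;
    for (unsigned long pfn = 0; pfn < max_pfn && file < nr_files; pfn++)
        if (is_dumpable(pfn) && ++seen == per_file * file)
            start[file++] = pfn + 1;  /* boundary right after that page */
    while (file <= nr_files)
        start[file++] = max_pfn;
}
```

For 1000 pfns split into four dumpfiles, split_by_pfn() yields 187, 188, 90 and 25 dumpable pages per file, while split_by_dumpable() places the boundaries at pfns 163, 326 and 488, yielding 122, 122, 122 and 124.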

So HATAYAMA Daisuke proposed the 3-pass algorithm, which deals with the issue
by doing the complete filtering in setup_splitting_cyclic().
(For the implementation of the 3-pass algorithm, see
http://lists.infradead.org/pipermail/kexec/2014-March/011339.html)

However, with the 3-pass algorithm, when --split is specified in cyclic mode, filtering is done three times:
in get_dumpable_pages_cyclic(), in setup_splitting_cyclic() and in writeout_dumpfile().
According to past benchmarks, filtering takes a long time on systems with huge memory, so it
needs to be optimized.


Then came the 2-pass algorithm, which removes the filtering from setup_splitting_cyclic(). Since
we only need the number of dumpable pfns, we can record it during the first filtering pass
and calculate the start and end pfn of each dumpfile from it.

We divide memory into several parts called blocks (the default block size is 1GB). The number
of dumpable pages in each block is recorded during the first filtering pass. With these per-block
counts, we no longer need to filter the whole memory again when calculating the boundaries.
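The block-table idea can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the code in the patch set: struct block_table, build_block_table(), find_boundary(), split_with_block_table() and the is_dumpable() filtering stand-in are all hypothetical, and block sizes are measured in pfns rather than bytes for simplicity. When locating each boundary, whole blocks are skipped using their recorded counts, and only the one block containing the boundary is filtered again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for makedumpfile's filtering: low pfns are mostly
 * dumpable, high pfns are mostly excluded (assumed layout for the demo). */
static bool is_dumpable(unsigned long pfn)
{
    return pfn < 600 ? (pfn % 4 != 0) : (pfn % 10 == 0);
}

/* Hypothetical block table: count[b] is the number of dumpable pages in
 * pfns [b * block_pages, (b + 1) * block_pages). */
struct block_table {
    unsigned long block_pages;
    unsigned long nr_blocks;
    unsigned long *count;
    unsigned long total;
};

/* First pass: filter all of memory once and record per-block counts. */
static void build_block_table(struct block_table *bt, unsigned long max_pfn,
                              unsigned long block_pages)
{
    assert(block_pages > 0);
    bt->block_pages = block_pages;
    bt->nr_blocks = (max_pfn + block_pages - 1) / block_pages;
    bt->count = calloc(bt->nr_blocks ? bt->nr_blocks : 1, sizeof(*bt->count));
    assert(bt->count != NULL);
    bt->total = 0;
    for (unsigned long pfn = 0; pfn < max_pfn; pfn++) {
        if (is_dumpable(pfn)) {
            bt->count[pfn / block_pages]++;
            bt->total++;
        }
    }
}

/* Return the pfn just after the target-th dumpable page. Blocks before the
 * boundary are skipped via their counts; only the block that contains the
 * boundary is filtered again. */
static unsigned long find_boundary(const struct block_table *bt,
                                   unsigned long target, unsigned long max_pfn)
{
    unsigned long seen = 0, b = 0;

    while (b < bt->nr_blocks && seen + bt->count[b] < target)
        seen += bt->count[b++];
    if (b == bt->nr_blocks)
        return max_pfn;  /* target exceeds the number of dumpable pages */
    for (unsigned long pfn = b * bt->block_pages; pfn < max_pfn; pfn++)
        if (is_dumpable(pfn) && ++seen == target)
            return pfn + 1;
    return max_pfn;
}

/* Boundary calculation: dumpfile i covers [start[i], start[i + 1]) and gets
 * total / nr_files dumpable pages; the last one takes the remainder. */
static void split_with_block_table(const struct block_table *bt,
                                   unsigned long max_pfn, int nr_files,
                                   unsigned long *start)
{
    unsigned long per_file;

    assert(nr_files > 0);
    per_file = bt->total / nr_files;
    start[0] = 0;
    for (int i = 1; i < nr_files; i++)
        start[i] = per_file ? find_boundary(bt, per_file * i, max_pfn) : max_pfn;
    start[nr_files] = max_pfn;
}
```

With 1000 pfns, 64-pfn blocks and four dumpfiles, this gives the boundaries 163, 326 and 488, the same ones a full rescan of every pfn would produce, while each boundary search re-filters at most one 64-pfn block. A larger block shrinks the table but increases the number of pages re-filtered around each boundary.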

These algorithms can be described as follows:

	current:
		get_dumpable_pages_cyclic():
						do filtering
						count all dumpable pages
		setup_splitting():
						calculate start-end pfn without counting dumpable pages

		writeout_dumpfile():
						do filtering
						write data

	3-pass:
		get_dumpable_pages_cyclic():
						do filtering
						count all dumpable pages
		setup_splitting_cyclic():
						do filtering
						count dumpable pages of each dumpfile
						calculate start-end pfn of each dumpfile
		writeout_dumpfile():
						do filtering
						write data

	2-pass:
		get_dumpable_pages_cyclic():
						do filtering
						count dumpable pages of each block
						count all dumpable pages
		setup_splitting_cyclic():
						calculate start-end pfn of each dumpfile with the help of block

		writeout_dumpfile():
						do filtering
						write data

The performance of the two algorithms (2-pass and 3-pass) was tested; the results can be found in
the previous mail.


On 09/29/2014 03:06 PM, Zhou Wenjian wrote:
> The issue is discussed at http://lists.infradead.org/pipermail/kexec/2014-March/011289.html
>
> This patch set implements the idea of the 2-pass algorithm, using a smaller amount of memory to manage the block table.
> Strictly speaking, the algorithm is still 3-pass, but the second pass takes much less time.
> The tables below show the performance with different cyclic-buffer and block sizes.
> The tests were run on a machine with 128G of memory.
>
> Each value is the total time in seconds (first pass plus second pass);
> the value in parentheses is the time of the second pass.
> time (sec)      cyclic-buffer
> block-size      1           2           4           8           16          32          64
> 1M              4.74(0.00)  4.22(0.01)  3.94(0.01)  3.78(0.02)  3.71(0.03)  3.73(0.07)  3.74(0.10)
> 2M              4.74(0.00)  4.19(0.00)  3.94(0.01)  3.80(0.03)  3.71(0.03)  3.72(0.07)  3.72(0.09)
> 4M              4.73(0.00)  4.21(0.01)  3.95(0.01)  3.78(0.02)  3.70(0.02)  3.73(0.08)  3.73(0.10)
> 8M              4.73(0.00)  4.19(0.00)  3.94(0.01)  3.83(0.02)  3.73(0.03)  3.72(0.07)  3.74(0.10)
> 16M             4.74(0.01)  4.21(0.00)  3.94(0.01)  3.76(0.01)  3.73(0.03)  3.73(0.08)  3.74(0.10)
> 32M             4.72(0.00)  4.20(0.02)  3.92(0.01)  3.77(0.02)  3.71(0.02)  3.70(0.06)  3.74(0.10)
> 64M             4.74(0.01)  4.20(0.00)  3.95(0.01)  3.78(0.02)  3.70(0.02)  3.71(0.07)  3.72(0.09)
> 128M            4.73(0.01)  4.20(0.00)  3.94(0.01)  3.78(0.02)  3.76(0.03)  3.72(0.08)  3.74(0.09)
> 256M            4.75(0.02)  4.22(0.02)  3.96(0.03)  3.78(0.02)  3.70(0.03)  3.70(0.07)  3.74(0.11)
> 512M            4.77(0.04)  4.21(0.03)  3.97(0.04)  3.79(0.03)  3.73(0.04)  3.75(0.09)  3.82(0.13)
> 1G              4.82(0.09)  4.26(0.07)  4.00(0.08)  3.83(0.07)  3.76(0.08)  3.73(0.08)  3.76(0.12)
> 2G              8.26(3.54)  7.34(3.14)  6.86(2.93)  6.56(2.80)  6.44(2.76)  6.45(2.79)  6.42(2.80)
>
> the performance of the 3-pass algorithm:
> origin          8.25(3.54)  7.26(3.11)  6.80(2.91)  6.52(2.80)  6.39(2.76)  6.40(2.78)  6.45(2.85)
>
> time (sec)      cyclic-buffer
> block-size      128           256           512           1024          2048          4096          8192
> 1M              3.83(0.21)    3.94(0.33)    4.16(0.54)    4.61(0.99)    7.03(3.41)    8.73(5.11)    8.69(5.08)
> 2M              3.86(0.21)    3.92(0.32)    4.16(0.54)    4.64(0.98)    7.02(3.41)    8.71(5.09)    8.72(5.09)
> 4M              3.82(0.21)    3.95(0.32)    4.18(0.55)    4.62(0.99)    7.05(3.44)    8.70(5.09)    8.68(5.07)
> 8M              3.82(0.21)    3.95(0.33)    4.17(0.54)    4.58(0.97)    7.03(3.41)    8.79(5.16)    8.71(5.09)
> 16M             3.83(0.21)    3.93(0.31)    4.15(0.54)    4.60(0.98)    7.06(3.43)    8.76(5.13)    8.73(5.10)
> 32M             3.84(0.22)    3.93(0.32)    4.15(0.54)    4.61(0.98)    7.00(3.40)    8.69(5.08)    8.75(5.13)
> 64M             3.84(0.21)    3.94(0.33)    4.15(0.54)    4.60(0.98)    7.04(3.42)    8.74(5.10)    8.80(5.16)
> 128M            3.85(0.22)    3.97(0.33)    4.16(0.54)    4.60(0.98)    7.07(3.44)    8.68(5.07)    8.69(5.07)
> 256M            3.84(0.21)    3.94(0.33)    4.16(0.55)    4.64(1.00)    7.02(3.41)    8.74(5.11)    8.73(5.11)
> 512M            3.85(0.24)    3.97(0.34)    4.17(0.56)    4.61(0.99)    7.05(3.44)    8.73(5.11)    8.75(5.13)
> 1G              3.85(0.22)    3.96(0.35)    4.18(0.56)    4.65(1.00)    7.06(3.44)    8.76(5.12)    8.72(5.11)
> 2G              6.53(2.91)    6.86(3.25)    7.54(3.92)    8.95(5.31)    10.60(6.97)   14.08(10.47)  14.32(10.60)
>
> the performance of the 3-pass algorithm:
> origin          6.64(3.05)    6.81(3.24)    7.51(3.93)    8.86(5.30)    10.51(6.94)   13.92(10.36)  14.11(10.55)
>
> Zhou Wenjian (5):
>    Add support for block
>    Add tools for reading and writing from block table
>    Add module of generating table
>    Add module of calculating start_pfn and end_pfn in each dumpfile
>    Add support for --block-size
>
>   makedumpfile.8 |   16 ++++
>   makedumpfile.c |  245 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   makedumpfile.h |   15 ++++
>   3 files changed, 271 insertions(+), 5 deletions(-)
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec


Thread overview:
2014-09-29  7:06 [PATCH v1 0/5] makedumpfile: --split: assign fair I/O workloads in appropriate time Zhou Wenjian
2014-09-29  7:06 ` [PATCH v1 1/5] makedumpfile: Add support for block Zhou Wenjian
2014-10-10  8:11   ` Atsushi Kumagai
2014-09-29  7:06 ` [PATCH v1 2/5] makedumpfile: Add tools for reading and writing from block table Zhou Wenjian
2014-09-29  7:06 ` [PATCH v1 3/5] makedumpfile: Add module of generating table Zhou Wenjian
2014-10-10  8:12   ` Atsushi Kumagai
2014-09-29  7:06 ` [PATCH v1 4/5] makedumpfile: Add module of calculating start_pfn and end_pfn in each dumpfile Zhou Wenjian
2014-09-29  7:06 ` [PATCH v1 5/5] makedumpfile: Add support for --block-size Zhou Wenjian
2014-10-10  8:11   ` Atsushi Kumagai
2014-10-07  2:49 ` [PATCH v1 0/5] makedumpfile: --split: assign fair I/O workloads in appropriate time "Zhou, Wenjian/周文剑"
2014-10-10  4:12 ` "Zhou, Wenjian/周文剑" [this message]