Date: Tue, 14 Jan 2014 20:33:40 +0900
From: HATAYAMA Daisuke
To: cpw
Cc: kumagai-atsushi@mxc.nes.nec.co.jp, kexec@lists.infradead.org
Subject: Re: [PATCH 0/2] makedumpfile: for large memories
Message-ID: <52D52094.3050301@jp.fujitsu.com>

(2014/01/01 8:30), cpw wrote:
> From: Cliff Wickman
>
> Gentlemen of kexec,
>
> I have been working on enabling kdump on some very large systems, and
> have found some solutions that I hope you will consider.
>
> The first issue is to work within the restricted size of crashkernel
> memory under 2.6.32-based kernels, such as sles11 and rhel6.
>
> The second issue is to reduce the very large size of a dump of a
> big-memory system, even an idle one.
>
> These are my propositions:
>
> Size of crashkernel memory:
>   1) raw i/o for writing the dump
>   2) use the root device for the bitmap file (not tmpfs)
>   3) raw i/o for reading/writing the bitmaps
>
> Size of dump (and hence the duration of dumping):
>   4) exclude page structures for unused pages
>
> 1) Is quite easy.  The cache of pages needs to be aligned on a block
>    boundary and written in block multiples, as required by O_DIRECT
>    files.  The use of raw i/o prevents the crash kernel's page cache
>    from growing.
>
> 2) Is also quite easy.  My patch finds the path to the crash kernel's
>    root device by examining the dump pathname.  Otherwise, storing the
>    bitmaps in a file conserves no memory, as they are written to tmpfs.
>
> 3) Raw i/o for the bitmaps is accomplished by caching the bitmap file
>    in a similar way to the dump file.  I find that direct i/o is not
>    significantly slower than writing through the kernel's page cache.
>
> 4) Excluding unused kernel page structures is very important for a
>    large-memory system.  The kernel otherwise includes 3.67 million
>    pages of page structures per TB of memory.  By contrast, the rest
>    of the kernel is only about 1 million pages.
>
> Test results are below, for systems of 1TB, 2TB, 8.8TB and 16TB.
> (There are no 'old' numbers for 16TB, as the time and space
> requirements made those runs effectively useless.)
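Two notes on the propositions before the numbers. On 1) and 3): as you
say, O_DIRECT requires the user buffer, the file offset and the transfer
length all to be block aligned. For other readers, a minimal sketch of
that staging in C is below, assuming a 4096-byte block size; the names
write_direct() and BLKSIZE are illustrative, not taken from the patch:

/* Minimal sketch of a block-aligned O_DIRECT write, assuming a
 * 4096-byte block size.  Illustrative only; not code from the patch. */
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLKSIZE 4096

static int write_direct(const char *path, const void *data, size_t len)
{
        /* O_DIRECT requires the user buffer, the file offset and the
         * transfer length all to be block aligned, so stage the data
         * through an aligned bounce buffer padded to a block multiple. */
        size_t padded = (len + BLKSIZE - 1) / BLKSIZE * BLKSIZE;
        void *buf;
        int fd, rc = -1;

        if (posix_memalign(&buf, BLKSIZE, padded))
                return -1;
        memset(buf, 0, padded);
        memcpy(buf, data, len);

        fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
        if (fd >= 0) {
                if (write(fd, buf, padded) == (ssize_t)padded &&
                    ftruncate(fd, len) == 0)  /* drop the zero padding */
                        rc = 0;
                close(fd);
        }
        free(buf);
        return rc;
}

The same staging would presumably apply to the bitmap file in 3), since
it is cached in a similar way to the dump file.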
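On 4): the 3.67 million figure checks out, assuming 4 KiB pages and a
56-byte struct page (the exact size is configuration dependent):

    1 TB / 4 KiB per page        =     268,435,456 pages to describe
    268,435,456 pages * 56 bytes =  15,032,385,536 bytes of page structures
    15,032,385,536 bytes / 4 KiB =       3,670,016 pages (3.67 million)

Scaled to 16 TB, that is roughly 58.7 million pages, about 224 GiB, of
page structures, so excluding the ones that describe unused pages
clearly matters.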
> Run times were generally reduced 2-3x, and dump size reduced about 8x.
> All timings were done using 512M of crashkernel memory.
>
> System memory size           unpatched    patched
>
> 1TB
>   OS: rhel6.4 (does a free pages pass)
>     page scan time              1.6min     1.6min
>     dump copy time              2.4min      .4min
>     total time                  4.1min     2.0min
>     dump size                    3014M       364M
>
>   OS: rhel6.5
>     page scan time               .6min      .6min
>     dump copy time              2.3min      .5min
>     total time                  2.9min     1.1min
>     dump size                    3011M       423M
>
>   OS: sles11sp3 (3.0.93)
>     page scan time               .5min      .5min
>     dump copy time              2.3min      .5min
>     total time                  2.8min     1.0min
>     dump size                    2950M       350M
>
> 2TB
>   OS: rhel6.5 (cyclic x3)
>     page scan time              2.0min     1.8min
>     dump copy time              8.0min     1.5min
>     total time                 10.0min     3.3min
>     dump size                    6141M       835M
>
> 8.8TB
>   OS: rhel6.5 (cyclic x5)
>     page scan time              6.6min     5.5min
>     dump copy time             67.8min     6.2min
>     total time                 74.4min    11.7min
>     dump size                    15.8G       2.7G
>
> 16TB (patched only)
>   OS: rhel6.4
>     page scan time                       125.3min
>     dump copy time                        13.2min
>     total time                           138.5min
>     dump size                                4.0G
>
>   OS: rhel6.5
>     page scan time                        27.8min
>     dump copy time                        13.3min
>     total time                            41.1min
>     dump size                                4.1G

Also, could you please show us the results in more detail? That is,
this benchmark involves the three parameters below:

- cyclic mode or non-cyclic mode
- cached I/O or direct I/O
- with or without the page structure object array

Please describe the results for each parameter separately, so that we
can easily see how each parameter affects the result, without
confusion.

-- 
Thanks.
HATAYAMA, Daisuke