From: Cliff Wickman <cpw@sgi.com>
To: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
lisa.mitchell@hp.com, kumagai-atsushi@mxc.nes.nec.co.jp,
ebiederm@xmission.com, vgoyal@redhat.com
Subject: Re: [PATCH v4 0/8] kdump, vmcore: support mmap() on /proc/vmcore
Date: Thu, 25 Apr 2013 08:38:25 -0500
Message-ID: <20130425133825.GA25089@sgi.com>
In-Reply-To: <20130413002000.18245.21513.stgit@localhost6.localdomain6>
On Fri, Apr 05, 2013 at 12:04:02AM +0000, HATAYAMA Daisuke wrote:
> Currently, read to /proc/vmcore is done by read_oldmem() that uses
> ioremap/iounmap per a single page. For example, if memory is 1GB,
> ioremap/iounmap is called (1GB / 4KB)-times, that is, 262144
> times. This causes big performance degradation.
>
> In particular, the current main user of this mmap() is makedumpfile,
> which not only reads memory from /proc/vmcore but also does other
> processing like filtering, compression and IO work.
>
> To address the issue, this patch implements mmap() on /proc/vmcore to
> improve read performance.
>
> Benchmark
> =========
>
> You can see two benchmarks on terabyte memory system. Both show about
> 40 seconds on 2TB system. This is almost equal to performance by
> experimental kernel-side memory filtering.
>
> - makedumpfile mmap() benchmark, by Jingbai Ma
> https://lkml.org/lkml/2013/3/27/19
>
> - makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
> https://lkml.org/lkml/2013/3/26/914
>
> ChangeLog
> =========
>
> v3 => v4)
>
> - Rebase 3.9-rc7.
> - Drop clean-up patches orthogonal to the main topic of this patch set.
> - Copy ELF note segments in the 1st kernel just as in v1. Allocate
> vmcore objects per pages. => See [PATCH 5/8]
> - Map memory referenced by PT_LOAD entry directly even if the start or
> end of the region doesn't fit inside page boundary, no longer copy
> them as the previous v3. Then, holes, outside OS memory, are visible
> from /proc/vmcore. => See [PATCH 7/8]
>
> v2 => v3)
>
> - Rebase 3.9-rc3.
> - Copy program headers separately from e_phoff in ELF note segment
> buffer. Now there's no risk to allocate huge memory if program
> header table positions after memory segment.
> - Add cleanup patch that removes unnecessary variable.
> - Fix wrongly using the variable that is buffer size configurable at
> runtime. Instead, use the variable that has original buffer size.
>
> v1 => v2)
>
> - Clean up the existing codes: use e_phoff, and remove the assumption
> on PT_NOTE entries.
> - Fix potential bug that ELF header size is not included in exported
> vmcoreinfo size.
> - Divide patch modifying read_vmcore() into two: clean-up and primary
> code change.
> - Put ELF note segments in page-size boundary on the 1st kernel
> instead of copying them into the buffer on the 2nd kernel.
>
> Test
> ====
>
> This patch set is composed based on v3.9-rc7.
>
> Done on x86-64, x86-32 both with 1GB and over 4GB memory environments.
>
> ---
>
> HATAYAMA Daisuke (8):
>       vmcore: support mmap() on /proc/vmcore
>       vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
>       vmcore: count holes generated by round-up operation for page boudary for size of /proc/vmcore
>       vmcore: copy ELF note segments in the 2nd kernel per page vmcore objects
>       vmcore: Add helper function vmcore_add()
>       vmcore, procfs: introduce MEM_TYPE_CURRENT_KERNEL flag to distinguish objects copied in 2nd kernel
>       vmcore: clean up read_vmcore()
>       vmcore: allocate buffer for ELF headers on page-size alignment
>
>
> fs/proc/vmcore.c | 349 ++++++++++++++++++++++++++++++++---------------
> include/linux/proc_fs.h | 8 +
> 2 files changed, 245 insertions(+), 112 deletions(-)
>
> --
>
> Thanks.
> HATAYAMA, Daisuke

This is a very important patch set (patches 1 - 8) for speeding up the
kdump process.

We have found the mmap interface to /proc/vmcore to be about 80x faster
than the read interface. That is, mapping the file with mmap() and copying
the data (in pieces the size of page structures) transfers all of
/proc/vmcore about 80 times faster than reading it.

This greatly speeds up the capture of a kdump, as the scan of page
structures takes the bulk of the time in dumping the OS on a machine
with terabytes of memory.
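
For anyone who wants to try the access pattern described above, here is a
minimal user-space sketch. It is not makedumpfile's code and not part of
the patch set; the helper name copy_via_mmap() and the 4 MiB WINDOW_BYTES
value are made up for illustration, and the use of PROT_READ with
MAP_PRIVATE is an assumption about what the mapping accepts. It maps
page-aligned windows of a file and copies the requested bytes out of each
window, rather than issuing one read() per piece.

```c
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window size: 4 MiB per mapping (a multiple of common page sizes). */
#define WINDOW_BYTES ((size_t)4 << 20)

/*
 * Copy `len` bytes starting at file offset `off` into `dst` by mapping
 * page-aligned windows of the file instead of calling read().
 * The caller must ensure off + len does not exceed the file size;
 * touching a mapping past end-of-file raises SIGBUS.
 * Returns 0 on success, -1 if mmap() fails (errno is set by mmap).
 */
static int copy_via_mmap(int fd, off_t off, void *dst, size_t len)
{
    const off_t page = (off_t)sysconf(_SC_PAGESIZE);
    char *out = dst;

    while (len > 0) {
        /* mmap() offsets must be page-aligned, so round the start down. */
        off_t base = off - (off % page);
        size_t skip = (size_t)(off - base);
        size_t room = WINDOW_BYTES - skip;
        size_t want = len < room ? len : room;
        size_t maplen = skip + want;

        void *win = mmap(NULL, maplen, PROT_READ, MAP_PRIVATE, fd, base);
        if (win == MAP_FAILED)
            return -1;

        /* Copy only the requested bytes, skipping the alignment slack. */
        memcpy(out, (char *)win + skip, want);
        munmap(win, maplen);

        out += want;
        off += (off_t)want;
        len -= want;
    }
    return 0;
}
```

A caller would open the file read-only and call, for example,
copy_via_mmap(fd, offset, buf, sizeof(buf)). A larger window means fewer
mmap()/munmap() calls per byte copied, which is where the saving over
page-at-a-time reads would be expected to come from.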
We would very much like to see this set make it into the 3.10 release.
Acked-by: Cliff Wickman <cpw@sgi.com>
-Cliff
--
Cliff Wickman
SGI
cpw@sgi.com
(651) 683-3824