From: ebiederm@xmission.com (Eric W. Biederman)
To: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: kexec@lists.infradead.org, heiko.carstens@de.ibm.com,
linux-kernel@vger.kernel.org, lisa.mitchell@hp.com,
kumagai-atsushi@mxc.nes.nec.co.jp, zhangyanfei@cn.fujitsu.com,
akpm@linux-foundation.org, cpw@sgi.com, vgoyal@redhat.com
Subject: Re: [PATCH v3 18/21] vmcore: check if vmcore objects satify mmap()'s page-size boundary requirement
Date: Wed, 20 Mar 2013 21:18:37 -0700
Message-ID: <8738vp75cy.fsf@xmission.com>
In-Reply-To: <20130321.122501.82758179.d.hatayama@jp.fujitsu.com> (HATAYAMA Daisuke's message of "Thu, 21 Mar 2013 12:25:01 +0900 (JST)")

HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:

> From: "Eric W. Biederman" <ebiederm@xmission.com>
> Subject: Re: [PATCH v3 18/21] vmcore: check if vmcore objects satify mmap()'s page-size boundary requirement
> Date: Wed, 20 Mar 2013 13:55:55 -0700
>
>> Vivek Goyal <vgoyal@redhat.com> writes:
>>
>>> On Tue, Mar 19, 2013 at 03:38:45PM -0700, Eric W. Biederman wrote:
>>>> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
>>>>
>>>> > If there's some vmcore object that doesn't satisfy page-size boundary
>>>> > requirement, remap_pfn_range() fails to remap it to user-space.
>>>> >
>>>> > Objects that possibly don't satisfy the requirement are ELF note
>>>> > segments only. The memory chunks corresponding to PT_LOAD entries are
>>>> > guaranteed to satisfy page-size boundary requirement by the copy from
>>>> > old memory to buffer in 2nd kernel done in later patch.
>>>> >
>>>> > This patch doesn't copy each note segment into the 2nd kernel since
>>>> > they become very large in total when there are many CPUs. For
>>>> > example, the current maximum number of CPUs on x86_64 is 5120, where note
>>>> > segments exceed 1MB with NT_PRSTATUS only.
>>>>
>>>> So you require the first kernel to reserve an additional 20MB, instead
>>>> of just 1.6MB. 336 bytes versus 4096 bytes.
>>>>
>>>> That seems like completely the wrong tradeoff in memory consumption,
>>>> filesize, and backwards compatibility.
>>>
>>> Agreed.
>>>
>>> So we already copy ELF headers in second kernel's memory. If we start
>>> copying notes too, then both headers and notes will support mmap().
>>
>> The only real issue is that it could be a bit tricky to allocate all of the
>> memory for the notes section on high cpu count systems in a single allocation.
>>
>
> Do you mean it's getting difficult on machines with many CPUs to get
> enough consecutive free pages to cover all the notes?
>
> If so, does the next patch need to take care of it? Or should it be
> left pending for now?

I meant that in general allocations > PAGE_SIZE get increasingly
unreliable the larger they are, and on large cpu count machines we are
making larger allocations. Of course large cpu count machines typically
have more memory, so the odds of success go up.

Right now MAX_ORDER seems to be set to 11 which is 8MiB, and my x86_64
machine certainly succeeded in an order 11 allocation during boot so I
don't expect any real problems with a 2MiB allocation, but it is
something to keep an eye on with kernel memory.

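
As a rough check of the sizes discussed in this thread, here is a minimal
sketch, assuming 4KiB pages. The helper names are hypothetical and exist only
to illustrate the arithmetic; order_for() mimics what get_order() computes in
the kernel. The 5120 CPU count and the 336 and 4096 byte per-cpu sizes are the
figures quoted earlier in the thread.

```python
PAGE_SIZE = 4096  # assumption: 4KiB pages, as on x86_64


def total_note_bytes(nr_cpus, bytes_per_cpu):
    """Total size of the per-cpu notes when stored back to back."""
    return nr_cpus * bytes_per_cpu


def order_for(size, page_size=PAGE_SIZE):
    """Smallest order such that (page_size << order) >= size."""
    pages = -(-size // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def block_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of a single allocation of the given order."""
    return page_size << order


packed = total_note_bytes(5120, 336)    # notes copied and packed together
padded = total_note_bytes(5120, 4096)   # one full page per cpu

print(packed, order_for(packed), block_bytes(order_for(packed)))
print(padded, order_for(padded), block_bytes(order_for(padded)))
```

Under these assumptions the packed notes come to 1720320 bytes (about
1.6MiB) and fit in one order 9 block of 2MiB, while the padded layout needs
20MiB. If the largest order the page allocator actually hands out is
MAX_ORDER - 1, the ceiling is 4MiB rather than 8MiB, which still leaves room
for the 2MiB case.
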
>>> For mmap() of memory regions which are not page aligned, we can map
>>> extra bytes (as you suggested in one of the mails). Given the fact
>>> that we have one ELF header for every memory range, we can always modify
>>> the file offset where phdr data is starting to make space for mapping
>>> of extra bytes.
>>
>> Agreed ELF file offset % PAGE_SIZE should == physical address % PAGE_SIZE to
>> make mmap work.
>>
>
> OK, your conclusion is the 1st version is better than the 2nd.
>
> The purpose of this design was to export nothing but the dump target
> memory to user-space from /proc/vmcore. I think it is better to do so if
> possible. It's possible for the read interface to fill the corresponding
> part with 0. But it's impossible for the mmap interface to modify data on
> old memory.

In practice someone lied. You can't have a chunk of memory that is
smaller than page size. So I don't see it doing any harm to export
the memory that is there but some silly system lied to us about.

> Do you agree that the two vmcores seen from the read and mmap interfaces
> no longer coincide?

That is an interesting point. I don't think there is any point in
having read and mmap disagree, that just seems to lead to complications,
especially since the data we are talking about adding is actually memory
contents.

I do think it makes sense to have logical chunks of the file that are
not covered by PT_LOAD segments. Logical chunks like the leading edge
of a page inside of which a PT_LOAD segment starts, and the trailing
edge of a page in which a PT_LOAD segment ends.

Implementation wise this would mean extending the struct vmcore entry to
cover the missing bits, by rounding down the start address and rounding up
the end address to the nearest page size boundary. The generated
PT_LOAD segment would then have its file offset adjusted to skip
the bytes of the page that are there but we don't care about.

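
The rounding can be sketched as follows. This is only an illustration under
the assumption of 4KiB pages, not the vmcore code itself; widen_to_pages()
and pt_load_offset() are hypothetical names, and the addresses in the example
are made up.

```python
PAGE_SIZE = 4096  # assumption: 4KiB pages


def round_down(x, align):
    """Round x down to a multiple of align (align must be a power of two)."""
    return x & ~(align - 1)


def round_up(x, align):
    """Round x up to a multiple of align (align must be a power of two)."""
    return (x + align - 1) & ~(align - 1)


def widen_to_pages(paddr, size, page_size=PAGE_SIZE):
    """Page-aligned range covering [paddr, paddr + size).

    Returns (start, end, skip), where skip is the number of leading bytes
    in the first page that lie before paddr.
    """
    start = round_down(paddr, page_size)
    end = round_up(paddr + size, page_size)
    return start, end, paddr - start


def pt_load_offset(chunk_file_offset, skip):
    """File offset of the PT_LOAD data inside the widened, page-aligned chunk."""
    return chunk_file_offset + skip


# Hypothetical segment starting 0x200 bytes into a page.
start, end, skip = widen_to_pages(0x1200, 0x2100)
p_offset = pt_load_offset(0x3000, skip)
print(hex(start), hex(end), hex(skip), hex(p_offset))
```

With the widened chunk placed at a page-aligned file offset, the adjusted
PT_LOAD offset satisfies file offset % PAGE_SIZE == physical address %
PAGE_SIZE, which is the condition mmap needs, and read and mmap see the same
bytes.
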
Eric