From: Xunlei Pang <xpang@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>,
Xunlei Pang <xlpang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>,
x86@kernel.org, kexec@lists.infradead.org,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
"H. Peter Anvin" <hpa@zytor.com>,
akpm@linux-foundation.org, Dave Young <dyoung@redhat.com>
Subject: Re: [PATCH] x86_64, kexec: Avoid unnecessary identity mappings for kdump
Date: Mon, 20 Mar 2017 10:22:44 +0800 [thread overview]
Message-ID: <58CF3CF4.4060804@redhat.com> (raw)
In-Reply-To: <8737ebg5w6.fsf@xmission.com>
On 03/18/2017 at 01:38 AM, Eric W. Biederman wrote:
> Xunlei Pang <xlpang@redhat.com> writes:
>
>> kexec setups identity mappings for all the memory mapped in 1st kernel,
>> this is not necessary for the kdump case. Actually it can cause extra
>> memory consumption for paging structures, which is quite considerable
>> on modern machines with huge memory.
>>
>> E.g. On our 24TB machine, it will waste around 96MB (around 4MB/TB)
>> from the reserved memory range if setting all the identity mappings.
>>
>> It also causes some trouble for distributions that use an intelligent
>> policy to evaluate the proper "crashkernel=X" for users.
>>
>> To solve it, in case of kdump, we only setup identity mappings for the
>> crash memory and the ISA memory(may be needed by purgatory/kdump
>> boot).
> How about instead we detect the presence of 1GiB pages and use them
> if they are available. We already use 2MiB pages. If we can do that
> we will only need about 192K for page tables in the case you have
> described and this all becomes a non-issue.
>
> I strongly suspect that the presence of 24TiB of memory in an x86 system
> strongly correlates to the presence of 1GiB pages.
>
> In principle we certainly can use a less extensive mapping but that
> should not be something that differs between the two kexec cases.
Ok, will try gbpages for the identity mapping.
Regards,
Xunlei
> I can see forcing the low 1MiB range in. But calling it ISA range is
> very wrong and misleading. The reasons that range are special during
> boot-up have nothing to do with ISA. But have everything to do with
> where legacy page tables are mapped, and where we need identity pages to
> start other cpus. I think the only user that actually cares is
> purgatory where it plays swapping games with the low 1MiB because we
> can't preload what we need to down there or it would mess up the running
> kernel. So saying anything about the old ISA bus is wrong and
> misleading. At the very very least we need accurate comments.
>
> Eric
>
>
>> Signed-off-by: Xunlei Pang <xlpang@redhat.com>
>> ---
>> arch/x86/kernel/machine_kexec_64.c | 34 ++++++++++++++++++++++++++++++----
>> 1 file changed, 30 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 857cdbd..db77a76 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>
>> level4p = (pgd_t *)__va(start_pgtable);
>> clear_page(level4p);
>> - for (i = 0; i < nr_pfn_mapped; i++) {
>> - mstart = pfn_mapped[i].start << PAGE_SHIFT;
>> - mend = pfn_mapped[i].end << PAGE_SHIFT;
>>
>> + if (image->type == KEXEC_TYPE_CRASH) {
>> + /* Always map the ISA range */
>> result = kernel_ident_mapping_init(&info,
>> - level4p, mstart, mend);
>> + level4p, 0, ISA_END_ADDRESS);
>> if (result)
>> return result;
>> +
>> + /* crashk_low_res may not be initialized when reaching here */
>> + if (crashk_low_res.end) {
>> + mstart = crashk_low_res.start;
>> + mend = crashk_low_res.end + 1;
>> + result = kernel_ident_mapping_init(&info,
>> + level4p, mstart, mend);
>> + if (result)
>> + return result;
>> + }
>> +
>> + mstart = crashk_res.start;
>> + mend = crashk_res.end + 1;
>> + result = kernel_ident_mapping_init(&info,
>> + level4p, mstart, mend);
>> + if (result)
>> + return result;
>> + } else {
>> + for (i = 0; i < nr_pfn_mapped; i++) {
>> + mstart = pfn_mapped[i].start << PAGE_SHIFT;
>> + mend = pfn_mapped[i].end << PAGE_SHIFT;
>> +
>> + result = kernel_ident_mapping_init(&info,
>> + level4p, mstart, mend);
>> + if (result)
>> + return result;
>> + }
>> }
>>
>> /*
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
WARNING: multiple messages have this Message-ID (diff)
From: Xunlei Pang <xpang@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>,
Xunlei Pang <xlpang@redhat.com>
Cc: linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
akpm@linux-foundation.org, Dave Young <dyoung@redhat.com>,
Baoquan He <bhe@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Ingo Molnar <mingo@redhat.com>,
x86@kernel.org
Subject: Re: [PATCH] x86_64, kexec: Avoid unnecessary identity mappings for kdump
Date: Mon, 20 Mar 2017 10:22:44 +0800 [thread overview]
Message-ID: <58CF3CF4.4060804@redhat.com> (raw)
In-Reply-To: <8737ebg5w6.fsf@xmission.com>
On 03/18/2017 at 01:38 AM, Eric W. Biederman wrote:
> Xunlei Pang <xlpang@redhat.com> writes:
>
>> kexec setups identity mappings for all the memory mapped in 1st kernel,
>> this is not necessary for the kdump case. Actually it can cause extra
>> memory consumption for paging structures, which is quite considerable
>> on modern machines with huge memory.
>>
>> E.g. On our 24TB machine, it will waste around 96MB (around 4MB/TB)
>> from the reserved memory range if setting all the identity mappings.
>>
>> It also causes some trouble for distributions that use an intelligent
>> policy to evaluate the proper "crashkernel=X" for users.
>>
>> To solve it, in case of kdump, we only setup identity mappings for the
>> crash memory and the ISA memory(may be needed by purgatory/kdump
>> boot).
> How about instead we detect the presence of 1GiB pages and use them
> if they are available. We already use 2MiB pages. If we can do that
> we will only need about 192K for page tables in the case you have
> described and this all becomes a non-issue.
>
> I strongly suspect that the presence of 24TiB of memory in an x86 system
> strongly correlates to the presence of 1GiB pages.
>
> In principle we certainly can use a less extensive mapping but that
> should not be something that differs between the two kexec cases.
Ok, will try gbpages for the identity mapping.
Regards,
Xunlei
> I can see forcing the low 1MiB range in. But calling it ISA range is
> very wrong and misleading. The reasons that range are special during
> boot-up have nothing to do with ISA. But have everything to do with
> where legacy page tables are mapped, and where we need identity pages to
> start other cpus. I think the only user that actually cares is
> purgatory where it plays swapping games with the low 1MiB because we
> can't preload what we need to down there or it would mess up the running
> kernel. So saying anything about the old ISA bus is wrong and
> misleading. At the very very least we need accurate comments.
>
> Eric
>
>
>> Signed-off-by: Xunlei Pang <xlpang@redhat.com>
>> ---
>> arch/x86/kernel/machine_kexec_64.c | 34 ++++++++++++++++++++++++++++++----
>> 1 file changed, 30 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 857cdbd..db77a76 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>
>> level4p = (pgd_t *)__va(start_pgtable);
>> clear_page(level4p);
>> - for (i = 0; i < nr_pfn_mapped; i++) {
>> - mstart = pfn_mapped[i].start << PAGE_SHIFT;
>> - mend = pfn_mapped[i].end << PAGE_SHIFT;
>>
>> + if (image->type == KEXEC_TYPE_CRASH) {
>> + /* Always map the ISA range */
>> result = kernel_ident_mapping_init(&info,
>> - level4p, mstart, mend);
>> + level4p, 0, ISA_END_ADDRESS);
>> if (result)
>> return result;
>> +
>> + /* crashk_low_res may not be initialized when reaching here */
>> + if (crashk_low_res.end) {
>> + mstart = crashk_low_res.start;
>> + mend = crashk_low_res.end + 1;
>> + result = kernel_ident_mapping_init(&info,
>> + level4p, mstart, mend);
>> + if (result)
>> + return result;
>> + }
>> +
>> + mstart = crashk_res.start;
>> + mend = crashk_res.end + 1;
>> + result = kernel_ident_mapping_init(&info,
>> + level4p, mstart, mend);
>> + if (result)
>> + return result;
>> + } else {
>> + for (i = 0; i < nr_pfn_mapped; i++) {
>> + mstart = pfn_mapped[i].start << PAGE_SHIFT;
>> + mend = pfn_mapped[i].end << PAGE_SHIFT;
>> +
>> + result = kernel_ident_mapping_init(&info,
>> + level4p, mstart, mend);
>> + if (result)
>> + return result;
>> + }
>> }
>>
>> /*
next prev parent reply other threads:[~2017-03-20 2:20 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-17 10:22 [PATCH] x86_64, kexec: Avoid unnecessary identity mappings for kdump Xunlei Pang
2017-03-17 10:22 ` Xunlei Pang
2017-03-17 17:38 ` Eric W. Biederman
2017-03-17 17:38 ` Eric W. Biederman
2017-03-20 2:22 ` Xunlei Pang [this message]
2017-03-20 2:22 ` Xunlei Pang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=58CF3CF4.4060804@redhat.com \
--to=xpang@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=dyoung@redhat.com \
--cc=ebiederm@xmission.com \
--cc=hpa@zytor.com \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=x86@kernel.org \
--cc=xlpang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.