From: Eric DeVolder <eric.devolder@oracle.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Baoquan He <bhe@redhat.com>,
david@redhat.com, Oscar Salvador <osalvador@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, x86@kernel.org,
kexec@lists.infradead.org, ebiederm@xmission.com,
dyoung@redhat.com, vgoyal@redhat.com, tglx@linutronix.de,
mingo@redhat.com, dave.hansen@linux.intel.com, hpa@zytor.com,
nramas@linux.microsoft.com, thomas.lendacky@amd.com,
robh@kernel.org, efault@gmx.de, rppt@kernel.org,
sourabhjain@linux.ibm.com, linux-mm@kvack.org
Subject: Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
Date: Fri, 28 Oct 2022 10:29:45 -0500 [thread overview]
Message-ID: <d91f8728-6a63-415d-577c-bd76e69ec7f6@oracle.com> (raw)
In-Reply-To: <Y1uspLb7fLdtnQq+@zn.tnic>
On 10/28/22 05:19, Borislav Petkov wrote:
> On Thu, Oct 27, 2022 at 02:24:11PM -0500, Eric DeVolder wrote:
>> Be aware, in reality, that if the system was fully populated, it would not
>> actually consume all 8192 phdrs. Rather /proc/iomem would essentially show a
>> large contiguous address space which would require just a single phdr.
>
> Then that from below:
>
> pnum += CONFIG_CRASH_MAX_MEMORY_RANGES;
>
> which then would end up allocating 8192 would be a total waste.
>
> So why don't you make that number dynamic then?
>
> You start with something sensible:
>
> total_num_pheaders = num_online_cpus() + "some number of regions" + "some few others"
>
> I.e., a number which is a good compromise on the majority of machines.
>
> Then, on hotplug events you count how many new regions are coming in
> and when you reach the total_num_pheaders number, you double it (or some
> other increase stragegy), reallocate the ELF header buffers etc needed
> for kdump and you're good.
>
> This way, you don't waste memory unnecessarily on the majority of
> systems and those who need more, get to allocate more.
This patch series sizes and allocates the memory buffer/segment for the elfcorehdr once, at kdump
load time.
In order to dynamically resize the elcorehdr memory buffer/segment, that causes the following ripple
effects:
- Permitting resizing of the elfcorehdr requires a means to "allocate" a new size buffer from
within the crash kernel reserved area. There is no allocator today; currently it is a kind of
one-pass placement process that happens at load time. The critical side effect of allocating a new
elfcorehdr buffer memory/segment is that it creates a new address for the elfcorehdr.
- The elfcorehdr is passed to the crash kernel via the elfcorehdr= kernel cmdline option. As such,
a dynamic change to the size of the elfcorehdr size necessarily invites a change of address of that
buffer, and therefore a change to rewrite the crash kernel cmdline to reflect the new elfcorehdr
buffer address.
- A change to the cmdline, also invites a possible change of address of the buffer containing the
cmdline, and thus a change to the x86 boot_params, which contains the cmdline pointer.
- A change to the cmdline and/or boot_params, which are *not* excluded from the hash/digest, means
that the purgatory hash/digest needs to be recomputed, and purgatory re-linked with the new
hash/digest and replaced.
A fair amount of work, but I have had this working in the past, around the v1 patch series
timeframe. However, it only worked for the kexec_file_load() syscall as all the needed pieces of
information were available; but for kexec_load(), it isn't possible to relink purgatory as by that
point purgatory is but a user-space binary blob.
It was feedback on the v1/v2 that pointed out that by excluding the elfcorehdr from the hash/digest,
the "change of address" problem with the elfcorehdr buffer/segment goes away, and, in turn, negates
the need to: introduce an allocator for the crash kernel reserved space, rewrite the crash kernel
cmdline with a new elfcorehdr, update boot_params with a new cmdline and re-link and replace
purgatory with the updated digest. And it enables this hotplug efforts to support kexec_load()
syscall as well.
So it is with this in mind that I suggest we stay with the statically sized elfcorehdr buffer.
If that can be agreed upon, then it is "just a matter" of picking a useful elfcorehdr size.
Currently that size is derived from the NR_DEFAULT_CPUS and CRASH_MAX_MEMORY_RANGES. So, there is
still the CRASH_MAX_MEMORY_RANGES knob to help a dial in size, should there be some issue with the
default value/size.
Or if there is desire to drop computing the size from NR_DEFAULT_CPUs and CRASH_MAX_MEMORY_RANGES
and simply go with CRASH_HOTPLUG_ELFCOREHDR_SZ which simply specifies the buffer size, then I'm also
good with that.
I still owe a much better explanation of how to size the elfcorehdr. I can use the comments and
ideas from the discussion to provide the necessary insight when choosing this value, whether that be
CRASH_MAX_MEMORY_RANGES or CRASH_HOTPLUG_ELFCOREHDR_SZ.
>
>> I'd prefer keeping CRASH_MAX_MEMORY_RANGES as that allow the maximum phdr
>> number value to be reflective of CPUs and/or memory; not all systems support
>> both CPU and memory hotplug. For example, I have queued up this change to
>> reflect this:
>>
>> if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
>
> If you're going to keep CRASH_MAX_MEMORY_RANGES, then you can test only
> that thing as it expresses the dependency on CONFIG_HOTPLUG_CPU and
> CONFIG_MEMORY_HOTPLUG already.
>
> If you end up making the number dynamic, then you could make that a
> different Kconfig item which contains all that crash code as most of the
> people won't need it anyway.
It is my intention to correct the CRASH_MAX_MEMORY_RANGES (if we keep it) as such:
config CRASH_MAX_MEMORY_RANGES
depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
CRASH_MAX_MEMORY_RANGES should have never had CPU_HOTPLUG as a dependency; that was a cut-n-paste
error on my part.
>
> Hmm?
>
Thank you for the time and thought on this topic!
eric
next prev parent reply other threads:[~2022-10-28 15:30 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20220909210509.6286-1-eric.devolder@oracle.com>
[not found] ` <20220909210509.6286-8-eric.devolder@oracle.com>
[not found] ` <Yx7XEcXZ8PwwQW95@nazgul.tnic>
[not found] ` <cb343eef-46be-2d67-b93a-84c75be86325@oracle.com>
[not found] ` <YzRxPAoN+XmOfJzV@zn.tnic>
[not found] ` <fd08c13d-a917-4cd6-85ec-267e0fe74c41@oracle.com>
2022-09-30 16:50 ` [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support Borislav Petkov
2022-09-30 17:11 ` Eric DeVolder
2022-09-30 17:40 ` Borislav Petkov
2022-10-08 2:35 ` Baoquan He
2022-10-12 17:46 ` Borislav Petkov
2022-10-12 20:19 ` Eric DeVolder
2022-10-12 20:41 ` Borislav Petkov
2022-10-13 2:57 ` Baoquan He
2022-10-25 10:31 ` Borislav Petkov
2022-10-26 14:48 ` Baoquan He
2022-10-26 14:54 ` David Hildenbrand
2022-10-27 13:52 ` Baoquan He
2022-10-27 19:28 ` Eric DeVolder
2022-10-29 4:27 ` Baoquan He
2022-10-27 19:24 ` Eric DeVolder
2022-10-28 10:19 ` Borislav Petkov
2022-10-28 15:29 ` Eric DeVolder [this message]
2022-10-28 17:06 ` Borislav Petkov
2022-10-28 19:26 ` Eric DeVolder
2022-10-28 20:30 ` Borislav Petkov
2022-10-28 20:34 ` Eric DeVolder
2022-10-28 21:22 ` Eric DeVolder
2022-10-28 22:19 ` Borislav Petkov
2022-10-12 20:42 ` Eric DeVolder
2022-10-12 16:20 ` Eric DeVolder
2022-10-25 10:39 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d91f8728-6a63-415d-577c-bd76e69ec7f6@oracle.com \
--to=eric.devolder@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=dyoung@redhat.com \
--cc=ebiederm@xmission.com \
--cc=efault@gmx.de \
--cc=hpa@zytor.com \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=nramas@linux.microsoft.com \
--cc=osalvador@suse.de \
--cc=robh@kernel.org \
--cc=rppt@kernel.org \
--cc=sourabhjain@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=thomas.lendacky@amd.com \
--cc=vgoyal@redhat.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).