From: Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linuxppc-dev <linuxppc-dev@ozlabs.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Hari Bathini <hbathini@linux.vnet.ibm.com>,
Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Anshuman Khandual <khandual@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Ananth Narayan <ananth@in.ibm.com>,
kernelfans@gmail.com
Subject: Re: [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump.
Date: Tue, 17 Jul 2018 16:58:10 +0530
Message-ID: <cb5fc554-7fa3-1356-c304-ae8c27a0c70c@linux.vnet.ibm.com>
In-Reply-To: <20180716082646.GF17280@dhcp22.suse.cz>
On 07/16/2018 01:56 PM, Michal Hocko wrote:
> On Mon 16-07-18 11:32:56, Mahesh J Salgaonkar wrote:
>> One of the primary issues with Firmware Assisted Dump (fadump) on Power
>> is that it needs a large amount of memory to be reserved. This reserved
>> memory is used for saving the contents of old crashed kernel's memory before
>> fadump capture kernel uses old kernel's memory area to boot. However, this
>> reserved memory area stays unused until system crash and isn't available
>> for production kernel to use.
>
> How much memory are we talking about. Regular kernel dump process needs
> some reserved memory as well. Why that is not a big problem?
We reserve around 5% of total system RAM. On large systems with
terabytes of memory, this reservation can be quite significant.

The regular kernel dump uses the kexec method to boot into the capture
kernel, and it can control the parameters that are passed to the
capture kernel. This makes it possible to strip down those parameters,
which lowers the amount of memory the capture kernel needs to boot, so
regular kdump can reserve less memory to start with.

fadump, on the other hand, depends on Power firmware (pHyp) to load the
capture kernel after a full reset, and the capture kernel boots like a
regular kernel. It needs the same amount of memory to boot as the
production kernel, and on large systems the production kernel needs a
significant amount of memory to boot. Hence fadump needs to reserve
enough memory for the capture kernel to boot successfully and carry out
the dump capture. By default fadump reserves 5% of total system RAM, and
in most cases this has worked flawlessly on a variety of system
configurations. Optionally, 'crashkernel=X' can be used to specify a
more fine-tuned memory size for the reservation.
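
To put the 5% figure in perspective, the arithmetic can be sketched in a
few lines of C. This is only an illustration: it assumes the default is
exactly 5% of RAM and ignores any rounding or minimum size the kernel may
apply. The helper names (percent_of, fadump_default_reserve) are made up
for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the default reservation is modelled as exactly 5% of RAM. */
#define FADUMP_DEFAULT_PERCENT 5U

#define GiB (1ULL << 30)
#define TiB (1ULL << 40)

/* Return 'percent' percent of total_bytes, rounded down. */
static uint64_t percent_of(uint64_t total_bytes, unsigned int percent)
{
    assert(percent <= 100);
    /* Divide first so that very large sizes cannot overflow. */
    return (total_bytes / 100) * percent;
}

/* Approximate default fadump reservation, in bytes (hypothetical helper). */
static uint64_t fadump_default_reserve(uint64_t total_ram_bytes)
{
    return percent_of(total_ram_bytes, FADUMP_DEFAULT_PERCENT);
}
```

With these assumptions, fadump_default_reserve(16 * TiB) comes to roughly
819 GiB, which is memory that would otherwise sit idle until a crash.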
>
>> Instead of setting aside a significant chunk of memory that nobody can use,
>> take advantage of ZONE_MOVABLE to mark a significant chunk of reserved memory
>> as ZONE_MOVABLE, so that the kernel is prevented from using, but
>> applications are free to use it.
>
> Why kernel cannot use that memory while userspace can?
fadump needs to reserve memory to be able to save the crashing kernel's
memory, with help from Power firmware, before the capture kernel loads
into the crashing kernel's memory area. Any contents present in this
reserved memory will be overwritten. If the kernel is allowed to use this
memory, then we lose that kernel data and it won't be part of the captured
dump, even though it could be critical for debugging the root cause of the
system crash.

Kdump and fadump both use the same infrastructure/tool (makedumpfile) to
capture the memory dump. While the tool provides the flexibility to
determine what needs to be part of the dump and what memory to filter
out, all supported distributions default to "Capture only kernel data
and nothing else". Taking advantage of this default, we can at least make
the reserved memory available for userspace to use.
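
The filtering mentioned above is controlled by makedumpfile's dump level
(the -d option), a bitmask of page types to leave out of the dump. The
sketch below only illustrates how that bitmask relates to userspace data.
The bit values follow makedumpfile(8); the enum and function names are
illustrative, and treating a level such as 31 as the distribution default
is an assumption for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Page types makedumpfile can exclude; values are the dump-level bits. */
enum dump_filter {
    DL_EXCLUDE_ZERO          = 1,   /* pages filled with zeros */
    DL_EXCLUDE_CACHE         = 2,   /* non-private cache pages */
    DL_EXCLUDE_CACHE_PRIVATE = 4,   /* private cache pages */
    DL_EXCLUDE_USER_DATA     = 8,   /* userspace data pages */
    DL_EXCLUDE_FREE          = 16   /* free pages */
};

#define DL_MAX 31   /* all five bits set: kernel data only */

/* True if userspace pages would end up in the dump at this level. */
static bool dump_level_keeps_user_data(int dump_level)
{
    assert(dump_level >= 0 && dump_level <= DL_MAX);
    return (dump_level & DL_EXCLUDE_USER_DATA) == 0;
}
```

At a level such as 31, dump_level_keeps_user_data() returns false, so
losing the userspace pages that sat in the reserved area does not change
what the dump contains.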
If someone wants to capture userspace data as well, then the
'fadump=nonmovable' option can be used, in which case the reserved pages
won't be marked as ZONE_MOVABLE.

The advantage of the movable method is that the reserved memory chunk
remains available for use instead of sitting idle.
> [...]
>> Documentation/powerpc/firmware-assisted-dump.txt | 18 +++
>> arch/powerpc/include/asm/fadump.h | 7 +
>> arch/powerpc/kernel/fadump.c | 123 +++++++++++++++++--
>> arch/powerpc/platforms/pseries/hotplug-memory.c | 7 +
>> include/linux/mmzone.h | 2
>> mm/page_alloc.c | 146 ++++++++++++++++++++++
>> 6 files changed, 290 insertions(+), 13 deletions(-)
>
> This is quite a large change and you didn't seem to explain why we need
> it.
>
In the fadump case, the reserved memory stays unused until the system
crashes. fadump uses only a very small portion of this reserved memory, a
few KBs, for storing fadump metadata. The rest of this significant chunk
of memory is completely unused. Hence, instead of blocking memory that is
unutilized throughout the lifetime of the system, it's better to give it
back to the production system to use. But at the same time we don't want
the kernel to place its own data in that memory. While exploring, we found
two features that suit the requirement: 1) the Linux kernel's Contiguous
Memory Allocator (CMA) and 2) ZONE_MOVABLE. The initial 5 revisions of
this patchset () used the CMA feature. However, fadump does not do any
CMA allocations, so it is more appropriate to use ZONE_MOVABLE to achieve
the same result. But unlike CMA, there is no interface available to mark
a custom reserved memory area as ZONE_MOVABLE. Hence patch 1/4 proposes
one.
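
The intended effect can be pictured with a small userspace toy model. This
is not kernel code and not the interface from patch 1/4: every name below
(toy_mem, toy_mark_reserved_movable, toy_alloc_page) is hypothetical, and
the first-fit allocator is a deliberate simplification. It only shows the
policy: once a reserved range is marked movable, kernel allocations skip
it while userspace allocations may still land there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 16

enum toy_zone  { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };
enum toy_alloc { TOY_ALLOC_KERNEL, TOY_ALLOC_USER };

struct toy_mem {
    enum toy_zone zone[TOY_NR_PAGES];
    bool in_use[TOY_NR_PAGES];
};

/* Start with every page free and in the normal zone. */
static void toy_mem_init(struct toy_mem *m)
{
    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        m->zone[i] = TOY_ZONE_NORMAL;
        m->in_use[i] = false;
    }
}

/* Hypothetical stand-in for "mark a reserved range as ZONE_MOVABLE". */
static void toy_mark_reserved_movable(struct toy_mem *m, size_t start,
                                      size_t count)
{
    assert(start <= TOY_NR_PAGES && count <= TOY_NR_PAGES - start);
    for (size_t i = start; i < start + count; i++)
        m->zone[i] = TOY_ZONE_MOVABLE;
}

/* First-fit allocation; returns the page index, or -1 if none fits. */
static int toy_alloc_page(struct toy_mem *m, enum toy_alloc kind)
{
    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        if (m->in_use[i])
            continue;
        /* Kernel data must never live in the movable (reserved) range. */
        if (kind == TOY_ALLOC_KERNEL && m->zone[i] == TOY_ZONE_MOVABLE)
            continue;
        m->in_use[i] = true;
        return (int)i;
    }
    return -1;
}
```

In this model, after marking pages 0 to 7 as movable, a kernel allocation
is served from page 8 onwards, while a userspace allocation can still take
page 0.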
Thanks,
-Mahesh.
Thread overview: 10+ messages
2018-07-16 6:02 [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
2018-07-16 6:03 ` [RFC PATCH v6 1/4] mm/page_alloc: Introduce an interface to mark reserved memory as ZONE_MOVABLE Mahesh J Salgaonkar
2018-07-16 6:03 ` [RFC PATCH v6 2/4] powerpc/fadump: Reservationless firmware assisted dump Mahesh J Salgaonkar
2018-07-16 6:03 ` [RFC PATCH v6 3/4] powerpc/fadump: throw proper error message on fadump registration failure Mahesh J Salgaonkar
2018-07-16 6:03 ` [RFC PATCH v6 4/4] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area Mahesh J Salgaonkar
2018-07-16 8:26 ` [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump Michal Hocko
2018-07-17 11:28 ` Mahesh Jagannath Salgaonkar [this message]
2018-07-17 11:52 ` Michal Hocko
2018-07-18 16:22 ` Mahesh Jagannath Salgaonkar
2018-07-19 8:08 ` Michal Hocko