linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@ozlabs.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Hari Bathini <hbathini@linux.vnet.ibm.com>,
	Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Ananth Narayan <ananth@in.ibm.com>,
	kernelfans@gmail.com
Subject: Re: [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump.
Date: Thu, 19 Jul 2018 10:08:05 +0200	[thread overview]
Message-ID: <20180719080805.GM7193@dhcp22.suse.cz> (raw)
In-Reply-To: <b2c316b9-520e-0a31-3ebd-d3ade50c3783@linux.vnet.ibm.com>

On Wed 18-07-18 21:52:17, Mahesh Jagannath Salgaonkar wrote:
> On 07/17/2018 05:22 PM, Michal Hocko wrote:
> > On Tue 17-07-18 16:58:10, Mahesh Jagannath Salgaonkar wrote:
> >> On 07/16/2018 01:56 PM, Michal Hocko wrote:
> >>> On Mon 16-07-18 11:32:56, Mahesh J Salgaonkar wrote:
> >>>> One of the primary issues with Firmware Assisted Dump (fadump) on Power
> >>>> is that it needs a large amount of memory to be reserved. This reserved
> >>>> memory is used for saving the contents of old crashed kernel's memory before
> >>>> fadump capture kernel uses old kernel's memory area to boot. However, This
> >>>> reserved memory area stays unused until system crash and isn't available
> >>>> for production kernel to use.
> >>>
> >>> How much memory are we talking about. Regular kernel dump process needs
> >>> some reserved memory as well. Why that is not a big problem?
> >>
> >> We reserve around 5% of total system RAM. On large systems with
> >> TeraBytes of memory, this reservation can be quite significant.
> >>
> >> The regular kernel dump uses the kexec method to boot into capture
> >> kernel and it can control the parameters that are being passed to
> >> capture kernel. This allows a capability to strip down the parameters
> >> that can help lowering down the memory requirement for capture kernel to
> >> boot. This allows regular kdump to reserve less memory to start with.
> >>
> >> Where as fadump depends on power firmware (pHyp) to load the capture
> >> kernel after full reset and boots like a regular kernel. It needs same
> >> amount of memory to boot as the production kernel. On large systems
> >> production kernel needs significant amount of memory to boot. Hence
> >> fadump needs to reserve enough memory for capture kernel to boot
> >> successfully and execute dump capturing operations. By default fadump
> >> reserves 5% of total system RAM and in most cases this has worked
> >> flawlessly on variety of system configurations. Optionally,
> >> 'crashkernel=X' can also be used to specify more fine-tuned memory size
> >> for reservation.
> > 
> > So why do we even care about fadump when regular kexec provides
> > (presumably) same functionality with a smaller memory footprint? Or is
> > there any reason why kexec doesn't work well on ppc?
> 
> Kexec based kdump is loaded by crashing kernel. When OS crashes, the
> system is in an inconsistent state, especially the devices. In some
> cases, a rogue DMA or ill-behaving device drivers can cause the kdump
> capture to fail.
> 
> On power platform, fadump solves these issues by taking help from power
> firmware, to fully-reset the system, load the fresh copy of same kernel
> to capture the dump with PCI and I/O devices reinitialized, making it
> more reliable.

Thanks for the clarification.

> Fadump does full system reset, booting system through the regular boot
> options i.e the dump capture kernel is booted in the same fashion and
> doesn't have specialized kernel command line option. This implies, we
> need to give more memory for the system boot. Since the new kernel boots
> from the same memory location as crashed kernel, we reserve 5% of memory
> where power firmware moves the crashed kernel's memory content. This
> reserved memory is completely removed from the available memory. For
> large memory systems like 64TB systems, this account to ~ 3TB, which is
> a significant chunk of memory production kernel is deprived of. Hence,
> this patch adds an improvement to exiting fadump feature to make the
> reserved memory available to system for use, using zone movable.

Is the 5% a reasonable estimate or more a ballpark number? I find it a
bit strange to require 3TB of memory to boot a kernel just to dump the
crashed kernel image. Shouldn't you rather look into this estimate than
spreading ZONE_MOVABLE abuse? Larger systems need more memory to dump
even with the regular kexec kdump but I have never seen any to use more
than 1G or something like that.
-- 
Michal Hocko
SUSE Labs

      reply	other threads:[~2018-07-19  8:08 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16  6:02 [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump Mahesh J Salgaonkar
2018-07-16  6:03 ` [RFC PATCH v6 1/4] mm/page_alloc: Introduce an interface to mark reserved memory as ZONE_MOVABLE Mahesh J Salgaonkar
2018-07-16  6:03 ` [RFC PATCH v6 2/4] powerpc/fadump: Reservationless firmware assisted dump Mahesh J Salgaonkar
2018-07-16  6:03 ` [RFC PATCH v6 3/4] powerpc/fadump: throw proper error message on fadump registration failure Mahesh J Salgaonkar
2018-07-16  6:03 ` [RFC PATCH v6 4/4] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area Mahesh J Salgaonkar
2018-07-16  8:26 ` [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump Michal Hocko
2018-07-17 11:28   ` Mahesh Jagannath Salgaonkar
2018-07-17 11:52     ` Michal Hocko
2018-07-18 16:22       ` Mahesh Jagannath Salgaonkar
2018-07-19  8:08         ` Michal Hocko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180719080805.GM7193@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=ananth@linux.vnet.ibm.com \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=hbathini@linux.vnet.ibm.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kernelfans@gmail.com \
    --cc=khandual@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mahesh@linux.vnet.ibm.com \
    --cc=srikar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).