From: Michael Ellerman <michael@ellerman.id.au>
To: Nathan Lynch <ntl@pobox.com>
Cc: mahuja@us.ibm.com, linuxppc-dev@ozlabs.org,
	Linas Vepstas <linasvepstas@gmail.com>,
	lkessler@us.ibm.com, strosake@us.ibm.com
Subject: Re: [PATCH 1/8] pseries: phyp dump: Docmentation
Date: Thu, 10 Jan 2008 09:59:05 +1100
Message-ID: <1199919545.7880.11.camel@concordia>
In-Reply-To: <20080109184437.GU14201@localdomain>

On Wed, 2008-01-09 at 12:44 -0600, Nathan Lynch wrote:
> Hi Linas,
> 
> Linas Vepstas wrote:
> > 
> > On 08/01/2008, Nathan Lynch <ntl@pobox.com> wrote:
> > > Manish Ahuja wrote:
> > > > +
> > > > +The goal of hypervisor-assisted dump is to enable the dump of
> > > > +a crashed system, and to do so from a fully-reset system, and
> > > > +to minimize the total elapsed time until the system is back
> > > > +in production use.
> > >
> > > Is it actually faster than kdump?
> > 
> > This is a basic presumption;
> 
> > As a side effect, the system is in
> > production *while* the dump is being taken;

It's in "production" with 256MB of RAM? Err. Sure, as the dump progresses
more RAM will be freed, but that's hardly production. I think Nathan's
right: any sysadmin who wants predictability will probably double reboot
anyway.

> > with kdump,
> > you can't go into production until after the dump is finished,
> > and the system has been rebooted a second time.  On
> > systems with terabytes of RAM, the time difference can be
> > hours.

> Since you bring up large systems... a system with terabytes of RAM is
> practically guaranteed to be a NUMA configuration with dozens of cpus.
> When processing a dump on such a system, I wonder how well we fare:
> can we successfully boot with (say) 128 cpus and 256MB of usable
> memory?  Do we have to hot-online nodes as system memory is freed up
> (and does that even work)?  We need to be able to restore the system
> to its optimal topology when the dump is finished; if the best we can
> do is a degraded configuration, the workload will suffer and the
> system admin is likely to just reboot the machine again so the kernel
> will have the right NUMA topology.

Yeah, that's a good question. Even if the hot-onlining works, there are
still kernel data structures allocated at boot which want to be
node-local. So the end result will be != a "production" boot.

> > > > +Implementation details:
> > > > +----------------------
> > > > +In order for this scheme to work, memory needs to be reserved
> > > > +quite early in the boot cycle. However, access to the device
> > > > +tree this early in the boot cycle is difficult, and device-tree
> > > > +access is needed to determine if there is a crash data waiting.
> > >
> > > I don't think this bit about early device tree access is correct.  By
> > > the time your code is reserving memory (from early_init_devtree(), I
> > > think), RTAS has been instantiated and you are able to test for the
> > > existence of /rtas/ibm,dump-kernel.
> > 
> > If I remember right, it was still too early to look up this token directly,
> > so we wrote some code to crawl the flat device tree to find it.  But
> > not only was that a lot of work, but I somehow decided that doing this
> > to the flat tree was wrong, as otherwise someone would surely have
> > written the access code.  If this can be made to work, that would be
> > great, but we couldn't make it work at the time.
> > 
> > > > +To work around this problem, all but 256MB of RAM is reserved
> > > > +during early boot. A short while later in boot, a check is made
> > > > +to determine if there is dump data waiting. If there isn't,
> > > > +then the reserved memory is released to general kernel use.
> > >
> > > So I think these gymnastics are unneeded -- unless I'm
> > > misunderstanding something, you should be able to determine very early
> > > whether to reserve that memory.
> > 
> > Only if you can get at rtas, but you can't get at rtas at that point.

AFAICT you don't need to get at RTAS, you just need to look at the
device tree to see if the property is present, and that is trivial.

You probably just need to add a check in early_init_dt_scan_rtas() which
sets a flag for the PHYP dump stuff, or add your own scan routine if you
need one.
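
Roughly what I mean, as a userspace sketch rather than kernel code. The
node structure, mock_scan_flat_dt(), mock_get_prop() and the
phyp_dump_supported flag are all made up for illustration; only the
/rtas node and the ibm,dump-kernel property name come from this thread.
The real callback would use the kernel's flat device tree helpers
instead of these mocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a flat device tree node. */
struct mock_node {
	const char *uname;      /* unit name, e.g. "rtas" */
	int depth;              /* 0 is the root, 1 its direct children */
	const char *props[4];   /* NULL-terminated list of property names */
};

/* Hypothetical flag that later PHYP dump code would consult. */
static int phyp_dump_supported;

/* Mock property lookup: non-NULL if the node carries the named property. */
static const char *mock_get_prop(const struct mock_node *node, const char *name)
{
	size_t i;

	for (i = 0; i < 4 && node->props[i]; i++)
		if (strcmp(node->props[i], name) == 0)
			return node->props[i];
	return NULL;
}

/*
 * Scan callback: skip everything except /rtas, set the flag if the
 * property is present, then stop the scan. Returns 0 to continue,
 * nonzero to stop.
 */
static int scan_phyp_dump(const struct mock_node *node, void *data)
{
	int *flag = data;

	assert(flag != NULL);

	if (node->depth != 1 || strcmp(node->uname, "rtas") != 0)
		return 0;

	if (mock_get_prop(node, "ibm,dump-kernel"))
		*flag = 1;

	return 1;
}

/* Mock scanner: call the callback on each node until it returns nonzero. */
static int mock_scan_flat_dt(const struct mock_node *nodes, size_t count,
			     int (*it)(const struct mock_node *, void *),
			     void *data)
{
	size_t i;
	int rc = 0;

	for (i = 0; i < count; i++) {
		rc = it(&nodes[i], data);
		if (rc)
			break;
	}
	return rc;
}
```

The point is that the decision needs only a property-presence test during
the tree scan, so a caller could run
mock_scan_flat_dt(nodes, count, scan_phyp_dump, &phyp_dump_supported)
and decide whether to reserve memory from the flag, with no RTAS call.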

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
