From: "Linas Vepstas" <linasvepstas@gmail.com>
To: "Nathan Lynch" <ntl@pobox.com>
Cc: mahuja@us.ibm.com, linuxppc-dev@ozlabs.org, lkessler@us.ibm.com,
strosake@us.ibm.com
Subject: Re: [PATCH 1/8] pseries: phyp dump: Docmentation
Date: Wed, 9 Jan 2008 20:33:53 -0600
Message-ID: <3ae3aa420801091833i6cf32616o2a060579be1f3191@mail.gmail.com>
In-Reply-To: <20080109184437.GU14201@localdomain>
On 09/01/2008, Nathan Lynch <ntl@pobox.com> wrote:
> Hi Linas,
>
> Linas Vepstas wrote:
> >
> > As a side effect, the system is in
> > production *while* the dump is being taken;
>
> A dubious feature IMO.
Hmm. Take it up with Ken Rozendal; this is supposed to be
one of the two main selling points of this thing.
> Seems that the design potentially trades
> reliability of first failure data capture for availability.
> E.g. system crashes, reboots, resumes processing while copying dump,
> crashes again before dump procedure is complete. How is that handled,
> if at all?
It's handled by the hypervisor. phyp maintains the copy of the
RMO from the first crash until the OS declares the dump of the
RMO to be complete, so you'll always have the RMO of the first
crash.

For the rest of RAM, it comes in two parts: some portion will
already have been dumped; the rest has not yet been dumped, and
it will still be there, preserved across the second crash.
So you get both the RMO and all of RAM from the first crash.
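
To make that sequence concrete, here is a toy Python model of the
bookkeeping. It is only a sketch: the class and method names are made
up for illustration, it does not reflect any real phyp or kernel
interface, and it assumes that the record of which chunks have already
been dumped survives the second crash.

```python
class ToyDump:
    """Hypothetical model of first-crash data preservation.

    Not a real phyp or kernel interface; all names are illustrative.
    """

    def __init__(self, total_chunks):
        self.total_chunks = total_chunks  # rest of RAM, split into chunks
        self.preserved = False            # True while first-crash data is kept
        self.dumped = set()               # chunks already written out
        self.crashes_seen = 0

    def crash(self):
        """Record a crash; only the first one starts a new preserved image."""
        self.crashes_seen += 1
        if not self.preserved:
            # First crash: the RMO copy and the rest of RAM are now preserved.
            self.preserved = True
            self.dumped = set()
        # Otherwise the crash happened mid-dump: the preserved data and
        # the progress made so far are left untouched (assumption).

    def dump_chunk(self, chunk):
        """Write one preserved chunk out to the dump."""
        if not self.preserved:
            raise RuntimeError("no preserved crash data to dump")
        self.dumped.add(chunk)

    def remaining(self):
        """Chunks from the first crash that still have to be dumped."""
        return [c for c in range(self.total_chunks) if c not in self.dumped]

    def declare_complete(self):
        """The OS declares the dump complete; preserved data can be released."""
        if self.remaining():
            raise RuntimeError("dump is not complete yet")
        self.preserved = False


if __name__ == "__main__":
    state = ToyDump(total_chunks=4)
    state.crash()           # first crash
    state.dump_chunk(0)
    state.dump_chunk(1)
    state.crash()           # second crash before the dump finished
    print("still to dump:", state.remaining())
    for chunk in state.remaining():
        state.dump_chunk(chunk)
    state.declare_complete()
    print("preserved data released:", not state.preserved)
```

The point of the model is only that a second crash changes neither the
preserved image nor the set of chunks still to be dumped; nothing is
released until the OS says the dump is complete.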
> > with kdump,
> > you can't go into production until after the dump is finished,
> > and the system has been rebooted a second time. On
> > systems with terabytes of RAM, the time difference can be
> > hours.
>
> The difference in time it takes to resume the normal workload may be
> significant, yes. But the time it takes to get a usable dump image
> would seem to be the basically the same.
Yes.
> Since you bring up large systems... a system with terabytes of RAM is
> practically guaranteed to be a NUMA configuration with dozens of cpus.
> When processing a dump on such a system, I wonder how well we fare:
> can we successfully boot with (say) 128 cpus and 256MB of usable
> memory? Do we have to hot-online nodes as system memory is freed up
> (and does that even work)? We need to be able to restore the system
> to its optimal topology when the dump is finished; if the best we can
> do is a degraded configuration, the workload will suffer and the
> system admin is likely to just reboot the machine again so the kernel
> will have the right NUMA topology.
Heh. That's the elbow-grease of this thing. The easy part is to get
the core function working. The hard part is to test these various configs,
and when they don't work, figure out what went wrong. That will take
perseverance and brains.
--linas