From: Olof Johansson <olof@lixom.net>
To: Linas Vepstas <linasvepstas@gmail.com>
Cc: lkessler@us.ibm.com, linuxppc-dev@ozlabs.org,
Nathan Lynch <ntl@pobox.com>,
mahuja@us.ibm.com, strosake@us.ibm.com
Subject: Re: [PATCH 1/8] pseries: phyp dump: Docmentation
Date: Sun, 13 Jan 2008 23:24:02 -0600
Message-ID: <20080114052402.GA23786@lixom.net>
In-Reply-To: <3ae3aa420801110857l5e43fd56s5bd1c24ffac939f3@mail.gmail.com>
On Fri, Jan 11, 2008 at 10:57:51AM -0600, Linas Vepstas wrote:
> On 10/01/2008, Nathan Lynch <ntl@pobox.com> wrote:
> > Mike Strosaker wrote:
> > >
> > > At the risk of repeating what others have already said, the PHYP-assistance
> > > method provides some advantages that the kexec method cannot:
> > > - Availability of the system for production use before the dump data is
> > > collected. As was mentioned before, some production systems may choose not
> > > to operate with the limited memory initially available after the reboot,
> > > but it sure is nice to provide the option.
> >
> > I'm more concerned that this design encourages the user to resume a
> > workload *which is almost certainly known to result in a system crash*
> > before collection of crash data is complete. Maybe the gamble will
> > pay off most of the time, but I wouldn't want to be working support
> > when it doesn't.
>
> Workloads that cause crashes within hours of startup tend to be
> weeded-out/discovered during pre-production test of the system
> to be deployed. Since its pre-production test, dumps can be
> taken in a leisurely manner. Heck, even a session at the
> xmon prompt can be contemplated.
>
> The problem is when the crash only reproduces after days or
> weeks of uptime, on a production machine. Since the machine
> is in production, its got to be brought back up ASAP. Since
> its crashing only after days/weeks, the dump should have
> plenty of time to complete. (And if it crashes quickly after
> that reboot ... well, support people always welcome ways
> in which a bug can be reproduced more quickly/easily).

How do you expect to have it in full production if you don't have all
resources available for it? It's not until the dump has finished that you
can return all memory to the production environment and use it.

This can very easily be argued in both directions, with no clear winner:
If the crash is stress-induced (say a slashdotted website), for those
cases it seems more rational to take the time, collect _good data_ even
if it takes a little longer, and then go back into production. Especially
if the alternative is to go back into production immediately, collect
about half of the data, and then crash again. Rinse and repeat.

Anyway -- I can agree that the robustness and reliability of collecting
dumps can be higher using this approach. It really surprises me that
there's no way to reset a device through PHYP though. Seems like such a
fundamental feature.

I think people are overly optimistic if they think it'll be possible
to do all of this reliably (as in with consistent performance) without
a second reboot though. At least not without an amount of work similar
to what it would have taken to fix kdump's reliability in the first place.

Speaking of reboots: PHYP isn't known for being quick at rebooting a
partition; it used to take on the order of minutes even on a small
machine. Has that been fixed? If not, the "avoiding an extra reboot"
argument hardly seems like a benefit versus kdump+kexec, which reboots
nearly instantly and without involvement from PHYP.

-Olof