From: Vivek Goyal <vgoyal@redhat.com>
To: Valdis.Kletnieks@vt.edu
Cc: oomichi@mxs.nes.nec.co.jp, linux-s390@vger.kernel.org,
mahesh@linux.vnet.ibm.com, heiko.carstens@de.ibm.com,
linux-kernel@vger.kernel.org, hbabu@us.ibm.com,
horms@verge.net.au, ebiederm@xmission.com,
schwidefsky@de.ibm.com,
Michael Holzheu <holzheu@linux.vnet.ibm.com>,
kexec@lists.infradead.org
Subject: Re: [patch 0/9] kdump: Patch series for s390 support
Date: Tue, 12 Jul 2011 09:52:41 -0400
Message-ID: <20110712135241.GD1293@redhat.com>
In-Reply-To: <49979.1310234299@turing-police.cc.vt.edu>
On Sat, Jul 09, 2011 at 01:58:19PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Thu, 07 Jul 2011 15:33:21 EDT, Vivek Goyal said:
> > On Wed, Jul 06, 2011 at 11:24:47AM +0200, Michael Holzheu wrote:
>
> > > S390 stand-alone dump tools are independent mini operating systems that
> > > are installed on disks or tapes. When a dump should be created, these
> > > stand-alone dump tools are booted. All that they do is to write the dump
> > > (current memory plus the CPU registers) to the disk/tape device.
> > >
> > > The advantage compared to kdump is that since they are freshly loaded
> > > into memory they can't be overwritten in memory.
> >
> > > Another advantage is
> > > that since it is different code, it is much less likely that the dump
> > > tool will run into the same problem than the previously crashed kernel.
> >
> > I think in practice this is not really a problem. If your kernel
> > is not stable enough to even boot and copy a file, then most likely
> > it has not even been deployed. The very fact that a kernel has been
> > up and running verifies that it is a stable kernel for that machine
> > and is capable of capturing the dump.
>
> Vivek: I used to do VM/XA on S/390 boxes for a living, and that's *not* where
> Michael is coming from.
>
> What the standalone dump code does is take a system that may have the moral
> equivalent of 256 separate PCI buses, several hundred disks all visible in
> multipath configurations, dozens of other devices, and as long as you can find
> *one* console and *one* tape/disk drive that works, you can capture a dump.
IIUC, capturing a dump in a virtualized environment is much easier, as the
software is not completely dead and the hypervisor is still running. For
example, qemu can easily and reliably capture a memory snapshot of the VM
once it is hung, in all situations. The issue becomes manageability: filtering
has to work with various kernel versions and across the operating systems
inside the VM. Hence kdump for Linux is being deployed even in virtualized
environments.
I guess using stand-alone dump tools is very similar to a qemu dump in terms
of reliability, but it lacks filtering capabilities and is limited to specific
devices. In that respect qemu is much more powerful.
>
> More than once in my career, I got into a situation where the production system
> would hang - and booting off another disk that contained an older copy with
> maybe a few less patches would *also* hang. VM/XA would simply *not run*.
> Booting the standalone dump utility (which shared zero code with VM/XA, and did
> *much* less initialization of I/O devices not needed for the actual dump) would
> work just fine. This would get me a dump that would show that we had a
> (usually) hardware issue - either we were tripping over an errata that *no*
> released version of VM/XA had a workaround for, or outright defective hardware.
Can we not achieve almost the equivalent of that by loading only a very
selective set of modules in the second kernel?
If not, one can always use the qemu-kvm dump capability with the kvm
hypervisor if kdump does not work. It will be a manual operation though, like
the s390 stand-alone dump utility.
So the point is that I am fine with the stand-alone dump utility capturing
the dump. Just keep it as a backup plan if kdump does not work. Also, for
early crashes kdump will not work, and the stand-alone dump utility will be
the primary plan to capture the dump.
In the above example, are you saying that your production kernel, which used
to boot on the same system in the past, now does not even boot (because of
some bad hardware state)?
>
> For the same efficiency reasons that Linux doesn't do a lot of checking for
> "can never happen" cases, VM/XA doesn't check some things. So when busted
> hardware would present logically impossible combinations of status bits (for
> instance, "device still connected" but "I/O bus disconnected"), Bad Things
> would happen. Booting a tiny dump program that never even *tried* to look at
> the bad bits posted by the miscreant hardware would allow you to get the info
> you needed to debug it.
Ok, maybe. I am not saying don't use the stand-alone dump utility for
severe hardware issues. I am just saying that closer integration with the
kexec infrastructure, as on other architectures, would be better. We probably
do not require any common code changes except a custom purgatory for
s390 to IPL the stand-alone utilities.
Thanks
Vivek