From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>,
ebiederm@xmission.com, hbabu@us.ibm.com,
mahesh@linux.vnet.ibm.com, oomichi@mxs.nes.nec.co.jp,
horms@verge.net.au, heiko.carstens@de.ibm.com,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
linux-s390@vger.kernel.org
Subject: Re: [patch 0/9] kdump: Patch series for s390 support
Date: Wed, 13 Jul 2011 18:46:11 +0200
Message-ID: <20110713184611.6dcb09b4@mschwide>
In-Reply-To: <20110713160239.GE4426@redhat.com>

On Wed, 13 Jul 2011 12:02:39 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Jul 11, 2011 at 05:56:26PM +0200, Martin Schwidefsky wrote:
>
> [..]
> > > kexec-tools purgatory code already has the checksum logic. So you don't
> > > > have to redo that in stand alone tools. I think you probably need an
> > > > s390-specific purgatory that jumps to IPLing the stand-alone kernel if the
> > > > kdump kernel is corrupted, instead of rebooting or spinning infinitely
> > > > in a loop.
> >
> > I can not quite follow you here. The purgatory code is part of the kdump kernel,
> > no? When we trigger a dump with the stand-alone tools we will start executing
> > code in the assembler function of that stand-alone tools. We can not trust
> > the kdump kernel yet, not without doing the checksums first.
>
> Purgatory is another piece of binary code which is loaded along with kdump
> kernel in reserved memory area. So yes, there is a chance that this code
> itself gets corrupted.

Yes, that is one of the possible failure scenarios.

> So in case of stand alone dump, you save the calculated checksum of
> kdump kernel at disk and not in memory? And then calculate the checksum
> of memory image of kdump kernel and decide whether kdump kernel is
> corrupted or not?
>
> If yes, this sounds more reliable as checksum of kernel is stored on
> some disk/tape.

No, the checksum for the purgatory code is stored in memory. If the purgatory
code is corrupted, the checksum would have to be corrupted in a very specific
way as well for the verification to wrongly pass. The likelihood of that
happening is very low, but if it does we still have a fallback plan: before we
branch to the purgatory code we invalidate the checksum. If the purgatory code
is corrupted even though the checksum told us that it is fine, the machine will
crash again. If we then start the stand-alone dump tool again, it will create a
full dump. But mind you, that second IPL of the stand-alone dump tool is only
required for a very, very rare case.
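
To make the sequence above concrete, here is a minimal sketch in C of what the
entry code of the stand-alone dump tool would decide. All names in it
(struct kdump_image, simple_checksum, standalone_dump_entry) are made up for
illustration and are not taken from the patch series or from kexec-tools, and
the rotating sum is only a placeholder for whatever checksum is really used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of the preloaded purgatory/kdump image. */
struct kdump_image {
	const uint8_t *data;   /* start of the image in reserved memory */
	size_t len;            /* length of the image in bytes */
	uint32_t stored_csum;  /* checksum kept in memory at load time */
	int csum_valid;        /* cleared right before we branch to the image */
};

enum dump_action { BRANCH_TO_KDUMP, FULL_STANDALONE_DUMP };

/* Placeholder checksum (rotating sum); an assumption, not the real algorithm. */
static uint32_t simple_checksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = ((sum << 1) | (sum >> 31)) + p[i];
	return sum;
}

/* Decide what the stand-alone dump tool does after it has been IPLed. */
static enum dump_action standalone_dump_entry(struct kdump_image *img)
{
	assert(img != NULL);

	/* Already invalidated by an earlier attempt: do not trust kdump again. */
	if (!img->csum_valid)
		return FULL_STANDALONE_DUMP;

	/* Image in memory no longer matches the stored checksum. */
	if (simple_checksum(img->data, img->len) != img->stored_csum)
		return FULL_STANDALONE_DUMP;

	/*
	 * Invalidate before branching. If the image is corrupted in spite of
	 * a matching checksum and crashes, the next IPL takes the full dump.
	 */
	img->csum_valid = 0;
	return BRANCH_TO_KDUMP;
}
```

The point of clearing csum_valid before the branch is that the second IPL
never has to judge the image again; it only sees that the first attempt did
not come back.
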
> [..]
> > > > Ok. So again why not reuse the checksum capability of kexec-tools and
> > > instead of infinite looping you can jump to stand alone tools + IPL etc.
> > > I understand this will require a tighter integration with kexec-tools
> > > and using ELF header mechanism and will not cover the early kernel
> > > crashes.
> >
> > Imho the checksum of kexec-tools is in the wrong place.
>
> Because you think that stored checksum can get corrupted?

No, what I meant is that the code that verifies the checksum has to be part
of the stand-alone dump tool and not the purgatory code.

> [..]
> > > To me we seem to be diverging a lot from existing kdump+kexec-tools
> > > mechanism just to solve the case of early crash dumping. If we break
> > > > down the problem in two parts and do things the kexec-tools way (with a
> > > > backup path of booting stand alone kernel if kdump kernel is corrupted),
> > > things might be better.
> >
> > The "backup path of booting stand alone kernel" would result in passing
> > the control twice, once from the stand-alone dumper to the kexec purgatory
> > (after the purgatory checksum has been verified), then doing more checks
> > in the kdump kernel, only to return to the stand-alone dumper if some check
> > fails. Does not really sound enticing to me.
>
> What I am suggesting is that stand alone dumper gets control only if
> kdump kernel is corrupted.
>
> So following sequence.
>
> Kernel Crash ---> purgatory --> either kdump kernel/IPL stand alone tools
>
> Here only drawback seems to be that we assume that purgatory code and
> pre-calculated checksum has not been corrupted. The big advantage is
> that s390 kdump support looks very similar to other arches and
> understanding and supporting kdump across architectures becomes easy.

My problem with that is the following: how do we get from the "Kernel Crash"
step to the purgatory code? It does work for "normal" panics, but it fails
miserably for a hard crash that does not even get as far as panic. That is
why we insist on a second possible order of things:

Kernel Crash --> IPL of stand-alone dump tool --> branch to kdump if the
checksums turn out ok.

If the kernel called panic itself and branched to the purgatory code, but the
checksum turned out to be bad, we just stop there. Then the operator has to
do a manual IPL of the stand-alone dump tool to get the dump.
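
The two entry paths can be put side by side in a small C sketch. The enum and
function names are hypothetical, and the last case (hard crash with a bad
checksum leading straight to a full stand-alone dump) is my reading of the
fallback described earlier in this mail rather than something spelled out
here.

```c
#include <assert.h>
#include <stdbool.h>

/* How the production kernel went down. */
enum crash_kind {
	CRASH_PANIC,  /* kernel called panic() and branched to purgatory */
	CRASH_HARD    /* kernel never reached panic(); dump tool is IPLed */
};

enum next_step {
	START_KDUMP,                   /* checksums ok, run the kdump kernel */
	STOP_AND_WAIT_FOR_MANUAL_IPL,  /* purgatory stops, operator IPLs dump tool */
	STANDALONE_FULL_DUMP           /* dump tool writes the full dump itself */
};

/*
 * checksum_ok is the verdict of whichever code got control first:
 * the purgatory code after a panic, the stand-alone dump tool after
 * a hard crash.
 */
static enum next_step after_crash(enum crash_kind kind, bool checksum_ok)
{
	assert(kind == CRASH_PANIC || kind == CRASH_HARD);

	if (checksum_ok)
		return START_KDUMP;
	if (kind == CRASH_PANIC)
		return STOP_AND_WAIT_FOR_MANUAL_IPL;
	return STANDALONE_FULL_DUMP;
}
```
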
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.