public inbox for kexec@lists.infradead.org
From: Vivek Goyal <vgoyal@redhat.com>
To: "K.Prasad" <prasad@linux.vnet.ibm.com>
Cc: oomichi@mxs.nes.nec.co.jp, Nick Bowler <nbowler@elliptictech.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Valdis.Kletnieks@vt.edu, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, tachibana@mxm.nes.nec.co.jp,
	Andi Kleen <andi@firstfloor.org>, Borislav Petkov <bp@alien8.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	anderson@redhat.com, crash-utility@redhat.com
Subject: Re: [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump
Date: Wed, 12 Oct 2011 11:51:44 -0400
Message-ID: <20111012155144.GC12845@redhat.com>
In-Reply-To: <20111011184434.GB32316@in.ibm.com>

On Wed, Oct 12, 2011 at 12:14:34AM +0530, K.Prasad wrote:
> On Mon, Oct 10, 2011 at 09:07:25AM +0200, Borislav Petkov wrote:
> > On Fri, Oct 07, 2011 at 09:42:19PM +0530, K.Prasad wrote:
> > > The problem, as pointed out by Borislav Petkov in a different mail, is that
> > > we might end up capturing a vmcore containing corrupted data even though
> > > that data is not required for analysing the cause of the crash.
> > > 
> > > Of course, all this assumes that reading the faulty memory with MCE
> > > disabled is harmless. However, the effect of a read operation in this
> > > case is undefined.
> > 
> > Frankly, I don't think that it is undefined - you should basically be
> > able to read DRAM, albeit with the corrupted data in it. However, you'd
> > probably do best to disable the whole DRAM error detection first by
> > clearing a couple of bits in MC4_CTL_MASK (at least on AMD that should
> > work; I dunno how Intel does that).
> > 
> 
> The MC4_CTL_MASK doesn't appear to be defined in the kernel. Page 196 of
> http://support.amd.com/us/Processor_TechDocs/26094.PDF states that
> "This register is typically programmed by BIOS and not by the Kernel
> software".
> 
> So, in any case, we may not be able to disable machine-check exceptions
> (MCEs) only within the context of the kexec'ed kernel. Let me know if I've
> missed something here.
> 
> > But, regardless, according to Vivek, the "makedumpfile" tool should be
> > able to jump over poisoned pages and you don't need all the hoopla above
> > at all, right?
> >
> 
> In short, the answer is yes. We could add a new string, say
> "CRASH_REASON=PANIC_MCE", to the VMCOREINFO elf-note, which 'makedumpfile'
> can parse, and get away without adding the new NT_NOCOREDUMP elf-note.
> Parsing through the log_buf to look for the panic string from inside
> 'makedumpfile' appears to be a clumsy solution, though.
> 
> The suggestion to make NT_NOCOREDUMP contain more fine-grained
> information can be met by using meaningful strings in VMCOREINFO.

I guess we don't have to overload VMCOREINFO with more fine-grained info
about MCE. The kernel log buffer should already have that info. So
makedumpfile can just extract the kernel log buffer and save it on disk,
and the user can get all the MCE info from that.
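To make the log-buffer suggestion concrete, here is a minimal C sketch of the linearization step. It assumes the printk ring layout of kernels of this era (a log_buf of log_buf_len bytes, a power of two, with log_end one past the most recently written character and logged_chars the number of valid characters), and it assumes the raw buffer and those three values have already been read out of the vmcore. The function name linearize_log_buf is hypothetical, and later kernels replaced this byte ring with a record-based format, so the indexing below would not apply there.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy the valid part of an old-style printk byte
 * ring into 'out', oldest character first. 'out' must hold at least
 * min(logged_chars, log_buf_len) bytes. Returns the number of bytes copied.
 */
size_t linearize_log_buf(const char *log_buf, unsigned log_buf_len,
                         unsigned log_end, unsigned logged_chars,
                         char *out)
{
    /* The ring is indexed with a mask, so its size must be a power of two. */
    assert(log_buf_len != 0 && (log_buf_len & (log_buf_len - 1)) == 0);

    unsigned mask = log_buf_len - 1;
    /* Never copy more than one full lap of the ring. */
    unsigned n = logged_chars < log_buf_len ? logged_chars : log_buf_len;
    /* Index of the oldest valid character; unsigned wrap-around is intended. */
    unsigned start = log_end - n;

    for (unsigned i = 0; i < n; i++)
        out[i] = log_buf[(start + i) & mask];
    return n;
}
```

For example, if an 8-byte ring has received "hello world", log_end is 11 and logged_chars is capped at 8, so the function returns the last eight characters, "lo world".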

> 
> ---
> 
> In this context, I wish to quickly recap the issues we've discussed
> thus far and their proposed solutions, and re-evaluate the need for the
> new elf-note.
> 
> i) Scenario1: System crashes because of a fatal MCE
> 
> Proposed Solution: Add a new string to the VMCOREINFO elf-note from
> within the MCE panic path to indicate the cause of the crash. 'makedumpfile'
> recognises this string and collects a slimdump instead of the normal dump.

What is a slimdump? Why define a new format and an extra note in the
vmcore? Just save the kernel log buffer if you encounter PANIC_MCE.
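A sketch of that check on the makedumpfile side, assuming the VMCOREINFO note is already available as newline-separated KEY=VALUE text. CRASH_REASON=PANIC_MCE is only the string proposed earlier in this thread, not an existing VMCOREINFO entry, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dump modes for the capture-side tool. */
enum dump_mode { DUMP_FULL, DUMP_LOG_BUF_ONLY };

/* Return 1 if 'info' contains a line exactly equal to "key=value", else 0. */
static int vmcoreinfo_has(const char *info, const char *key, const char *value)
{
    size_t klen = strlen(key), vlen = strlen(value);
    const char *line = info;

    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len == klen + 1 + vlen &&
            strncmp(line, key, klen) == 0 &&
            line[klen] == '=' &&
            strncmp(line + klen + 1, value, vlen) == 0)
            return 1;
        if (!eol)
            break;          /* last line had no trailing newline */
        line = eol + 1;
    }
    return 0;
}

/* Save only the kernel log buffer when the (proposed) MCE marker is present. */
enum dump_mode choose_dump_mode(const char *vmcoreinfo)
{
    assert(vmcoreinfo != NULL);
    return vmcoreinfo_has(vmcoreinfo, "CRASH_REASON", "PANIC_MCE")
               ? DUMP_LOG_BUF_ONLY
               : DUMP_FULL;
}
```

Matching the whole line, rather than searching for a substring, keeps a value such as "PANIC_MCE_X" from being mistaken for the marker.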

> 
> ii) Scenario2: A system with PG_hwpoison (or landmine!) pages crashes
> because of a software bug. In this case, the kexec kernel would normally
> reboot upon reading the PG_hwpoison page. I'll soon post a new version of
> the patchset implementing this.
> 
> Solution: Maintain a linked list of the PFNs whose corresponding
> 'struct page' has been marked PG_hwpoison. We could export this list, or
> otherwise put it to use, in quite a few ways.

What's the need for a list, and why do we have to export anything? Can't
makedumpfile look at the struct page and just not dump that page if the
hwpoison flag is set?
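A minimal sketch of that per-page check, assuming the page->flags words for a range of PFNs have already been read from the dumped mem_map into 64-bit values, and that the bit number of PG_hwpoison for the crashed kernel is known (it depends on the kernel configuration, so how makedumpfile learns it is left open here). The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical filter: flags[i] is page->flags for PFN start_pfn + i.
 * Writes the PFNs that are safe to read into pfns_out (which must hold
 * nr_pages entries) and returns how many there are.
 */
size_t select_dumpable_pfns(const uint64_t *flags, size_t nr_pages,
                            uint64_t start_pfn, unsigned pg_hwpoison_bit,
                            uint64_t *pfns_out)
{
    assert(pg_hwpoison_bit < 64);
    uint64_t poison_mask = UINT64_C(1) << pg_hwpoison_bit;
    size_t count = 0;

    for (size_t i = 0; i < nr_pages; i++) {
        if (flags[i] & poison_mask)
            continue;   /* PG_hwpoison set: never read this page */
        pfns_out[count++] = start_pfn + i;
    }
    return count;
}
```

With this approach no extra bookkeeping is needed in the crashed kernel; the filter is derived from page state the dump already contains.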

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

