From: "K.Prasad" <prasad@linux.vnet.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>,
"Luck, Tony" <tony.luck@intel.com>,
Vivek Goyal <vgoyal@redhat.com>,
kexec@lists.infradead.org, anderson@redhat.com
Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information
Date: Tue, 31 May 2011 23:10:43 +0530
Message-ID: <20110531174043.GA2000@in.ibm.com>
In-Reply-To: <m1fwo09g15.fsf@fess.ebiederm.org>
On Fri, May 27, 2011 at 11:04:06AM -0700, Eric W. Biederman wrote:
> "K.Prasad" <prasad@linux.vnet.ibm.com> writes:
>
> > PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information
> >
> > Fatal machine check exceptions (caused due to hardware memory errors) will now
> > result in a 'slim' coredump that captures vital information about the MCE. This
> > patch introduces a new panic flag, and new parameters to *panic functions
> > that can capture more information pertaining to the cause of crash.
> >
> > Enable a new elf-notes section to store additional information about the crash.
> > For MCE, enable a new notes section that captures relevant register status
> > (struct mce) to be later read during coredump analysis.
>
> There may be a reason to pass everything struct mce through 5 layers of
> code but right now it looks like it just makes everything uglier to no
> real purpose.
We could have stopped at a blank elf-note of type NT_MCE that merely
indicates an MCE-triggered panic, but dumping 'struct mce' into it helps
gather more useful information about the error, especially the memory
address that experienced the unrecoverable error (stored in mce->addr).
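To make the layout concrete, here is a minimal user-space sketch of how such a note could be packed. It is not the code from the patch: DEMO_NT_MCE, DEMO_NOTE_NAME, 'struct demo_mce' and the demo_* helpers are all hypothetical names, and 'struct demo_mce' is only a cut-down stand-in for the kernel's 'struct mce'. The header mirrors the three 32-bit words of Elf64_Nhdr, and both the name and the descriptor are padded to 4 bytes as ELF notes require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values: the real note type and name come from the patch series. */
#define DEMO_NT_MCE    0x4d434531u
#define DEMO_NOTE_NAME "DEMO"

/* Assumption: a simplified stand-in for the kernel's struct mce. */
struct demo_mce {
    uint64_t status;   /* bank status register */
    uint64_t addr;     /* physical address of the failing memory */
    uint64_t misc;
    uint32_t cpu;
    uint32_t bank;
};

/* Same layout as Elf64_Nhdr: three 32-bit words. */
struct demo_nhdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* ELF notes pad the name and the descriptor to a 4-byte boundary. */
static size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Total bytes needed for header + padded name + padded descriptor. */
size_t demo_note_size(void)
{
    return sizeof(struct demo_nhdr)
         + align4(sizeof(DEMO_NOTE_NAME))
         + align4(sizeof(struct demo_mce));
}

/* Packs one note into buf; returns bytes written, or 0 if buf is too small. */
size_t demo_pack_mce_note(uint8_t *buf, size_t buflen, const struct demo_mce *m)
{
    struct demo_nhdr hdr;
    size_t need = demo_note_size();

    assert(buf != NULL && m != NULL);
    if (buflen < need)
        return 0;

    memset(buf, 0, need);                       /* zero the padding bytes */
    hdr.n_namesz = sizeof(DEMO_NOTE_NAME);      /* includes the trailing NUL */
    hdr.n_descsz = sizeof(struct demo_mce);
    hdr.n_type   = DEMO_NT_MCE;

    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), DEMO_NOTE_NAME, sizeof(DEMO_NOTE_NAME));
    memcpy(buf + sizeof(hdr) + align4(sizeof(DEMO_NOTE_NAME)), m, sizeof(*m));
    return need;
}
```

A blank note would simply carry n_descsz == 0. The descriptor is what lets the analysis tool report the failing address instead of only the fact that an MCE happened.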
Patch 6/6 for the 'crash' tool enables decoding of 'struct mce' to show
this information. The sample log in patch 0/6 didn't show these benefits
because the 'mce-inject' tool used to soft-inject these errors doesn't
populate all registers with valid contents.
The idea was that when the physical address held in mce->addr is shown
while decoding the coredump, the corresponding memory DIMM could be
identified for replacement/isolation.
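As a rough illustration of that last step, the lookup from a decoded physical address to a DIMM could look like the sketch below. Everything in it is an assumption made for the example: the demo_* names, the address ranges and the DIMM labels are invented, and a flat range table ignores channel/rank interleaving, which real platforms resolve through their memory controller or firmware tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map of physical address ranges to DIMM labels. */
struct demo_dimm_range {
    uint64_t start;      /* inclusive */
    uint64_t end;        /* exclusive */
    const char *label;
};

/* Assumption: two 4 GiB DIMMs laid out back to back, no interleaving. */
static const struct demo_dimm_range demo_dimm_map[] = {
    { 0x000000000ULL, 0x100000000ULL, "DIMM_A1" },
    { 0x100000000ULL, 0x200000000ULL, "DIMM_B1" },
};

/* Returns the label of the DIMM covering phys_addr, or NULL if none does. */
const char *demo_addr_to_dimm(uint64_t phys_addr)
{
    size_t i;

    for (i = 0; i < sizeof(demo_dimm_map) / sizeof(demo_dimm_map[0]); i++) {
        if (phys_addr >= demo_dimm_map[i].start &&
            phys_addr < demo_dimm_map[i].end)
            return demo_dimm_map[i].label;
    }
    return NULL;
}
```

With this table, an address such as 0x12345000 taken from mce->addr falls in the first range, while anything at or above 0x200000000 yields NULL and would have to be reported as unmapped.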
Given that 'struct mce' isn't placed in a user-space visible file, duplicate
copies of it have to be maintained in 'crash' (as is done in the 'mcelog'
tool), and that's one disadvantage.
If you think that this complicates the patch, I'll start with a much
'slimmer' version (!) of the slimdump, and the improvements can be
considered iteratively.
Thanks,
K.Prasad