public inbox for kexec@lists.infradead.org
 help / color / mirror / Atom feed
From: "K.Prasad" <prasad@linux.vnet.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: oomichi@mxs.nes.nec.co.jp, "Luck, Tony" <tony.luck@intel.com>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	tachibana@mxm.nes.nec.co.jp, Andi Kleen <andi@firstfloor.org>,
	anderson@redhat.com, Vivek Goyal <vgoyal@redhat.com>,
	crash-utility@redhat.com
Subject: Re: [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump
Date: Mon, 3 Oct 2011 17:33:36 +0530	[thread overview]
Message-ID: <20111003120336.GK2223@in.ibm.com> (raw)
In-Reply-To: <m162k64c7w.fsf@fess.ebiederm.org>

On Mon, Oct 03, 2011 at 03:10:43AM -0700, Eric W. Biederman wrote:
> "K.Prasad" <prasad@linux.vnet.ibm.com> writes:
> 
> > There are certain types of crashes induced by faulty hardware in which
> > capturing crashing kernel's memory (through kdump) makes no sense (or sometimes
> > dangerous).
> >
> > A case in point, is unrecoverable memory errors (resulting in fatal machine
> > check exceptions) in which reading from the faulty memory location from the
> > kexec'ed kernel will cause double fault and system reset (leaving no
> > information for the user).
> 
> It does make plenty of sense, and I capture the all of the time.
> It totally doesn't make sense to do this in the kernel when we can
> filter this from userspace just fine.
> 

It's interesting...according to Intel's Software Developer Manual
(quoting from Volume 3A, Chapter 15), the MCIP bit in IA32_MCG_STATUS
MSR behaves as described below.

"MCIP (machine check in progress) flag, bit 2 Indicates (when set)
that a machine-check exception was generated. Software can set or clear this
flag. The occurrence of a second Machine-Check Event while MCIP is set will
cause the processor to enter a shutdown state."

While in do_machine_check function, we enter the panic path (for
unrecoverable errors) much before the IA32_MCG_STATUS MSR is reset and
this is likely to dangerous.

911 void do_machine_check(struct pt_regs *regs, long error_code)
912 {
.............
................
1055         if (no_way_out && tolerant < 3)
1056                 mce_panic("Fatal machine check on current CPU", final, msg);
.............
................
1073         mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
1074 out:

It'd be interesting to know the type of memory error (as classified by
the processor) for which you're able to capture the memory dump.
Maybe a dump of the various MCE status registers (and struct mce) would
help us understand the behaviour on your system better.

> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> I thought we already had this discussion.  Why is this silliness coming
> back?
> 
> I especially dislike the notion of hardcoding policy in the kernel like this.
> 

The last time this was discussed in the community, the kernel was hardcoded to
prevent anybody from reading the kernel memory, while this time it is NOT.

This kernel patch is different from the last time, in that it only adds an
elf-note to denote a particular type of crash. However in the user-space, using
'cp' for instance, the entire coredump can be read from /proc/vmcore. Similarly
'makedumpfile' can be used to extract the dmesg from the crashed kernel and the
new elf-note does not interfere with the same.

Hope this addresses your concerns.

Thanks,
K.Prasad


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

  reply	other threads:[~2011-10-03 12:03 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-03  7:07 [Patch 0/4] Slimdump framework using NT_NOCOREDUMP elf-note K.Prasad
2011-10-03  7:32 ` [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump K.Prasad
2011-10-03 10:10   ` Eric W. Biederman
2011-10-03 12:03     ` K.Prasad [this message]
2011-10-04  6:34       ` Borislav Petkov
2011-10-05  7:07         ` K.Prasad
2011-10-05  7:31           ` Borislav Petkov
2011-10-05  9:47             ` K.Prasad
2011-10-05 12:41               ` Borislav Petkov
2011-10-05 15:52               ` Vivek Goyal
     [not found]                 ` <10327.1317830438@turing-police.cc.vt.edu>
2011-10-05 16:16                   ` Borislav Petkov
2011-10-05 17:20                     ` Vivek Goyal
2011-10-05 17:13                   ` Vivek Goyal
     [not found]             ` <26571.1317815746@turing-police.cc.vt.edu>
2011-10-05 12:31               ` Borislav Petkov
2011-10-05 15:19           ` Vivek Goyal
2011-10-05 15:30           ` Vivek Goyal
2011-10-03 22:53     ` Luck, Tony
2011-10-04 14:04   ` Vivek Goyal
2011-10-05  7:18     ` K.Prasad
2011-10-05  7:33       ` Borislav Petkov
2011-10-05  9:23         ` K.Prasad
2011-10-05 15:25       ` Vivek Goyal
2011-10-07 16:12         ` K.Prasad
2011-10-10  7:07           ` Borislav Petkov
2011-10-11 18:44             ` K.Prasad
2011-10-11 18:59               ` Luck, Tony
2011-10-12  0:20               ` Andi Kleen
2011-10-12 10:44               ` Borislav Petkov
2011-10-12 15:59                 ` Vivek Goyal
2011-10-12 15:51               ` Vivek Goyal
2011-10-14 11:30                 ` K.Prasad
2011-10-14 14:14                   ` Vivek Goyal
2011-10-18 17:41                     ` K.Prasad
2011-10-11 18:55             ` Luck, Tony
2011-10-04 14:30   ` Vivek Goyal
2011-10-05  7:41     ` K.Prasad
2011-10-05 15:40       ` Vivek Goyal
2011-10-05 15:58         ` Luck, Tony
2011-10-05 16:25           ` Borislav Petkov
2011-10-05 17:10           ` Vivek Goyal
2011-10-05 17:20             ` Borislav Petkov
2011-10-05 17:29               ` Vivek Goyal
2011-10-05 17:43                 ` Borislav Petkov
2011-10-05 18:00                 ` Dave Anderson
2011-10-05 18:09                   ` Vivek Goyal
2011-10-04 15:04   ` Nick Bowler
2011-10-07 16:36     ` K.Prasad
2011-10-07 18:19       ` Nick Bowler
2011-10-03  7:35 ` [Patch 2/4][kexec-tools] Recognise NT_NOCOREDUMP elf-note type K.Prasad
2011-10-03  7:37 ` [Patch 3/4][makedumpfile] Capture slimdump if elf-note NT_NOCOREDUMP present K.Prasad
2011-10-03  7:45 ` [Patch 4/4][crash] Recognise elf-note of type NT_NOCOREDUMP before vmcore analysis K.Prasad

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111003120336.GK2223@in.ibm.com \
    --to=prasad@linux.vnet.ibm.com \
    --cc=anderson@redhat.com \
    --cc=andi@firstfloor.org \
    --cc=crash-utility@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oomichi@mxs.nes.nec.co.jp \
    --cc=tachibana@mxm.nes.nec.co.jp \
    --cc=tony.luck@intel.com \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox