From: Vivek Goyal <vgoyal@redhat.com>
To: "K.Prasad" <prasad@linux.vnet.ibm.com>
Cc: oomichi@mxs.nes.nec.co.jp, "Luck, Tony" <tony.luck@intel.com>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
tachibana@mxm.nes.nec.co.jp, Andi Kleen <andi@firstfloor.org>,
anderson@redhat.com, "Eric W. Biederman" <ebiederm@xmission.com>,
crash-utility@redhat.com
Subject: Re: [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump
Date: Tue, 4 Oct 2011 10:30:12 -0400
Message-ID: <20111004143012.GB28306@redhat.com>
In-Reply-To: <20111003073203.GA22694@in.ibm.com>
On Mon, Oct 03, 2011 at 01:02:03PM +0530, K.Prasad wrote:
> There are certain types of crashes induced by faulty hardware in which
> capturing crashing kernel's memory (through kdump) makes no sense (or sometimes
> dangerous).
>
> A case in point, is unrecoverable memory errors (resulting in fatal machine
> check exceptions) in which reading from the faulty memory location from the
> kexec'ed kernel will cause double fault and system reset (leaving no
> information for the user).
>
> This patch introduces a framework called 'slimdump' enabled through a new
> elf-note NT_NOCOREDUMP. Any error whose cause cannot be attributed to a
> software error and cannot be detected by analysing the kernel memory may
> decide to add this elf-note to the vmcore and indicate the futility of
> such an exercise. Tools such as 'kexec', 'makedumpfile' and 'crash' are
> also modified in tandem to recognise this new elf-note and capture
> 'slimdump'.
>
> The physical address and size of the NT_NOCOREDUMP are made available to the
> user-space through a "/sys/kernel/nt_nocoredump" sysfs file (just like other
> kexec related files).
Even if the kernel has to signal the reason for the crash to user space, why
not add this info to the existing vmcoreinfo note? Something like another
field: PANIC_MCE=1.
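To illustrate how user space might consume such a field, here is a minimal sketch. It assumes the vmcoreinfo data is available as text made of "KEY=VALUE" lines and that the kernel adds a PANIC_MCE=1 line on a fatal machine check. The helper names vmcoreinfo_lookup() and panic_was_mce() are hypothetical and not part of any existing tool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up KEY in a vmcoreinfo-style text blob made of "KEY=VALUE\n" lines.
 * Returns a pointer to the start of the value and stores its length in *len,
 * or returns NULL if the key is absent. (Hypothetical helper.)
 */
static const char *vmcoreinfo_lookup(const char *info, const char *key,
				     size_t *len)
{
	size_t klen = strlen(key);
	const char *line = info;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t llen = eol ? (size_t)(eol - line) : strlen(line);

		/* Match only at the start of a line, followed by '='. */
		if (llen > klen && strncmp(line, key, klen) == 0 &&
		    line[klen] == '=') {
			*len = llen - klen - 1;
			return line + klen + 1;
		}
		line = eol ? eol + 1 : NULL;
	}
	return NULL;
}

/* True only if the blob contains the assumed line "PANIC_MCE=1". */
static bool panic_was_mce(const char *info)
{
	size_t len;
	const char *val = vmcoreinfo_lookup(info, "PANIC_MCE", &len);

	return val && len == 1 && val[0] == '1';
}
```

With this, a dump tool would only need to call panic_was_mce() on the vmcoreinfo text it already reads; no new ELF note type or sysfs file would be required.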
Secondly, the note name NT_NOCOREDUMP itself sounds binding. The kernel can
export the reason for the panic, and then it is up to user space to decide
what it wants to do with it.
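A rough sketch of that split between mechanism and policy follows. It assumes the kernel only reports whether the panic was caused by an MCE, and that the action to take comes from user-space configuration. Every name here (dump_action, dump_policy, choose_dump_action) is made up for illustration.

```c
#include <stdbool.h>

/* Possible actions a user-space dump tool could take (hypothetical). */
enum dump_action { DUMP_FULL, DUMP_SLIM, DUMP_SKIP };

/* Admin-configured policy: the kernel does not dictate any of this. */
struct dump_policy {
	enum dump_action on_mce;	/* action after a fatal MCE panic */
	enum dump_action on_other;	/* action after any other panic */
};

/* Pick the action from the exported panic reason and the local policy. */
static enum dump_action choose_dump_action(const struct dump_policy *p,
					   bool panic_mce)
{
	return panic_mce ? p->on_mce : p->on_other;
}
```

One admin could set on_mce to DUMP_SLIM while another still asks for DUMP_FULL; the kernel side stays the same in both cases.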
So to me,
>
> Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 28 ++++++++++++++++++++++++++++
> include/linux/elf.h | 18 ++++++++++++++++++
> include/linux/kexec.h | 1 +
> kernel/kexec.c | 11 +++++++++++
> kernel/ksysfs.c | 10 ++++++++++
> 5 files changed, 68 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 08363b0..483b2fc 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -238,6 +238,34 @@ static atomic_t mce_paniced;
> static int fake_panic;
> static atomic_t mce_fake_paniced;
>
> +void arch_add_nocoredump_note(u32 *buf)
> +{
> + struct elf_note note;
> + const char note_name[] = "PANIC_MCE";
> + const char desc_msg[] = "Crash induced due to a fatal machine "
> + "check error";
> +
Again, note_name and desc_msg seem to be the only two exports. Frankly, the
desc string seems pretty obvious and we should be able to ignore it. So just
export PANIC_MCE=true or something like that in case of an MCE.
Thanks
Vivek