From: "K.Prasad" <prasad@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org, crash-utility@redhat.com,
kexec@lists.infradead.org
Cc: oomichi@mxs.nes.nec.co.jp, "Luck, Tony" <tony.luck@intel.com>,
tachibana@mxm.nes.nec.co.jp, Andi Kleen <andi@firstfloor.org>,
anderson@redhat.com, "Eric W. Biederman" <ebiederm@xmission.com>,
Vivek Goyal <vgoyal@redhat.com>
Subject: [Patch 0/4] Slimdump framework using NT_NOCOREDUMP elf-note
Date: Mon, 3 Oct 2011 12:37:35 +0530
Message-ID: <20111003070735.GJ2223@in.ibm.com>
Hi All,
Please find enclosed a set of patches that introduce a 'slimdump'
framework, described in detail below.
Problem
--------
A system configured with kdump captures the kernel memory for all
types of crashes, even when it does not make much sense to do so. For
instance, system crashes triggered by hardware errors don't need a
complete dump of memory for investigation.
In the case of crashes triggered by fatal machine check exceptions (MCE)
due to unrecoverable memory errors, reading the crashing kernel's memory
is actually dangerous. When the kexec kernel reads the crashing kernel's
memory, it 'consumes' the data from the faulty memory location,
potentially causing recursive faults.
This problem has been discussed in the kernel community before, along
with a proposal to leave kernel memory regions out of /proc/vmcore (see
the mail threads pertaining to
http://article.gmane.org/gmane.linux.kernel/1148266). However, there were
suggestions against making this behaviour a kernel policy.
Solution
---------
Since capturing the crashing kernel's memory is unnecessary for crashes
induced by hardware errors, and can even be dangerous, we introduce a
mechanism to generate a 'slimdump'.
The kernel adds a new elf-note of type NT_NOCOREDUMP to the vmcore.
All tools in the kdump chain recognise this note, and generate and save
a 'slimdump' that contains only the ELF headers and the elf-note
section. The elf-note section may be used to describe the cause of the
error.
The enclosed patches modify the kernel, kexec-tools, makedumpfile and
the crash utility so that they recognise the NT_NOCOREDUMP elf-note and
generate a 'slimdump'. In addition, the kernel's handling of fatal MCEs
is made a consumer of the slimdump mechanism, preventing the collection
of a normal kdump.
Alternatively, the user has an option (through suitable makedumpfile or
kdump configuration options) to collect the complete vmcore or to
extract the 'dmesg' from /proc/vmcore.
Screen logs
-------------
# mce-inject ~/mce/mce-test/cases/soft-inj/panic_ucr/data/srar_over
[ 4934.748416] [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 2: f580000000000000
[ 4934.749079] [Hardware Error]: RIP 73:<000000001eadbabe>
[ 4934.749079] [Hardware Error]: TSC ef029a23417 ADDR 1234
[ 4934.749079] [Hardware Error]: PROCESSOR 0:663 TIME 1317149322 SOCKET 0 APIC 0
[ 4934.749079] [Hardware Error]: Run the above through 'mcelog --ascii'
[ 4934.749079] [Hardware Error]: Machine check: Overflowed uncorrected
[ 4934.749079] Kernel panic - not syncing: Fatal machine check on current CPU
[ 4934.749079] Pid: 1379, comm: mce-inject Tainted: G M 3.1.0-rc4.slimdump+ #34
[ 4934.749079] Call Trace:
[ 4934.749079] [<ffffffff81084922>] panic+0xbc/0x1cf
[ 4934.749079] [<ffffffff810858ff>] ? printk+0x6c/0x6e
[ 4934.749079] [<ffffffff8104c43b>] mce_panic+0x187/0x1a4
[ 4934.749079] [<ffffffff8104d525>] do_machine_check+0x5ec/0x6c3
[ 4934.749079] [<ffffffff8104e4e1>] raise_exception+0x5c/0x84
[ 4934.749079] [<ffffffff8104e5e9>] raise_local+0x5a/0xcc
[ 4934.749079] [<ffffffff8104e8ee>] mce_write+0x218/0x24e
[ 4934.749079] [<ffffffff8115abee>] vfs_write+0xb0/0x108
[ 4934.749079] [<ffffffff8115ad0a>] sys_write+0x4c/0x71
[ 4934.749079] [<ffffffff815bf12b>] system_call_fastpath+0x16/0x1b
[ 0.817861] kvm: no hardware support
..............
................
.................
# ls
vmcore
# ls -lh vmcore
-r-------- 1 root root 1.8G Sep 27 13:20 vmcore
# ~/makedumpfile.slimdump/makedumpfile vmcore vmcore.makedumpfile.review
The kernel version is not supported.
The created dumpfile may be incomplete.
Copying data : [100 %]
The dumpfile is saved to vmcore.makedumpfile.review.
makedumpfile Completed.
# ls -lh vmcore.makedumpfile.review
-rw------- 1 root root 3.9K Sep 28 01:40 vmcore.makedumpfile.review
# eu-readelf -n vmcore.makedumpfile.review
Note segment of 3592 bytes at offset 0x158:
Owner Data size Type
CORE 336 PRSTATUS
info.si_signo: 0, info.si_code: 0, info.si_errno: 0, cursig: 0
sigpend: <>
..........
.............
.........
NUMBER(PG_private)=11
NUMBER(PG_swapcache)=16
SYMBOL(phys_base)=ffffffff81a0e010
SYMBOL(init_level4_pgt)=ffffffff81a06000
SYMBOL(node_data)=ffffffff81b70b80
LENGTH(node_data)=512
CRASHTIME=1317621133
PANIC_MCE 49 <unknown>: 21
# crash -S ~/linux-2.6.slimdump/System.map ~/linux-2.6.slimdump/vmlinux vmcore.makedumpfile.review
crash 5.1.8
Copyright (C) 2002-2011 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
crash: overriding /boot/System.map with /home/prasadkr/linux-2.6.slimdump/System.map
"System crashed due to a hardware memory error. No coredump available."
Nocoredump Reason: PANIC_MCE
crash: Elf64_Phdr pointer: 1c46170 ELF header end: 1c46130
-------
Thanks,
K.Prasad
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec