From: Vivek Goyal <vgoyal@redhat.com>
To: Dave Anderson <anderson@redhat.com>
Cc: oomichi@mxs.nes.nec.co.jp, Tony Luck <tony.luck@intel.com>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
tachibana@mxm.nes.nec.co.jp, Andi Kleen <andi@firstfloor.org>,
Borislav Petkov <bp@alien8.de>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"K.Prasad" <prasad@linux.vnet.ibm.com>,
crash-utility@redhat.com
Subject: Re: [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump
Date: Wed, 5 Oct 2011 14:09:00 -0400
Message-ID: <20111005180900.GJ30146@redhat.com>
In-Reply-To: <67cddd75-90d7-49b8-83ff-c677101b37b3@zmail05.collab.prod.int.phx2.redhat.com>
On Wed, Oct 05, 2011 at 02:00:09PM -0400, Dave Anderson wrote:
>
>
> ----- Original Message -----
> > On Wed, Oct 05, 2011 at 07:20:37PM +0200, Borislav Petkov wrote:
> > > On Wed, Oct 05, 2011 at 01:10:07PM -0400, Vivek Goyal wrote:
> > > > On Wed, Oct 05, 2011 at 08:58:53AM -0700, Luck, Tony wrote:
> > > > > > > The plan is to pass down the list of poisoned memory pages to
> > > > > > > the second kernel using an elf-note so that these pages are
> > > > > > > left untouched during dump capture. I'm working on an
> > > > > > > implementation of the same and should have patches soon.
> > > > > >
> > > > > > I would say let us first figure out what happens while reading
> > > > > > a poisoned page, and whether this is a problem, before working
> > > > > > on a solution.
> > > > >
> > > > > If the page is poisoned because of a real uncorrectable error in
> > > > > memory (reported as SRAO machine check today, or by SRAR
> > > > > real-soon-now), then accessing the page from the processor while
> > > > > taking a memory dump will result in a machine check.
> > > > >
> > > > > Note that a large memory system that had been running for a long
> > > > > time may have built up a small stash of these land-mine pages - and
> > > > > we need to worry about them even in the case where the panic is not
> > > > > machine check related (in fact especially in this case ... we are
> > > > > in a case where we actually do want the dump to diagnose the cause
> > > > > of the panic, and we don't want to risk losing the crash dump
> > > > > because we aborted when touching a page that the OS had safely
> > > > > avoided for days/weeks/months).
> > > > >
> > > > > So passing a list of poisoned pages from the old kernel to the new
> > > > > kernel is a good idea - and is independent of the cause of the
> > > > > crash (except that in the fatal machine check case due to memory
> > > > > error the list is guaranteed to be non-empty).
> > > >
> > > > Where is this poisoned page info stored? In struct page? If yes, then
> > > > user space can walk through it and make sure not to touch poisoned pages.
> > > > Anyway user space filtering utility "makedumpfile" walks through struct
> > > > pages to filter out the pages. It should be able to filter out
> > > > poisoned pages unconditionally. So there should be no need for kernel
> > > > to export a list of these pages.
> > >
> > > Does this utility work on a vmcore dump? If so, Tony refers to the
> > > creation of the vmcore itself from the memory used by the first
> > > kernel.
> >
> > No, this utility can directly work on /proc/vmcore where first kernel's
> > image is still in memory and not on disk.
> >
> > > If there are poisoned pages, merely accessing that portion of DRAM
> > > containing the poisoned data would cause further MCEs in the freshly
> > > booted kernel so you won't be able to finish creating the dump.
> >
> > As long as you can get to your struct page arrays, one should be able
> > to filter out poisoned pages without saving the whole dump.
>
> It's still going to require a minimal kernel change because the
> PG_hwpoison flag's bit number differs depending upon the kernel
> configuration, if it exists at all. An additional vmcoreinfo item
> probably...
>
Yes, we can export that kind of information along with the other info
in vmcoreinfo.
Thanks
Vivek
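
To illustrate the approach discussed above, here is a minimal sketch of how a
user-space filter might use such a vmcoreinfo item. It assumes the kernel
exports the bit number as a line of the form NUMBER(PG_hwpoison)=<n>, following
the NUMBER(...) convention of other vmcoreinfo entries; the item name, the
sample values, and the helper names parse_number, is_hwpoisoned and
pages_to_dump are all hypothetical, and this is not makedumpfile's actual code.

```python
import re
from typing import Dict, List, Optional

# Hypothetical vmcoreinfo text. The NUMBER(PG_hwpoison) line is the assumed
# new item; the bit value 25 is made up for illustration.
SAMPLE_VMCOREINFO = """\
OSRELEASE=3.1.0-rc8
PAGESIZE=4096
NUMBER(PG_lru)=5
NUMBER(PG_hwpoison)=25
"""


def parse_number(vmcoreinfo: str, name: str) -> Optional[int]:
    """Return the value of NUMBER(name), or None if the item is absent."""
    pattern = re.compile(
        r"^NUMBER\(" + re.escape(name) + r"\)=(-?\d+)$", re.MULTILINE
    )
    match = pattern.search(vmcoreinfo)
    return int(match.group(1)) if match else None


def is_hwpoisoned(page_flags: int, hwpoison_bit: Optional[int]) -> bool:
    """Test the poison bit; a kernel without the flag has no poisoned pages."""
    if hwpoison_bit is None:
        return False
    return bool(page_flags & (1 << hwpoison_bit))


def pages_to_dump(flags_by_pfn: Dict[int, int],
                  hwpoison_bit: Optional[int]) -> List[int]:
    """Return the PFNs that are safe to read, in ascending order."""
    return [
        pfn
        for pfn, flags in sorted(flags_by_pfn.items())
        if not is_hwpoisoned(flags, hwpoison_bit)
    ]


if __name__ == "__main__":
    bit = parse_number(SAMPLE_VMCOREINFO, "PG_hwpoison")
    # PFN 1 has the (assumed) poison bit set and must not be touched.
    flags = {0: 0, 1: 1 << 25, 2: 1 << 5}
    print(pages_to_dump(flags, bit))
```

The point of reading the bit number from vmcoreinfo rather than hard-coding it
is the one Dave raises: the flag's position depends on the kernel
configuration, and the flag may not exist at all, in which case nothing is
filtered.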