From: Wu Fengguang <fengguang.wu@intel.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Nick Piggin <npiggin@suse.de>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Andi Kleen <andi@firstfloor.org>,
"riel@redhat.com" <riel@redhat.com>,
"chris.mason@oracle.com" <chris.mason@oracle.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH 1/5] HWPOISON: define VM_FAULT_HWPOISON to 0 when feature is disabled
Date: Fri, 12 Jun 2009 20:57:41 +0800
Message-ID: <20090612125741.GA6140@localhost>
In-Reply-To: <20090612112258.GA14123@elte.hu>
Hi Ingo,
On Fri, Jun 12, 2009 at 07:22:58PM +0800, Ingo Molnar wrote:
>
> * Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > So as to eliminate one #ifdef in the c source.
> >
> > Proposed by Nick Piggin.
> >
> > CC: Nick Piggin <npiggin@suse.de>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> > arch/x86/mm/fault.c | 3 +--
> > include/linux/mm.h | 7 ++++++-
> > 2 files changed, 7 insertions(+), 3 deletions(-)
> >
> > --- sound-2.6.orig/arch/x86/mm/fault.c
> > +++ sound-2.6/arch/x86/mm/fault.c
> > @@ -819,14 +819,13 @@ do_sigbus(struct pt_regs *regs, unsigned
> > tsk->thread.error_code = error_code;
> > tsk->thread.trap_no = 14;
> >
> > -#ifdef CONFIG_MEMORY_FAILURE
> > if (fault & VM_FAULT_HWPOISON) {
> > printk(KERN_ERR
> > "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> > tsk->comm, tsk->pid, address);
> > code = BUS_MCEERR_AR;
> > }
> > -#endif
>
> Btw., anything like this should happen in close cooperation with the
> x86 tree, not as some pure MM feature. I dont see Cc:s and nothing
> that indicates that realization. What's going on here?
Ah sorry for the ignorance! Andi has a nice overview of the big
picture here: http://lkml.org/lkml/2009/6/3/371
In the above chunk, the process is trying to access an already
corrupted page and thus must be killed; otherwise it will either
silently consume corrupted data or trigger another (deadly)
MCE event and bring down the whole machine.
VM_FAULT_HWPOISON is set by the hwpoison code to indicate that the
previously mapped page contains corrupted data and is unrecoverable
because there is no valid on-disk copy that can be reloaded.
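To make the intent of the patch concrete, here is a minimal userspace sketch
of the pattern, not the kernel code itself. All DEMO_* names and values are
hypothetical stand-ins (for CONFIG_MEMORY_FAILURE, VM_FAULT_HWPOISON and the
BUS_* si_codes); the real definitions live in include/linux/mm.h and the
siginfo headers. The idea is that when the feature is compiled out the flag
is 0, so the test is constant-false and the caller needs no #ifdef.

```c
#include <stdio.h>

/* Hypothetical stand-in for VM_FAULT_SIGBUS. */
#define DEMO_FAULT_SIGBUS 0x0002u

#ifdef DEMO_MEMORY_FAILURE            /* stand-in for CONFIG_MEMORY_FAILURE */
#define DEMO_FAULT_HWPOISON 0x0010u
#else
#define DEMO_FAULT_HWPOISON 0u        /* feature off: "fault & 0" is always false */
#endif

/* Hypothetical stand-ins for BUS_ADRERR and BUS_MCEERR_AR. */
enum demo_code { DEMO_BUS_ADRERR = 2, DEMO_BUS_MCEERR_AR = 4 };

/* Same shape as the do_sigbus() hunk: no #ifdef around the check. */
int demo_sigbus_code(unsigned int fault, const char *comm, int pid,
                     unsigned long address)
{
    int code = DEMO_BUS_ADRERR;

    if (fault & DEMO_FAULT_HWPOISON) {
        fprintf(stderr,
                "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
                comm, pid, address);
        code = DEMO_BUS_MCEERR_AR;
    }
    return code;
}
```

Built without -DDEMO_MEMORY_FAILURE, demo_sigbus_code() always returns
DEMO_BUS_ADRERR and the compiler is free to drop the branch; built with it,
a fault value carrying the poison bit selects DEMO_BUS_MCEERR_AR.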
> It is not at all clear to me whether propagating hardware failures
> this widely is desired from a general design POV. Most desktop
> hardware wont give a damn about this (and if a hardware fault
> happens you want to get as far from the crappy hardware as possible)
> so i'm not sure how relevant it is and how well tested it will
> become in practice.
Intel Nehalem-EX will have this feature and is going to ship in
volume servers in the coming years. Given that those servers may
well be equipped with tons of memory, memory failures (especially
soft errors, see http://en.wikipedia.org/wiki/Soft_error) can no
longer be ignored.
The sunspot maximum will be underway by 2011 and we must be prepared for it ;)
> I.e. really some wider discussion needs to happen on this.
OK.
Thanks,
Fengguang