From: Russ Anderson <rja@sgi.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>,
Linus Torvalds <torvalds@linux-foundation.org>,
Wu Fengguang <fengguang.wu@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Andi Kleen <andi@firstfloor.org>,
"riel@redhat.com" <riel@redhat.com>,
"chris.mason@oracle.com" <chris.mason@oracle.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
rja@sgi.com
Subject: Re: [PATCH 1/5] HWPOISON: define VM_FAULT_HWPOISON to 0 when feature is disabled
Date: Tue, 16 Jun 2009 15:27:26 -0500
Message-ID: <20090616202726.GB31443@sgi.com>
In-Reply-To: <20090615065232.GC18390@wotan.suse.de>
On Mon, Jun 15, 2009 at 08:52:32AM +0200, Nick Piggin wrote:
> On Fri, Jun 12, 2009 at 05:35:01PM +0200, Ingo Molnar wrote:
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > On Fri, 12 Jun 2009, Ingo Molnar wrote:
> > > >
> > > > This seems like trying to handle a failure mode that cannot be
> > > > and shouldnt be 'handled' really. If there's an 'already
> > > > corrupted' page then the box should go down hard and fast, and
> > > > we should not risk _even more user data corruption_ by trying to
> > > > 'continue' in the hope of having hit some 'harmless' user
> > > > process that can be killed ...
> > >
> > > No, the box should _not_ go down hard-and-fast. That's the last
> > > thing we should *ever* do.
> > >
> > > We need to log it. Often at a user level (ie we want to make sure
> > > it actually hits syslog, possibly goes out the network, maybe pops
> > > up a window, whatever).
> > >
> > > Shutting down the machine is the last thing we ever want to do.
> > >
> > > The whole "let's panic" mentality is a disease.
> >
> > No doubt about that - and i'm removing BUG_ON()s and panic()s
> > wherever i can and havent added a single new one myself in the past
> > 5 years or so, its a disease.
>
> In HA failover systems you often do want to panic ASAP (after logging
> to serial console I guess) if anything like this happens so the system
> can be rebooted with minimal chance of data corruption spreading.
The whole point of hardware data poisoning is to avoid having to
panic the system due to the potential of undetected data corruption,
because the corrupt data is always marked bad. This has worked
well on ia64, where applications that encounter bad data are killed
and the memory is poisoned and not reallocated, avoiding a system panic.
This has been used at customer sites for a few years, by the type of
customers that really check their data. It is nice to see
the hardware poison feature moving to the x86 "mainstream".
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com