linux-mm.kvack.org archive mirror
From: Andi Kleen <andi@firstfloor.org>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nick Piggin <npiggin@suse.de>, Andi Kleen <andi@firstfloor.org>,
	"riel@redhat.com" <riel@redhat.com>,
	"chris.mason@oracle.com" <chris.mason@oracle.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH 0/5] [RFC] HWPOISON incremental fixes
Date: Fri, 12 Jun 2009 12:56:10 +0200
Message-ID: <20090612105610.GK25568@one.firstfloor.org>
In-Reply-To: <20090611142239.192891591@intel.com>

On Thu, Jun 11, 2009 at 10:22:39PM +0800, Wu Fengguang wrote:
> Hi all,
> 
> Here are the hwpoison fixes that aims to address Nick and Hugh's concerns.
> Note that
> - the early kill option is dropped for .31. It's obscure option and complex
>   code and is not must have for .31. Maybe Andi also aims this option for
>   notifying KVM, but right now KVM is not ready to handle that.

KVM is ready to handle it; patches for that have been submitted and
are queued.

Also, without early kill it's not really possible right now to recover
in the guest. For some other scenarios early kill is also much easier
to handle than late kill: with late kill you always have to bail
out of your current execution context, while with early kill the
handling can be done out of line (e.g. by just dropping a corrupted
object, similar to what the kernel does). That's a much nicer and
gentler model than late kill.
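
To illustrate the difference, here is a toy user-space model. It is only
a sketch: ObjectCache, PoisonedObjectError and all method names are
hypothetical, and nothing here is a kernel or signal interface. It assumes
the application keeps objects that can be rebuilt from a backing store.

```python
class PoisonedObjectError(Exception):
    """Raised when a consumer touches an object whose memory is corrupted."""


class ObjectCache:
    """Toy cache whose entries can be rebuilt from a backing store."""

    def __init__(self, loader):
        self._loader = loader    # callable: key -> fresh object
        self._objects = {}
        self._poisoned = set()

    def get(self, key):
        if key in self._poisoned:
            # Late handling: the error surfaces in the middle of the
            # consumer's work, which has to abandon what it was doing.
            raise PoisonedObjectError(key)
        if key not in self._objects:
            self._objects[key] = self._loader(key)
        return self._objects[key]

    def mark_poisoned(self, key):
        """Simulate hardware corrupting the memory behind `key`."""
        self._poisoned.add(key)

    def handle_early_notification(self, key):
        """Early handling: drop the corrupted object out of line."""
        self._objects.pop(key, None)
        self._poisoned.discard(key)


cache = ObjectCache(loader=lambda key: f"data for {key}")
cache.get("a")
cache.mark_poisoned("a")
cache.handle_early_notification("a")  # runs before any consumer touches "a"
print(cache.get("a"))                 # transparently reloaded
```

Without the early notification, the next get("a") raises in whatever
context happens to access the object, and that context has to unwind.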

Of course very few programs will try to handle this, but if any do,
it's better to make it easier for them.

Sending too many signals in a few cases is not fatal right now,
I think. Always remember that the alternative is to die completely.

So please don't drop that code right now.


> - It seems that even fsync() processes are not easy to catch, so I abandoned
>   the SIGKILL on fsync() idea. Instead, I choose to fail any attempt to
>   populate the poisoned file with new pages, so that the corrupted page offset
>   won't be repopulated with outdated data. This seems to be a safe way to allow
>   the process to continue running while still be able to promise good (but not
>   complete) data consistency.

The fsync() error reporting is already broken anyway, even without hwpoison,
for metadata errors, which also rely only on the address space bit and not
the page, and run into all the same problems.

I don't think we need to be better here than normal metadata.

Possibly if metadata can be fixed then hwpoison will be fixed too in the
same pass. But that's something longer term.

> - I didn't implement the PANIC-on-corrupted-data option. Instead, I guess
>   sending uevent notification to user space will be a more flexible scheme?

Normally you can get very aggressive panics by setting the x86 mce tolerant
mode to 0 (default is 1); I suspect that will be good enough.
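
A minimal sketch for checking the current setting from user space. It
assumes the tunable is exposed at the sysfs path below, which may be
absent on non-x86 systems or on kernels without this knob. The function
names are made up for illustration, and the level descriptions paraphrase
the kernel's x86 machine check documentation.

```python
from pathlib import Path

# Assumed sysfs location of the x86 machine check "tolerant" tunable.
TOLERANT_PATH = Path("/sys/devices/system/machinecheck/machinecheck0/tolerant")

# Paraphrased from the kernel's x86 machine check documentation.
TOLERANT_LEVELS = {
    0: "always panic on uncorrected errors",
    1: "panic or SIGBUS on uncorrected errors (default)",
    2: "SIGBUS or log uncorrected errors",
    3: "never panic or SIGBUS (testing only)",
}


def read_tolerant(path=TOLERANT_PATH):
    """Return the current tolerant level, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_tolerant(level):
    """Map a tolerant level to a short human-readable description."""
    return TOLERANT_LEVELS.get(level, "unknown or unavailable")


if __name__ == "__main__":
    level = read_tolerant()
    print(f"mce tolerant = {level}: {describe_tolerant(level)}")
```

Changing the value means writing to the same file as root.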

If other architectures add hwpoison support presumably they can add
a similar tunable.

Doing that in the low-level handler is better than in the high-level
VM because there are some corruption cases which are not reported
to the high level (e.g. those not affecting memory directly).

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
