From: Wu Fengguang <fengguang.wu@intel.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>,
"hugh@veritas.com" <hugh@veritas.com>,
"riel@redhat.com" <riel@redhat.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"chris.mason@oracle.com" <chris.mason@oracle.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3
Date: Tue, 2 Jun 2009 21:30:19 +0800
Message-ID: <20090602133019.GC20462@localhost>
In-Reply-To: <20090602123720.GF1392@wotan.suse.de>
On Tue, Jun 02, 2009 at 08:37:20PM +0800, Nick Piggin wrote:
> On Tue, Jun 02, 2009 at 02:34:50PM +0200, Andi Kleen wrote:
> > On Tue, Jun 02, 2009 at 02:10:31PM +0200, Nick Piggin wrote:
> > > > It's not, there are various differences (like the reference count)
> > >
> > > No. If there are, then it *really* needs better documentation. I
> > > don't think there are, though.
> >
> > Better documentation on what? You want a detailed listing in a comment
> > how it is different from truncate?
> >
> > To be honest I have some doubts of the usefulness of such a comment
> > (why stop at truncate and not list the differences to every other
> > page cache operation? @) but if you're insist (do you?) I can add one.
>
> Because I don't see any difference (see my previous patch). I
> still don't know what it is supposed to be doing differently.
> So if you reinvent your own that looks close enough to truncate
> to warrant a comment to say /* this is close to truncate but
> not quite */, then yes I insist that you say exactly why it is
> not quite like truncate ;)
The truncate topic is getting boring. EIO is more interesting and pressing, hehe.
> > > I'm suggesting that EIO is traditionally for when the data still
> > > dirty in pagecache and was not able to get back to backing
> > > store. Do you deny that?
> >
> > Yes. That is exactly the case when memory-failure triggers EIO
> >
> > Memory error on a dirty file mapped page.
>
> But it is no longer dirty, and the problem was not that the data
> was unable to be written back.
Or rather, cannot be written back ;)
> > > And I think the application might try to handle the case of a
> > > page becoming corrupted differently. Do you deny that?
> >
> > You mean a clean file-mapped page? In this case there is no EIO,
> > memory-failure just drops the page and it is reloaded.
> >
> > If the page is dirty we trigger EIO which as you said above is the
> > right reaction.
>
> No I mean the difference between the case of dirty page unable to
> be written to backing sotre, and the case of dirty page becoming
> corrupted.
legacy EIO: may succeed on retry (perhaps after doing something first)?
hwpoison EIO: a permanent, unrecoverable error
> > > OK, given the range of errors that APIs are defined to return,
> > > then maybe EIO is the best option. I don't suppose it is possible
> > > to expand them to return something else?
> >
> > Expand the syscalls to return other errnos on specific
> > kinds of IO error?
> >
> > Of course that's possible, but it has the problem that you
> > would need to fix all the applications that expect EIO for
> > IO error. The later I consider infeasible.
>
> They would presumably exit or do some default thing, which I
> think would be fine. Actually if your code catches them in the
> act of manipulating a corrupted page (ie. if it is mmapped),
> then it gets a SIGBUS.
That's OK. filemap_fault() returns VM_FAULT_SIGBUS for legacy EIO,
while hwpoison pages will return VM_FAULT_HWPOISON. Both kill the
application, I guess?
read()/write() are the more interesting cases.
With read IO interception, the read() call will succeed.
The write() call has to fail. But interestingly, writes are
mostly delayed, and we have only one AS_EIO bit for the entire
file, which is cleared once the EIO has been reported. And the poisoned
page will be isolated (if isolation succeeds), so later read()/write()
calls won't even notice there was a poisoned page!
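To make the one-bit-per-file problem concrete, here is a toy userspace model in C. It is only a sketch under stated assumptions, not kernel code: struct toy_mapping and the toy_* functions are hypothetical names invented for illustration, the boolean merely stands in for the per-file AS_EIO bit, and the error is assumed to be reported by an fsync()-like call that tests and clears that bit.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model (hypothetical, NOT kernel code): one error bit per file. */
struct toy_mapping {
	bool as_eio;	/* stands in for the per-file AS_EIO bit */
};

/* A delayed write only dirties the page cache, so it reports success. */
int toy_write(struct toy_mapping *m)
{
	(void)m;
	return 0;
}

/* Writeback fails later (or the page is poisoned): set the single bit. */
void toy_record_error(struct toy_mapping *m)
{
	m->as_eio = true;
}

/* fsync()-like reporting: test and clear, so the error is seen only once. */
int toy_fsync(struct toy_mapping *m)
{
	if (m->as_eio) {
		m->as_eio = false;
		return -EIO;
	}
	return 0;
}
```

In this model the first toy_fsync() after toy_record_error() returns -EIO and the next one returns 0; nothing remembers which page failed, which is the fuzziness described above.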
How are we going to fix this mess? EIO errors seem to be fuzzy and
temporary by nature, at least in the current implementation, and hard
to make exact and/or permanent in both implementation and
interface:
- can/shall we remember the exact EIO page? maybe not.
- can EIO reporting be permanent? sounds like a horrible user interface..
Thanks,
Fengguang