From: Wu Fengguang <fengguang.wu@intel.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>,
"hugh@veritas.com" <hugh@veritas.com>,
"riel@redhat.com" <riel@redhat.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"chris.mason@oracle.com" <chris.mason@oracle.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3
Date: Tue, 2 Jun 2009 20:51:34 +0800
Message-ID: <20090602125134.GA20462@localhost>
In-Reply-To: <20090602121940.GD1392@wotan.suse.de>

On Tue, Jun 02, 2009 at 08:19:40PM +0800, Nick Piggin wrote:
> On Tue, Jun 02, 2009 at 07:14:07PM +0800, Wu Fengguang wrote:
> > On Mon, Jun 01, 2009 at 10:40:51PM +0800, Nick Piggin wrote:
> > > But you just said that you try to intercept the IO. So the underlying
> > > data is not necessarily corrupt. And even if it was then what if it
> > > was reinitialized to something else in the meantime (such as filesystem
> > > metadata blocks?) You'd just be introducing worse possibilities for
> > > corruption.
> >
> > The IO interception will be based on PFN instead of file offset, so it
> > won't affect innocent pages such as your example of reinitialized data.
>
> OK, if you could intercept the IO so it never happens at all, yes
> of course that could work.
>
> > poisoned dirty page == corrupt data => process shall be killed
> > poisoned clean page == recoverable data => process shall survive
> >
> > In the case of dirty hwpoison page, if we reload the on disk old data
> > and let application proceed with it, it may lead to *silent* data
> > corruption/inconsistency, because the application will first see v2
> > then v1, which is illogical and hence may mess up its internal data
> > structure.
>
> Right, but how do you prevent that? There is no way to reconstruct the
> most up-to-date data because it was destroyed.

To kill the application ruthlessly, rather than allow it to go rotten quietly.

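As a rough illustration of the dirty/clean rule quoted above, here is a small user-space sketch. It is not kernel code: struct model_page, enum poison_action and poison_policy() are names invented for this example, and the real handler has many more cases to consider.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page cache page; not the kernel's struct page. */
struct model_page {
    bool hwpoison;  /* hardware reported this memory as corrupted */
    bool dirty;     /* memory holds data newer than the on-disk copy */
};

enum poison_action {
    ACTION_NONE,    /* page is healthy, nothing to do */
    ACTION_RELOAD,  /* clean: drop the page, the disk copy is still current */
    ACTION_KILL     /* dirty: the newest data is lost, kill the user */
};

/* Decide what to do with a page according to the rule in this mail:
 * poisoned dirty page => kill, poisoned clean page => survive. */
static enum poison_action poison_policy(const struct model_page *page)
{
    assert(page != NULL);
    if (!page->hwpoison)
        return ACTION_NONE;
    return page->dirty ? ACTION_KILL : ACTION_RELOAD;
}
```

The point of the dirty branch is that reloading the old on-disk data would hand v1 to a process that has already seen v2, so the sketch never returns ACTION_RELOAD for a dirty page.
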
> > > You will need to demonstrate a *big* advantage before doing crazy things
> > > with writeback ;)
> >
> > OK. We can do two things about poisoned writeback pages:
> >
> > 1) to stop IO for them, thus preventing corrupted data from hitting disk and/or
> > triggering further machine checks
>
> 1b) At which point, you invoke the end-io handlers, and the page is
> no longer writeback.
>
> > 2) to isolate them from page cache, thus preventing possible
> > references in the writeback time window
>
> And then this is possible because you aren't violating mm
> assumptions due to 1b. This proceeds just as the existing
> pagecache mce error handler case which exists now.

Yeah, that's a good scheme. We are talking about two interception
schemes: mine is the passive one and yours is the active one.

passive: check for hwpoison pages at __generic_make_request()/elv_next_request()
         (the code will be enabled by an mce_bad_io_pages counter)
active:  iterate over all queued requests looking for hwpoison pages

Each has its merits and complexities.

I'll list the merits (+) and complexities (-) of the passive approach;
from them the merits of the active one follow automatically:

+ works in generic code and doesn't have to touch all of the deadline/as/cfq elevators
- the wait_on_page_writeback() puzzle, because of the writeback time window
+ could also intercept the "cannot de-dirty for now" pages when they
  eventually go to writeback IO
- has to avoid filesystem references to PG_hwpoison pages, e.g.
  - zeroing the partial EOF page when i_size is not page aligned
  - calculating checksums

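To make the passive variant concrete, here is a minimal user-space sketch of the check. It only models the idea: struct io_page, struct io_request, model_poison_page() and model_check_request() are hypothetical names, and the only identifier taken from this mail is the mce_bad_io_pages counter. Real code would sit in the request submission path and would also have to run the end-io handlers on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a page and a block IO request. */
struct io_page {
    unsigned long pfn;
    bool hwpoison;
};

struct io_request {
    struct io_page **pages;
    size_t nr_pages;
};

/* Count of poisoned pages that may still reach the IO path.
 * While it is zero, the check below costs a single comparison. */
static unsigned long mce_bad_io_pages;

/* Mark a page as poisoned and arm the check. */
static void model_poison_page(struct io_page *page)
{
    assert(page != NULL);
    if (!page->hwpoison) {
        page->hwpoison = true;
        mce_bad_io_pages++;
    }
}

/* Passive interception: called where a request enters the generic
 * block layer. Returns 0 to let it proceed, -EIO to fail it. The
 * test is per page (PFN), not per file offset. */
static int model_check_request(const struct io_request *req)
{
    assert(req != NULL);
    if (mce_bad_io_pages == 0)
        return 0;               /* fast path: nothing is poisoned */
    for (size_t i = 0; i < req->nr_pages; i++)
        if (req->pages[i]->hwpoison)
            return -EIO;        /* keep corrupted data off the disk */
    return 0;
}
```

The active variant would instead walk the already queued requests at poison time, which is why it needs to know about each elevator's queues.
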
> > > > Now it's obvious that reusing more code than truncate_complete_page()
> > > > is not easy (or natural).
> > >
> > > Just lock the page and wait for writeback, then do the truncate
> > > work in another function. In your case if you've already unmapped
> > > the page then it won't try to unmap again so no problem.
> > >
> > > Truncating from pagecache does not change ->index so you can
> > > move the loop logic out.
> >
> > Right. So effectively the reusable function is exactly
> > truncate_complete_page(). As I said this reuse is not a big gain.
>
> Anyway, we don't have to argue about it. I already sent a patch
> because it was so hard to do, so let's move past this ;)
>
>
> > > > Yes it's kind of insane. I'm interested in reasoning it out though.
>
> Well with the IO interception (I missed this point), then it seems
> maybe no longer so insane. We could see how it looks.

OK.

Thanks,
Fengguang