From: Wu Fengguang <fengguang.wu@intel.com>
To: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>, "tytso@mit.edu" <tytso@mit.edu>,
"hch@infradead.org" <hch@infradead.org>,
"mfasheh@suse.com" <mfasheh@suse.com>,
"aia21@cantab.net" <aia21@cantab.net>,
"hugh.dickins@tiscali.co.uk" <hugh.dickins@tiscali.co.uk>,
"swhiteho@redhat.com" <swhiteho@redhat.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"npiggin@suse.de" <npiggin@suse.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] [16/19] HWPOISON: Enable .remove_error_page for migration aware file systems
Date: Mon, 10 Aug 2009 15:07:45 +0800
Message-ID: <20090810070745.GA26533@localhost>
In-Reply-To: <4A7FBFD1.2010208@hitachi.com>
Hi Hidehiro,
On Mon, Aug 10, 2009 at 02:36:01PM +0800, Hidehiro Kawai wrote:
> Hi,
>
> Andi Kleen wrote:
>
> > Index: linux/fs/ext3/inode.c
> > ===================================================================
> > --- linux.orig/fs/ext3/inode.c
> > +++ linux/fs/ext3/inode.c
> > @@ -1819,6 +1819,7 @@ static const struct address_space_operat
> > .direct_IO = ext3_direct_IO,
> > .migratepage = buffer_migrate_page,
> > .is_partially_uptodate = block_is_partially_uptodate,
> > + .error_remove_page = generic_error_remove_page,
> > };
>
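The hunks above only fill one more slot in each ext3 address_space_operations table. As a rough illustration of that opt-in pattern, the user-space sketch below uses hypothetical stand-ins (struct aops_model, struct mapping_model, handle_poisoned_page and friends), not the kernel's real types. A hook left NULL means the filesystem has not opted in, so the caller must check before calling it. The -1 return for the no-hook case is a placeholder, not what the memory-failure code actually does.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct address_space and struct page. */
struct mapping_model;
struct page_model { int removed; };

/* Model of an operations table: every hook is optional (may be NULL). */
struct aops_model {
	const char *name;
	int (*error_remove_page)(struct mapping_model *m, struct page_model *p);
};

struct mapping_model { const struct aops_model *a_ops; };

/* Hypothetical generic handler: just mark the page as dropped from the cache. */
static int generic_error_remove_page_model(struct mapping_model *m,
					   struct page_model *p)
{
	(void)m;
	p->removed = 1;
	return 0;
}

/* A filesystem opts in by filling the slot with a designated initializer,
 * in the same style as ".error_remove_page = generic_error_remove_page". */
static const struct aops_model ordered_aops = {
	.name = "ordered",
	.error_remove_page = generic_error_remove_page_model,
};

/* A filesystem that leaves the slot NULL has not opted in. */
static const struct aops_model legacy_aops = {
	.name = "legacy",
};

/* Returns the hook's result, or -1 (placeholder) when no hook exists. */
static int handle_poisoned_page(struct mapping_model *m, struct page_model *p)
{
	if (m->a_ops->error_remove_page != NULL)
		return m->a_ops->error_remove_page(m, p);
	return -1;
}
```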
> (I'm sorry if I'm missing the point.)
>
> If my understanding is correct, the following scenario can happen:
>
> 1. An uncorrected error on a dirty page cache page is detected by
> memory scrubbing
> 2. Kernel unmaps and truncates the page to recover from the error
> 3. An application reads data from the file location corresponding
> to the truncated page
> ==> Old or garbage data will be read into a new page cache page
> 4. The application modifies the data and writes it back to the disk
> 5. The file is corrupted!
>
> (Yes, the application is wrong not to do the right thing, i.e. fsync,
> but it's not the user's fault!)
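The numbered scenario can be modelled in a few lines of user-space C. This is a toy simulation, not kernel code: struct file_model and its helpers are hypothetical, and one 8-byte "block" stands in for a page. It shows the newer dirty data disappearing, then the stale on-disk contents plus the application's edit being written back, with only the recorded error flag left as evidence.

```c
#include <string.h>

#define BLK 8

/* Hypothetical single-block "file": one on-disk copy and one cached copy. */
struct file_model {
	char disk[BLK];   /* contents last written to disk */
	char cache[BLK];  /* page cache copy */
	int cached;       /* page present in the page cache */
	int dirty;        /* cached copy is newer than disk */
	int as_eio;       /* error flag recorded on the mapping */
};

/* Steps 1-2: poison hits the dirty page; it is truncated and the error noted. */
static void poison_and_truncate(struct file_model *f)
{
	f->cached = 0;
	f->dirty = 0;     /* the newer data is gone for good */
	f->as_eio = 1;
}

/* Step 3: a read after truncation re-populates the cache from disk. */
static void read_block(struct file_model *f, char out[BLK])
{
	if (!f->cached) {
		memcpy(f->cache, f->disk, BLK);
		f->cached = 1;
	}
	memcpy(out, f->cache, BLK);
}

/* Step 4: modify one byte and write the whole block back to disk. */
static void modify_and_writeback(struct file_model *f, int off, char c)
{
	char buf[BLK];

	read_block(f, buf);
	buf[off] = c;
	memcpy(f->cache, buf, BLK);
	memcpy(f->disk, f->cache, BLK);   /* writeback of stale data + edit */
	f->dirty = 0;
}
```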
Right. Note that the data has already been corrupted, and the above
scenario could be called re-corruption. We set AS_EIO to trigger IO
error reporting so that the corruption does not happen *silently*.
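From the application's side, the error recorded via AS_EIO only helps if someone looks for it. A minimal sketch of a writer that does, using only standard POSIX write(2) and fsync(2); write_and_sync is a hypothetical helper name. In kernels of this vintage a pending AS_EIO bit typically surfaces as EIO from the next fsync() on the file.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a buffer and ask for it to reach stable storage.
 * Returns 0 on success, or -errno (e.g. -EIO) on failure. */
static int write_and_sync(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;   /* interrupted: retry */
			return -errno;
		}
		p += n;
		len -= (size_t)n;
	}
	/* A previously recorded writeback error is reported here as -EIO. */
	if (fsync(fd) < 0)
		return -errno;
	return 0;
}
```

The point of the design is that the return value of fsync() is propagated rather than ignored; an application that skips this step is exactly the one that re-corrupts the file.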
> A similar data corruption can be caused by a write I/O error,
> because the dirty flag is cleared even if the page couldn't be written
> to the disk.
Yes.
> However, we have a way to avoid this kind of data corruption, at
> least for ext3. If we mount an ext3 filesystem with data=ordered
> and data_err=abort, all I/O errors on file data blocks belonging to
> the committing transaction are checked. When an I/O error is found,
> ext3 aborts journaling and remounts the filesystem read-only to
> prevent further updates. This kind of feature is very important
> for mission-critical systems.
Agreed. We also set PG_error. Shouldn't that be enough to trigger such
a remount?
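The data=ordered,data_err=abort behaviour described above can be modelled roughly as follows. This is a toy user-space sketch with hypothetical names (struct fs_model, struct txn_model, commit_txn, fs_write), not jbd/ext3 code: a failed data buffer in the committing transaction makes the commit return -EIO, and only with data_err=abort does the model also abort the journal and refuse later writes with -EROFS. Whether PG_error on a hwpoison-truncated page would ever reach such a check is the open question here; the model does not settle it.

```c
#include <errno.h>

#define NBUF 4

/* Hypothetical model of one committing transaction's ordered data buffers. */
struct txn_model {
	int data_io_error[NBUF];  /* per-buffer write result: 0 ok, 1 failed */
};

struct fs_model {
	int data_err_abort;   /* mounted with data_err=abort */
	int journal_aborted;
	int read_only;
};

/* Commit: check every ordered data buffer before committing metadata. */
static int commit_txn(struct fs_model *fs, const struct txn_model *t)
{
	int err = 0;

	for (int i = 0; i < NBUF; i++)
		if (t->data_io_error[i])
			err = -EIO;
	if (err && fs->data_err_abort) {
		fs->journal_aborted = 1;
		fs->read_only = 1;    /* no further updates accepted */
	}
	return err;
}

/* Any later update attempt. */
static int fs_write(const struct fs_model *fs)
{
	return fs->read_only ? -EROFS : 0;
}
```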
> If we merge this patch, we would face the data corruption problem
> again.
>
> I think there are three options:
>
> (1) drop this patch
> (2) merge this patch with a new panic_on_dirty_page_cache_corruption
> sysctl
> (3) implement a more sophisticated error_remove_page function
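Option (2) amounts to a small decision rule. A hypothetical sketch, not a proposed implementation: the struct poison_policy fields and the enum values are invented for illustration, panic_on_dirty_page_cache_corruption is only the sysctl name suggested above, and ACT_LEAVE loosely stands for the page not being removed when a filesystem provides no error_remove_page (option (1)). Option (3) is not modelled.

```c
/* Hypothetical outcomes for a poisoned page-cache page. */
enum poison_action {
	ACT_TRUNCATE,  /* drop the page and record AS_EIO */
	ACT_PANIC,     /* stop the machine before stale data can be reused */
	ACT_LEAVE,     /* no error_remove_page hook: the page is not removed */
};

/* Hypothetical knobs mirroring options (1) and (2). */
struct poison_policy {
	int fs_has_error_remove_page;             /* 0 if the patch is dropped */
	int panic_on_dirty_page_cache_corruption; /* proposed sysctl, 0 or 1 */
};

static enum poison_action decide(const struct poison_policy *pol, int page_dirty)
{
	/* Only dirty pages lose data that exists nowhere else. */
	if (page_dirty && pol->panic_on_dirty_page_cache_corruption)
		return ACT_PANIC;
	if (!pol->fs_has_error_remove_page)
		return ACT_LEAVE;
	return ACT_TRUNCATE;
}
```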
In fact, we proposed a patch to prevent the re-corruption case; see
http://lkml.org/lkml/2009/6/11/294
However, it is hard to answer the (policy) question "How sticky should
the EIO bit remain?".
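The stickiness question becomes concrete when two behaviours are written side by side. In this single-threaded user-space model (all names hypothetical), the one-shot variant roughly mirrors how the filemap wait path of this era consumed AS_EIO with test_and_clear_bit(), so only the first fsync() caller learns of the error. The sticky variant keeps failing until an explicit clear; deciding when that clear should happen is the policy problem.

```c
#include <errno.h>

/* Hypothetical model of the per-mapping error bit. */
struct mapping_err_model {
	unsigned long flags;
};

#define AS_EIO_MODEL (1UL << 0)

static void record_eio(struct mapping_err_model *m)
{
	m->flags |= AS_EIO_MODEL;
}

/* One-shot: the first checker sees -EIO and clears the bit.
 * (Not atomic, unlike the kernel's test_and_clear_bit().) */
static int check_err_oneshot(struct mapping_err_model *m)
{
	if (m->flags & AS_EIO_MODEL) {
		m->flags &= ~AS_EIO_MODEL;
		return -EIO;
	}
	return 0;
}

/* Sticky: keep reporting until someone explicitly clears the bit. */
static int check_err_sticky(const struct mapping_err_model *m)
{
	return (m->flags & AS_EIO_MODEL) ? -EIO : 0;
}

/* The hypothetical "clear" point whose placement is the open question. */
static void clear_err(struct mapping_err_model *m)
{
	m->flags &= ~AS_EIO_MODEL;
}
```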
> > static const struct address_space_operations ext3_writeback_aops = {
> > @@ -1834,6 +1835,7 @@ static const struct address_space_operat
> > .direct_IO = ext3_direct_IO,
> > .migratepage = buffer_migrate_page,
> > .is_partially_uptodate = block_is_partially_uptodate,
> > + .error_remove_page = generic_error_remove_page,
> > };
>
> The writeback case would be OK. It's not much different from the I/O
> error case.
>
> > static const struct address_space_operations ext3_journalled_aops = {
> > @@ -1848,6 +1850,7 @@ static const struct address_space_operat
> > .invalidatepage = ext3_invalidatepage,
> > .releasepage = ext3_releasepage,
> > .is_partially_uptodate = block_is_partially_uptodate,
> > + .error_remove_page = generic_error_remove_page,
> > };
> >
> > void ext3_set_aops(struct inode *inode)
>
> I'm not sure about the journalled case. I'm going to take a look at
> it later.
Thanks,
Fengguang
Thread overview:
2009-08-05 9:36 [PATCH] [0/19] HWPOISON: Intro Andi Kleen
2009-08-05 9:36 ` [PATCH] [1/19] HWPOISON: Add page flag for poisoned pages Andi Kleen
2009-08-05 9:36 ` [PATCH] [2/19] HWPOISON: Export some rmap vma locking to outside world Andi Kleen
2009-08-05 9:36 ` [PATCH] [3/19] HWPOISON: Add support for poison swap entries v2 Andi Kleen
2009-08-05 9:36 ` [PATCH] [4/19] HWPOISON: Add new SIGBUS error codes for hardware poison signals Andi Kleen
2009-08-05 9:36 ` [PATCH] [5/19] HWPOISON: Add basic support for poisoned pages in fault handler v3 Andi Kleen
2009-08-05 9:36 ` [PATCH] [6/19] HWPOISON: Add various poison checks in mm/memory.c v2 Andi Kleen
2009-08-05 9:36 ` [PATCH] [7/19] HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 Andi Kleen
2009-08-05 9:36 ` [PATCH] [8/19] HWPOISON: Use bitmask/action code for try_to_unmap behaviour Andi Kleen
2009-08-05 9:36 ` [PATCH] [9/19] HWPOISON: Handle hardware poisoned pages in try_to_unmap Andi Kleen
2009-08-05 9:36 ` [PATCH] [10/19] HWPOISON: check and isolate corrupted free pages v2 Andi Kleen
2009-08-05 9:36 ` [PATCH] [11/19] HWPOISON: Refactor truncate to allow direct truncating of page v2 Andi Kleen
2009-08-05 10:20 ` Nick Piggin
2009-08-05 12:37 ` Wu Fengguang
2009-08-05 13:46 ` Andi Kleen
2009-08-05 14:01 ` Nick Piggin
2009-08-05 14:10 ` Andi Kleen
2009-08-05 14:16 ` Nick Piggin
2009-08-05 14:41 ` Andi Kleen
2009-08-05 14:44 ` Nick Piggin
2009-08-05 15:00 ` Matthew Wilcox
2009-08-06 11:48 ` Martin Schwidefsky
2009-08-06 12:04 ` Andi Kleen
2009-08-05 15:12 ` Wu Fengguang
2009-08-05 9:36 ` [PATCH] [12/19] HWPOISON: Add invalidate_inode_page Andi Kleen
2009-08-05 9:36 ` [PATCH] [13/19] HWPOISON: Define a new error_remove_page address space op for async truncation Andi Kleen
2009-08-05 9:36 ` [PATCH] [14/19] HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process Andi Kleen
2009-08-05 9:36 ` [PATCH] [15/19] HWPOISON: The high level memory error handler in the VM v7 Andi Kleen
2009-08-05 9:36 ` [PATCH] [16/19] HWPOISON: Enable .remove_error_page for migration aware file systems Andi Kleen
2009-08-05 11:12 ` Christoph Hellwig
2009-08-05 11:52 ` Wu Fengguang
2009-08-05 13:50 ` Andi Kleen
2009-08-10 6:36 ` Hidehiro Kawai
2009-08-10 7:07 ` Wu Fengguang [this message]
2009-08-11 3:48 ` Hidehiro Kawai
2009-08-11 6:59 ` Andi Kleen
2009-08-11 12:38 ` Wu Fengguang
2009-08-10 7:44 ` Andi Kleen
2009-08-11 3:50 ` Hidehiro Kawai
2009-08-11 7:17 ` Andi Kleen
2009-08-12 2:49 ` Hidehiro Kawai
2009-08-12 7:46 ` Andi Kleen
2009-08-12 9:52 ` Hidehiro Kawai
2009-08-12 10:16 ` Andi Kleen
2009-08-12 8:05 ` Nick Piggin
2009-08-12 8:23 ` Andi Kleen
2009-08-12 8:46 ` Nick Piggin
2009-08-12 8:57 ` Andi Kleen
2009-08-12 9:05 ` Nick Piggin
2009-08-12 9:39 ` Wu Fengguang
2009-08-05 9:36 ` [PATCH] [17/19] HWPOISON: Enable error_remove_page for NFS Andi Kleen
2009-08-05 9:36 ` [PATCH] [18/19] HWPOISON: Add madvise() based injector for hardware poisoned pages v3 Andi Kleen
2009-08-05 9:36 ` [PATCH] [19/19] HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs Andi Kleen