From: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: tytso@mit.edu, hch@infradead.org, mfasheh@suse.com,
	aia21@cantab.net, hugh.dickins@tiscali.co.uk,
	swhiteho@redhat.com, akpm@linux-foundation.org, npiggin@suse.de,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	fengguang.wu@intel.com,
	Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>,
	Taketoshi Sakuraba <taketoshi.sakuraba.hc@hitachi.com>
Subject: Re: [PATCH] [16/19] HWPOISON: Enable .remove_error_page for migration aware file systems
Date: Tue, 11 Aug 2009 12:50:59 +0900
Message-ID: <4A80EAA3.7040107@hitachi.com>
In-Reply-To: <20090810074421.GA6838@basil.fritz.box>

Andi Kleen wrote:

>>1. An uncorrected error on a dirty page cache page is detected by
>>   memory scrubbing
>>2. Kernel unmaps and truncates the page to recover from the error
>>3. An application reads data from the file location corresponding
>>   to the truncated page
>>   ==> Old or garbage data will be read into a new page cache page
> 
> The problem currently is that the error is not sticky enough and
> doesn't stay around long enough. It gets reported once,
> but not in later IO operations.
> 
> However it's a generic problem not unique to hwpoison. Me 

Yes, it's a generic problem, and introducing a sticky error flag
is one approach to solving it.  I think it is a good approach
because it doesn't depend on individual filesystems.
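To illustrate the difference, here is a toy Python model (not kernel code) that contrasts a one-shot error bit, which is cleared the first time it is reported, with a sticky error flag that keeps failing later I/O on the file. The class and function names are hypothetical. The model only loosely mirrors how an address space's IO error bit is tested and cleared.

```python
import errno


class Mapping:
    """Toy model of a file's page-cache mapping and its I/O error state."""

    def __init__(self, sticky):
        self.sticky = sticky   # True: the error survives being reported
        self.error = False

    def record_error(self):
        # A dirty page was lost (failed writeback or hwpoison truncation).
        self.error = True

    def check_error(self):
        """What a later fsync()/read()/write() on the file would return."""
        if not self.error:
            return 0
        if not self.sticky:
            # One-shot: the first caller consumes the error and clears it.
            self.error = False
        return -errno.EIO


def report_sequence(sticky, calls=3):
    """Record one error, then return the results of `calls` later checks."""
    m = Mapping(sticky)
    m.record_error()
    return [m.check_error() for _ in range(calls)]


print(report_sequence(sticky=False))  # only the first check fails
print(report_sequence(sticky=True))   # every check fails
```

With the one-shot bit, an application that misses the first error (for example, one that never calls fsync) sees nothing wrong afterwards. With the sticky flag, every later access keeps failing, whichever filesystem is underneath.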

> And application
> that doesn't handle current IO errors correctly will also
> not necessarily handle hwpoison correctly (it's not better and not worse)

This is my main concern.  I'd like to prevent re-corruption even if
applications are not well-behaved.

For ordinary I/O errors, ext3/4 can already do this with the
data=ordered and data_err=abort mount options.  Moreover, if you mount
an ext3/4 filesystem with the additional errors=panic option, the kernel
panics on a write error instead of remounting the filesystem read-only.
Customers who regard data integrity as very important require these
features.
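As a rough sketch of the behaviour described above, the toy Python model below shows what happens when a transaction's ordered file data fails to reach the disk, depending on the data=, data_err= and errors= mount options. It is a simplification, not ext3 code. `Ext3Mount`, `commit` and `KernelPanic` are hypothetical names, and any errors= value other than panic is treated as a read-only remount.

```python
from dataclasses import dataclass


class KernelPanic(Exception):
    """Stands in for a kernel panic in this toy model."""


@dataclass
class Ext3Mount:
    data: str             # data= mount option, e.g. "ordered"
    data_err: str         # data_err= mount option, "ignore" or "abort"
    errors: str           # errors= mount option, e.g. "remount-ro" or "panic"
    read_only: bool = False

    def commit(self, data_write_failed):
        """Commit a transaction after writing its ordered file data blocks."""
        if not data_write_failed:
            return "committed"
        if self.data != "ordered" or self.data_err != "abort":
            # The error is only logged; later updates are still accepted.
            return "logged"
        # data_err=abort: abort the journal and shut out further updates.
        if self.errors == "panic":
            raise KernelPanic("file data write error, journal aborted")
        self.read_only = True  # simplification: any other errors= value
        return "remounted read-only"


m = Ext3Mount(data="ordered", data_err="abort", errors="remount-ro")
print(m.commit(data_write_failed=True), m.read_only)
```

The key point of the model is that the protection is triggered by a failed data write. If no write is ever attempted for the lost data, none of these options get a chance to act.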

But this patch (PATCH 16/19) reintroduces the problem, because it
doesn't provide a way to shut out further writes to the filesystem.
Of course, we can do that by setting the tolerant level to 0 or
memory_failure_recovery to 0, but that would be overkill.
That is why I suggested this:
>>(2) merge this patch with new panic_on_dirty_page_cache_corruption
>>    sysctl
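A minimal sketch of how option (2) could change the decision for a poisoned page cache page, written as a toy Python policy function rather than kernel code. The `panic_on_dirty_page_cache_corruption` parameter stands for the sysctl proposed here and does not exist today. `handle_poisoned_page_cache_page` and `KernelPanic` are hypothetical names, and the model ignores everything except the dirty bit and the two knobs.

```python
class KernelPanic(Exception):
    """Stands in for a kernel panic in this toy model."""


def handle_poisoned_page_cache_page(dirty, memory_failure_recovery=1,
                                    panic_on_dirty_page_cache_corruption=0):
    """Toy policy for an uncorrected memory error on a page cache page."""
    if not memory_failure_recovery:
        # Existing all-or-nothing knob: panic on every memory failure.
        raise KernelPanic("memory failure")
    if not dirty:
        # Clean page: the on-disk copy is current, so dropping it loses nothing.
        return "dropped"
    if panic_on_dirty_page_cache_corruption:
        # Proposed option (2): stop before stale data can be read back
        # and written out again.
        raise KernelPanic("dirty page cache corruption")
    # PATCH 16/19 behaviour: truncate the dirty page; its unwritten data is lost.
    return "truncated"


print(handle_poisoned_page_cache_page(dirty=False,
                                      panic_on_dirty_page_cache_corruption=1))
```

The design point is that errors on clean pages, which are harmless, still recover without a panic, whereas setting memory_failure_recovery to 0 panics on those too.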


> That is something that could be improved in the VFS -- although I fear
> any improvements here could also break compatibility. I don't think
> it's a blocker on hwpoison for now. It needs more design
> effort and thinking (e.g. likely the address space IO error
> bit should be separated into multiple bits)
> 
> Perhaps you're interested in working on this?

Yes.  Transient I/O errors can potentially cause the re-corruption
problem.  ext3/4 now provide ways to prevent it, but other
filesystems do not.  We would need a generic mechanism.
 
>>4. The application modifies the data and writes it back to the disk
>>5. The file is corrupted!
>>
>>(Yes, the application is wrong not to do the right thing, i.e. fsync,
>> but it's not the user's fault!)
>>
>>Similar data corruption can be caused by a write I/O error,
>>because the dirty flag is cleared even if the page couldn't be written
>>to the disk.
>>
>>However, we have a way to avoid this kind of data corruption, at
>>least for ext3.  If we mount an ext3 filesystem with data=ordered
>>and data_err=abort, all I/O errors on file data blocks belonging to
>>the committing transaction are checked.  When an I/O error is found,
>>ext3 aborts the journal and remounts the filesystem read-only to
>>prevent further updates.  This kind of feature is very important
>>for mission-critical systems.
> 
> Well it sounds like a potentially useful enhancement to ext3 (or ext4).
> 
> One issue is that the default is not ordered anymore since
> Linus changed the default.

Yes, but what is important is whether the system provides
such a feature or not.

> I'm sure other enhancements for IO errors could be done too.
> Some of the file systems also handle them still quite poorly (e.g. btrfs)
> 
> But again I don't think it's a blocker for hwpoison.

Unfortunately, it can be a blocker.  As I stated, we can block the
possible re-corruption caused by transient I/O errors on ext3/4
filesystems.  But once this patch (PATCH 16/19) is applied,
re-corruption can happen even if we use the data=ordered,
data_err=abort and errors=panic mount options.
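The sequence from steps 1-5 above can be replayed in a toy Python model (not kernel code) of one file block on an ext3-like filesystem mounted with data=ordered,data_err=abort. The assumption made here is that the hwpoison path simply drops the dirty page without attempting a write, so the write-error check never fires. `ToyFile` and its methods are hypothetical names.

```python
from typing import Optional


class ToyFile:
    """One file block on an ext3-like fs mounted data=ordered,data_err=abort."""

    def __init__(self, on_disk: str) -> None:
        self.disk = on_disk
        self.cache: Optional[str] = None  # page cache copy, if any
        self.dirty = False
        self.journal_aborted = False

    def write(self, data: str) -> None:
        if self.journal_aborted:
            raise OSError("filesystem is read-only")
        self.cache, self.dirty = data, True

    def read(self) -> str:
        if self.cache is None:
            # No cached page: read the (possibly stale) block from disk.
            self.cache, self.dirty = self.disk, False
        return self.cache

    def writeback(self, io_error: bool = False) -> None:
        if not self.dirty:
            return
        self.dirty = False  # cleared even if the write fails
        if io_error:
            # data_err=abort: a failed file data write aborts the journal.
            self.journal_aborted = True
            return
        self.disk = self.cache

    def hwpoison_truncate(self) -> None:
        # The dirty page is dropped without any write being attempted,
        # so no write error is seen and the journal keeps running.
        self.cache, self.dirty = None, False


f = ToyFile("A")
f.write("B")            # the application updates the block (page is dirty)
f.hwpoison_truncate()   # 1-2. uncorrected error: the dirty page is truncated
old = f.read()          # 3. a later read brings back the stale "A"
f.write(old + "+edit")  # 4. the application modifies what it read...
f.writeback()           # ...and it is written back without any error
print(f.disk, f.journal_aborted)  # A+edit False
```

The update "B" is silently lost and the stale data is written back as if it were current (step 5), while the journal is never aborted. By contrast, calling `writeback(io_error=True)` on a dirty page sets `journal_aborted`, and further writes fail.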

So...

>>I think there are three options,
>>
>>(1) drop this patch
>>(2) merge this patch with new panic_on_dirty_page_cache_corruption
>>    sysctl
>>(3) implement a more sophisticated error_remove_page function
> 
> (4) accept that hwpoison error handling is not better and not worse than normal
> IO error handling.
> 
> We opted for (4).

Could you consider adopting (2) or (3)?  Fengguang's sticky EIO
approach (http://lkml.org/lkml/2009/6/11/294) is also OK.
I hope the HWPOISON patches will be merged into 2.6.32, so (2) is the
best answer for me because it is simple and less intrusive.

Thanks,
-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center


