From: Andi Kleen <andi@firstfloor.org>
To: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	tytso@mit.edu, hch@infradead.org, mfasheh@suse.com,
	aia21@cantab.net, hugh.dickins@tiscali.co.uk,
	swhiteho@redhat.com, akpm@linux-foundation.org, npiggin@suse.de,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	fengguang.wu@intel.com,
	Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>,
	Taketoshi Sakuraba <taketoshi.sakuraba.hc@hitachi.com>
Subject: Re: [PATCH] [16/19] HWPOISON: Enable .remove_error_page for migration aware file systems
Date: Wed, 12 Aug 2009 09:46:11 +0200
Message-ID: <20090812074611.GC28848@basil.fritz.box>
In-Reply-To: <4A822DD4.1050202@hitachi.com>

On Wed, Aug 12, 2009 at 11:49:56AM +0900, Hidehiro Kawai wrote:
> > I don't think there's much we can do if the application doesn't
> > check for IO errors properly. What would you do if it doesn't
> > check for IO errors at all? If it checks for IO errors it simply
> > has to check for them on all IO operations -- if they do 
> > they will detect hwpoison errors correctly too.
> 
> I believe it's not uncommon for applications to do buffered write
> and then exit without fsync().  And I think it's difficult to
> preclude such applications and commands from the system perfectly.

That's true, but for anything mission critical you would expect them
to use some transactional mechanism, either with O_SYNC or fsync().
Otherwise they always risk data loss anyways.
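
As an illustration of the point above, here is a minimal userspace
sketch of a write that is only treated as successful once fsync() has
returned. The helper name write_durably() is hypothetical and not part
of this patch set; the assumption is that a writeback failure on the
dirty pages is reported to the application as an error from fsync().

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write buf to path and return 0 only after fsync()
 * has succeeded; otherwise return a negative errno value. */
int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int err = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -errno;

	/* write() may be short or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			err = -errno;
			break;
		}
		p += n;
		len -= (size_t)n;
	}

	/* A buffered write() only dirties the page cache. Checking
	 * fsync() is what gives the application a chance to see a
	 * later writeback error. */
	if (err == 0 && fsync(fd) < 0)
		err = -errno;

	/* close() can fail too; keep the first error seen. */
	if (close(fd) < 0 && err == 0)
		err = -errno;

	return err;
}
```

An application that writes and exits without the fsync() step has no
point at which such an error could be delivered to it.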

> > It's unclear to me this special mode is really desirable.
> > Does it bring enough value to the user to justify the complexity
> > of another exotic option?  The case is relatively exotic,
> > as in dirty write cache that is mapped to a file.
> > 
> > Try to explain it in documentation and you see how ridiculous it sounds;
> > it simply doesn't have clean semantics
> > 
> > ("In case you have applications with broken error IO handling on
> > your mission critical system ...") 
> 
> Generally, dropping unwritten dirty page caches is considered to be
> risky.  So the "panic on IO error" policy has been a usual
> practice on some systems.  I just suggested that we adopt
> this policy for machine check errors.

Hmm, what we could possibly do -- as follow-on patches -- would be to
let error_remove_page check the per file system panic-on-io-error
super block setting for dirty pages and panic in this case too.
Unfortunately this setting is currently per file system, not generic,
so it would need to be a fs specific check (or the flag would need
to be moved into a generic fs superblock field first).

I think that would be relatively clean semantics-wise. Would you be
interested in working on patches for that?
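
The decision described above could look roughly like the following toy
userspace model. It is only a sketch: the types, the flag name and the
function toy_error_remove_page() are all hypothetical stand-ins, not
kernel code, and the flag models a generic superblock field that does
not exist yet.

```c
#include <stdbool.h>

/* Toy userspace model; none of these types or names exist in the kernel. */
enum poison_action {
	POISON_DROP,     /* clean page: discard it, the data is still on disk */
	POISON_TRUNCATE, /* dirty page: the data is lost, remove the page */
	POISON_PANIC     /* dirty page on a fs that panics on IO errors */
};

struct toy_super_block {
	bool panic_on_io_error; /* stands in for a generic superblock flag */
};

struct toy_page {
	bool dirty;
	const struct toy_super_block *sb;
};

/* Pick the action for a poisoned page cache page. */
enum poison_action toy_error_remove_page(const struct toy_page *page)
{
	if (!page->dirty)
		return POISON_DROP;
	if (page->sb->panic_on_io_error)
		return POISON_PANIC;
	return POISON_TRUNCATE;
}
```

The point of the model is that the panic is limited to dirty pages on
file systems that already asked for panic-on-io-error, so no new
policy knob is introduced.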

> Another option is to introduce an "ignore all" policy instead of
> panicking at the beginning of memory_failure().  Perhaps it finally
> causes an SRAR machine check, and then the kernel will panic or a process
> will be killed.  Anyway, this is a topic for the next stage.

The problem is that memory_failure() would then need to start distinguishing
between AR=1 and AR=0, which it doesn't today.

It could be done, but would need some more work. 

> > If you want to have improved IO error handling feel free to
> > submit it separately. I agree this area could use some work.
> > But it probably needs more design work first.
> 
> Well, this patch set itself looks good to me.
> I also looked into the other patches and couldn't find any
> problems (although I'm not a good judge when reviewing).
> 
> Reviewed-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

Thanks for your review and your comments.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
