linux-mm.kvack.org archive mirror
From: Nick Piggin <npiggin@suse.de>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] [0/16] HWPOISON: Intro
Date: Wed, 10 Jun 2009 14:47:27 +0200
Message-ID: <20090610124727.GB22161@wotan.suse.de>
In-Reply-To: <20090610123600.GD5657@localhost>

On Wed, Jun 10, 2009 at 08:36:00PM +0800, Wu Fengguang wrote:
> On Wed, Jun 10, 2009 at 07:15:41PM +0800, Nick Piggin wrote:
> > > We can make read() IO succeed even if the relevant pages are corrupted
> > > - they can be isolated transparent to user space readers :-)
> > 
> > But if the page was dirty and you throw out the dirty data,
> > then next read will give inconsistent data.
> 
> Yup. That's a big problem - the application won't get any error
> feedback here if it doesn't call fsync() to commit IO.

Right.


> > > > So even if we did change existing EIO semantics then the
> > > > memory corruption case of throwing away dirty data is still
> > > > going to be "different" (wrong, I would say).
> > > 
> > > Oh well.
> > 
> > Well I just think SIGKILL is the much safer behaviour to
> > start with (and matches behaviour with mmapped pagecache
> > and anon), and does not introduce these different semantics.
> 
> So what?  SIGKILL any future processes visiting the corrupted file?
> Or better to return EIO to them? Either way we'll be maintaining
> a consistent AS_EIO_HWPOISON bit.

If you don't throw the page out of the pagecache, it could
be left in there as a marker to SIGKILL anybody who tries to
access that page. OTOH this might present some other
difficulties regarding suppression of writeback etc. Not
quite sure.

Of course the safest mode, IMO, is to panic the kernel in
situations like this (eg. corruption in dirty pagecache). I
would almost like to see that made the default mode. That
avoids all questions of how exactly to handle these things.
Then if you can subsequently justify what kind of application
or case would work better with a particular behaviour (such
as throw away the data) then we can discuss and merge that.


> > > 1) under read IO hwpoison pages can be hidden to user space
> > 
> > I mean for cases where the recovery cannot be transparent
> > (ie. error in dirty page).
> 
> OK. That's a good point.
> 
> > > 2) under write IO hwpoison pages are normally committed by pdflush,
> > >    so cannot find the impacted application to kill at all.
> > 
> > Correct.
> > 
> > > 3) fsync() users can be caught though. But then the application
> > >    has the option to check its return code. If it doesn't do it,
> > >    it may well not care. So why kill it?
> > 
> > Well if it does not check, then we cannot find it to kill
> > it anyway. If it does care (and hence check with fsync),
> > then we could kill it.
> 
> If it really cares, it will check for EIO after fsync ;)
> But yes, if it moderately cares, it may ignore the return value.
> 
> So SIGKILL on fsync() seems to be a good option.
> 
> > > Think about a multimedia server. Shall we kill the daemon if some IO
> > > page in the movie get corrupted?
> > 
> > My multimedia server is using mmap for data...
> > 
> > > And a mission critical server? 
> > 
> > Mission critical server should be killed too because it
> > likely does not understand this semantic of throwing out
> > dirty data page. It should be detected and restarted and
> > should recover or fail over to another server.
> 
> Sorry for the confusion. I meant one server may want to survive,
> while another may want to be killed (and have its service restarted).

Yes I just don't think even a really good admin will know
what to choose. At which point we might as well remove the option
and just try to implement something sane...

But maybe you can write some good documentation for it, I will
stand corrected ;) 

> > > Obviously the admin will want the right to choose.
> > 
> > I don't know if they are equipped to really know. Do they
> > know that their application will correctly handle these
> > semantics of throwing out dirty data? It is potentially
> > much more dangerous to do this exactly because it can confuse
> > the case where it matters most (ie. ones that care about
> > data integrity).
> > 
> > It just seems like killing is far less controversial and
> > simpler. Start with that and it should do the right thing
> > for most people anyway. We could discuss possible ways
> > to recover in another patch if you want to do this
> > EIO thing.
> 
> OK, we can
>         - kill fsync() users
>         - and then return EIO for later read()/write()s
>         - forget about the EIO condition on last file close()
> Do you agree?

I really don't know ;) Anything I can think of could be wrong
for a given situation. panic seems like the best default
option to me.
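
For reference, the application-side check that the fsync()/EIO
discussion above relies on could look roughly like the sketch below.
It is only an illustration of "write, then fsync() and look at the
return code"; commit_data() is a hypothetical helper name, not
anything from the patch set, and it says nothing about what the
kernel should do when the page was poisoned.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the HWPOISON patches): write the
 * whole buffer to fd, then force it to stable storage with fsync().
 * Returns 0 on success, or a negative errno value (for example -EIO)
 * if either the write or the fsync fails.
 */
static int commit_data(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the write */
			return -errno;
		}
		buf += n;
		len -= (size_t)n;
	}

	/*
	 * This is the only point where a buffered writer learns that
	 * writeback failed. An application that skips this check gets
	 * no error feedback at all.
	 */
	if (fsync(fd) < 0)
		return -errno;

	return 0;
}
```

A caller that cares about integrity would treat any nonzero return
as "my data may not be on disk" and stop or fail over, rather than
carry on reading the file.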

I don't want to sound like I'm quibbling. I don't actually
care too much what options are implemented so long as each
is justified and documented, and so long as the default is a
sane one.

Thanks,
Nick


