IO error semantics - Nick Piggin

From: Nick Piggin <npiggin@suse.de>
To: Jan Kara <jack@suse.cz>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Andreas Dilger <adilger@sun.com>, Theodore Ts'o <tytso@mit.edu>,
	Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>,
	linux-fsdevel@vger.kernel.org
Subject: IO error semantics
Date: Mon, 18 Jan 2010 17:05:18 +1100
Message-ID: <20100118060518.GA9151@laptop>
In-Reply-To: <20100118051847.GA8678@laptop>

On Mon, Jan 18, 2010 at 04:18:47PM +1100, Nick Piggin wrote:
> We also need to remove some ClearPageUptodate calls I think (similar
> issues), so keep those in mind too. Unfortunately it looks like there
> are also a lot of filesystem specific tests of PageUptodate... but you
> could also move those under the new compatibility s_flag.
> 
> I don't know of a really good way to inject and test filesystem errors.
> Making requests fail causes most filesystems to quickly go read-only or
> hit bigger problems. If you're careful, e.g. you only fail read IOs for
> data, or only fail write IOs not involved in integrity or journal
> operations, then test programs just tend to abort pretty quickly. Does
> anyone know of anything more systematic?

This might be a good time to bring up IO error behaviour again. I think I
got into some debates about it on Andi's hwpoison thread a while back, but
that was probably not the appropriate thread to find a real solution.

The problem we have now is that IO error semantics are not well defined.
It is hard to even enumerate all the issues.

read IOs
  How to retry? Appropriate defaults should happen at the block layer, I
  think. Should retry behaviour be tunable by the mm/fs, or should it be
  coded explicitly as submission retry loops? Either way implies that
  there are either similar defaults for all types (or maybe classes) of
  drivers, or some way to query/set this.

  It would be nice to be able to set fs/driver behaviour from userspace
  too, in a generic (not driver- or fs-specific) way. But defaults should
  be reasonable and similar between all, I guess.
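
  To make the "explicit submission retry loop" option concrete, here is a
  minimal userspace sketch in C. It is only an analogue: read_fn,
  retry_policy, read_with_retry and the fault-injecting flaky_read are all
  hypothetical names invented for illustration, not block layer or
  filesystem interfaces. The point is that the submitter, not the driver,
  owns the retry count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical read callback: returns bytes read, or -1 with errno set. */
typedef ssize_t (*read_fn)(void *ctx, void *buf, size_t len, off_t off);

/* Retry policy chosen by the submitter rather than by the driver. */
struct retry_policy {
    int max_attempts;   /* total attempts, including the first one */
};

/* Issue a read, retrying only on EIO, up to policy->max_attempts times. */
static ssize_t read_with_retry(read_fn fn, void *ctx, void *buf, size_t len,
                               off_t off, const struct retry_policy *policy)
{
    assert(policy->max_attempts >= 1);
    for (int attempt = 1; ; attempt++) {
        ssize_t n = fn(ctx, buf, len, off);
        if (n >= 0)
            return n;
        if (errno != EIO || attempt >= policy->max_attempts)
            return -1;  /* errno still describes the last failure */
    }
}

/* Fault-injecting reader for tests: fails with EIO a set number of times. */
struct flaky_reader {
    int failures_left;  /* how many more calls should fail */
    int calls;          /* how many times the reader was invoked */
};

static ssize_t flaky_read(void *ctx, void *buf, size_t len, off_t off)
{
    struct flaky_reader *r = ctx;

    (void)off;
    r->calls++;
    if (r->failures_left > 0) {
        r->failures_left--;
        errno = EIO;
        return -1;
    }
    memset(buf, 'x', len);  /* pretend the device returned data */
    return (ssize_t)len;
}
```

  With a policy of three attempts, a reader that fails twice succeeds on
  the third call, while one that fails five times gives up after three
  calls with errno left at EIO.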

write IOs
  This is more interesting: how should write IO errors be handled? In my
  opinion we must not invalidate the data before an IO error is returned
  to somebody (whether via fsync or a synchronous write syscall). Any
  earlier and the app just gets read-after-write (RAW) consistency
  randomly violated. And I think it is important to treat IO errors as
  transparently as possible until the error can be detected.
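
  From the application side, the reporting points mentioned here are the
  return values of write(2) and fsync(2). A small sketch of a caller that
  checks both follows; write_and_sync is a hypothetical helper name, and
  the code says nothing about what the kernel does with the cached data
  after a failure, which is exactly the open question.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write the whole buffer, then fsync. Returns 0 on success or a negative
 * errno value from whichever call failed.
 */
static int write_and_sync(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(fd >= 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: try again */
            return -errno;
        }
        p += n;                 /* short write: advance and keep going */
        len -= (size_t)n;
    }
    if (fsync(fd) < 0)
        return -errno;          /* a writeback error may only surface here */
    return 0;
}
```

  For buffered writes the write(2) calls can all succeed and the failure
  shows up only at fsync, so a caller that skips the fsync check never
  learns about it.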

  I happen to think that actually we should go further and not
  invalidate the data at all. This makes implementation simpler, and
  also allows us to retry writes like we can retry reads. It's also
  problematic to throw out errors at that point because *sync syscalls
  coming from elsewhere could result in loss of error reporting (think,
  sys_sync).

  If we go this way, we probably need another syscall and fs helper call
  to invalidate the dirty data when we give up on retries. truncate_range
  is probably not appropriate because it is much harder to implement, and
  we may want to try to get at the most recent data that is on disk.
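
  The scheme in the last few paragraphs (keep the data dirty on a write
  error, allow retries, and invalidate only on an explicit request) can be
  modelled with a toy one-page cache. Everything here is a hypothetical
  userspace model written for illustration: toy_page, toy_write, toy_fsync
  and toy_invalidate_dirty are not kernel interfaces, and a real pagecache
  has far more state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_BYTES 16

/* Toy stand-in for one cached page and the disk block behind it. */
struct toy_page {
    char cache[PAGE_BYTES];  /* what a read would return */
    char disk[PAGE_BYTES];   /* last data that reached the disk */
    bool dirty;              /* cache is newer than disk */
    bool disk_fails;         /* injected fault: writeback fails with EIO */
};

/* Buffered write: update the cache and mark it dirty. */
static void toy_write(struct toy_page *p, const char *data)
{
    strncpy(p->cache, data, PAGE_BYTES - 1);
    p->cache[PAGE_BYTES - 1] = '\0';
    p->dirty = true;
}

/* Writeback as proposed: on failure the page stays dirty and intact. */
static int toy_fsync(struct toy_page *p)
{
    if (!p->dirty)
        return 0;
    if (p->disk_fails)
        return -EIO;         /* data kept, so a later retry is possible */
    memcpy(p->disk, p->cache, PAGE_BYTES);
    p->dirty = false;
    return 0;
}

/* Explicit "give up" step: drop dirty data, fall back to what is on disk. */
static void toy_invalidate_dirty(struct toy_page *p)
{
    if (p->dirty) {
        memcpy(p->cache, p->disk, PAGE_BYTES);
        p->dirty = false;
    }
}
```

  In this model a failed toy_fsync leaves reads returning the new data, a
  second toy_fsync after the fault clears writes it out, and only
  toy_invalidate_dirty makes reads revert to the on-disk contents.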

  Also do we need to think about O_SYNC or -o sync type of writes that
  are implemented via writeback cache? We could invalidate the dirtied
  cache ASAP, which would leave a window where a concurrent read can see
  first new, then old data. It would also kind of break the above scheme
  in case the pagecache was already dirty via a descriptor without
  O_SYNC. It might just make sense to leave the pagecache dirty. Either
  way it should be documented I think.

Do we even care enough to bother thinking about this now? (serious question)
