From: Nick Piggin <npiggin@suse.de>
To: Jan Kara <jack@suse.cz>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Andreas Dilger <adilger@sun.com>, Theodore Ts'o <tytso@mit.edu>,
Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>,
linux-fsdevel@vger.kernel.org
Subject: IO error semantics
Date: Mon, 18 Jan 2010 17:05:18 +1100
Message-ID: <20100118060518.GA9151@laptop>
In-Reply-To: <20100118051847.GA8678@laptop>
On Mon, Jan 18, 2010 at 04:18:47PM +1100, Nick Piggin wrote:
> We also need to remove some ClearPageUptodate calls I think (similar
> issues), so keep those in mind too. Unfortunately it looks like there
> are also a lot of filesystem specific tests of PageUptodate... but you
> could also move those under the new compatibility s_flag.
>
> I don't know of a really good way to inject and test filesystem errors.
> make_request failures cause most filesystems to quickly go read-only or
> have bigger problems. If you're careful, e.g. only failing read IOs for
> data, or only failing write IOs not involved in integrity or journal
> operations, then test programs just tend to abort pretty quickly. Does
> anyone know of anything more systematic?

This might be a good time to bring up IO error behaviour again. I think I
got into some debates about it on Andi's hwpoison thread a while back, but
that was probably not the appropriate thread to find a real solution.
The problem we have now is that IO error semantics are not well defined.
It is hard to even enumerate all the issues.
read IOs
How should we retry? I think appropriate defaults belong in the block
layer. Should retry behaviour be tunable by the mm/fs, or should it be
coded explicitly as submission retry loops? Either way, this implies
there are either similar defaults for all types (or maybe classes) of
drivers, or some way to query/set this.
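For illustration only, here is roughly what the "explicit submission
retry loop" option could look like, written as a userspace analogue in C.
read_with_retry() and its max_retries parameter are hypothetical names,
not an existing kernel or libc interface; an in-kernel version would
resubmit bios rather than call pread(2). The point is only that the retry
policy then lives in the caller instead of being a block-layer default.

```c
#define _XOPEN_SOURCE 700       /* POSIX.1-2008 + XSI declarations */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `count` bytes at `offset`, resubmitting the
 * request when the device reports EIO, up to `max_retries` times in
 * total. Returns the number of bytes read (short only at EOF), or -1
 * with errno set once the retry budget is spent or another error occurs.
 */
ssize_t read_with_retry(int fd, void *buf, size_t count, off_t offset,
                        int max_retries)
{
    size_t done = 0;
    int retries = 0;

    assert(max_retries >= 0);
    while (done < count) {
        ssize_t n = pread(fd, (char *)buf + done, count - done,
                          offset + (off_t)done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n == 0)                     /* EOF: return what we have */
            break;
        if (errno == EINTR)             /* interrupted, not an IO error */
            continue;
        if (errno == EIO && retries < max_retries) {
            retries++;                  /* the retry policy lives here */
            continue;
        }
        return -1;                      /* give up, errno preserved */
    }
    return (ssize_t)done;
}
```

Passing max_retries = 0 turns it into a plain pread(2) loop that fails on
the first EIO.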
It would be nice to be able to set fs/driver behaviour from userspace
too, in a generic (not driver- or fs-specific) way. But the defaults
should be reasonable and similar across all of them, I guess.
write IOs
This is more interesting: how should write IO errors be handled? In my
opinion we must not invalidate the data before an IO error is returned
to somebody (whether via fsync or a synchronous write syscall). Any
earlier, and the app just gets read-after-write (RAW) consistency
randomly violated. And I think it is important to treat IO errors as
transparently as possible until the error can be detected.
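A minimal userspace sketch of where such an error surfaces for an
application, assuming an ordinary buffered file descriptor: write(2)
usually completes into the pagecache, so a failed writeback is typically
only reported by a later fsync(2). write_durably() is a hypothetical
helper, not an existing API.

```c
#define _XOPEN_SOURCE 700       /* POSIX.1-2008 + XSI declarations */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer, then fsync(2). With a
 * buffered descriptor, write(2) normally completes into the pagecache,
 * so a failed writeback is usually only reported by the later fsync(2).
 * Returns 0 on success, or -1 with errno set.
 */
int write_durably(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    assert(buf != NULL || count == 0);
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;          /* error reported synchronously */
        }
        p += n;
        count -= (size_t)n;
    }
    if (fsync(fd) < 0)          /* deferred writeback error, if any */
        return -1;
    return 0;
}
```

Under the semantics argued for here, the dirty data would still be in
the pagecache when this fsync(2) returns the error.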
I happen to think that we should actually go further and not invalidate
the data at all. This makes the implementation simpler, and also allows
us to retry writes like we can retry reads. It's also problematic to
throw out errors at that point, because *sync syscalls coming from
elsewhere (think sys_sync) could result in the error report being lost.
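A toy model of the sys_sync problem, loosely based on the per-mapping
AS_EIO flag that is test-and-cleared when writeback is waited on. The
names are hypothetical and this is not kernel code; it only shows that a
single consumable error flag reports the error to whichever sync call
sees it first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: a single error flag per mapping, set when
 * writeback fails and test-and-cleared by whichever sync path looks at
 * it first (loosely modelled on AS_EIO).
 */
struct toy_mapping {
    bool error;
};

/* Writeback of a dirty page failed: record it on the mapping. */
void toy_writeback_failed(struct toy_mapping *m)
{
    assert(m != NULL);
    m->error = true;
}

/* Any sync path (fsync, sys_sync, ...) reports and clears the flag. */
int toy_sync(struct toy_mapping *m)
{
    int ret;

    assert(m != NULL);
    ret = m->error ? -EIO : 0;
    m->error = false;           /* the error is consumed here */
    return ret;
}
```

If writeback fails and an unrelated sync(2) happens to run first, that
call consumes the -EIO (and sync(2) has no way to return it to anyone),
so the application's own fsync(2) then returns 0.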
If we go this way, we probably need another syscall and fs helper call
to invalidate the dirty data when we give up on retries. truncate_range
is probably not appropriate, because it is much harder to implement and
maybe we want to try to get at the most recent data that is on disk.
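A toy model of the "keep the data dirty, retry, and invalidate only on
explicit request" scheme. All names are hypothetical;
toy_discard_dirty() stands in for the new syscall/fs helper suggested
above and, as one possible choice, falls back to whatever data last
reached the disk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 8

/* Toy page: the cached copy plus the copy that actually reached disk. */
struct toy_page {
    char cache[TOY_PAGE_SIZE];
    char disk[TOY_PAGE_SIZE];
    bool dirty;
    int failed_writes;          /* failures since the last clean state */
};

void toy_page_init(struct toy_page *p, const char *on_disk)
{
    assert(strlen(on_disk) < TOY_PAGE_SIZE);
    memset(p, 0, sizeof(*p));
    strcpy(p->disk, on_disk);
    strcpy(p->cache, on_disk);
}

/* Buffered write: update the cache and mark the page dirty. */
void toy_write(struct toy_page *p, const char *data)
{
    assert(strlen(data) < TOY_PAGE_SIZE);
    memset(p->cache, 0, TOY_PAGE_SIZE);
    strcpy(p->cache, data);
    p->dirty = true;
}

/* One writeback attempt; `io_ok` stands in for the device's answer. */
int toy_writeback(struct toy_page *p, bool io_ok)
{
    if (!p->dirty)
        return 0;
    if (!io_ok) {
        p->failed_writes++;     /* data stays dirty, so it can be retried */
        return -EIO;
    }
    memcpy(p->disk, p->cache, TOY_PAGE_SIZE);
    p->dirty = false;
    p->failed_writes = 0;
    return 0;
}

/*
 * Hypothetical "give up" operation standing in for the proposed
 * syscall/fs helper: drop the dirty copy and fall back to the most
 * recent data that is actually on disk.
 */
void toy_discard_dirty(struct toy_page *p)
{
    memcpy(p->cache, p->disk, TOY_PAGE_SIZE);
    p->dirty = false;
    p->failed_writes = 0;
}
```

Until toy_discard_dirty() is called, reads keep returning the newest
data even after a failed writeback, so RAW consistency holds and a later
retry can still succeed.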
Also, do we need to think about O_SYNC or -o sync writes that are
implemented via the writeback cache? We could invalidate the dirtied
cache ASAP, which would leave a window where a concurrent read can see
first the new, then the old data. It would also somewhat break the above
scheme if the pagecache was already dirtied via a descriptor without
O_SYNC. It might just make sense to leave the pagecache dirty. Either
way, it should be documented, I think.
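A toy model of the O_SYNC trade-off, assuming the O_SYNC write is
implemented by dirtying the pagecache and then writing it back
immediately. The names are hypothetical; invalidate_on_error selects
between the two policies, and each page holds a single value, so a later
write simply replaces an earlier one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define OSYNC_PAGE_SIZE 8

/* Toy page: what read(2) would return vs. what is on stable storage. */
struct osync_page {
    char cache[OSYNC_PAGE_SIZE];
    char disk[OSYNC_PAGE_SIZE];
    bool dirty;
};

void osync_page_init(struct osync_page *p, const char *on_disk)
{
    assert(strlen(on_disk) < OSYNC_PAGE_SIZE);
    memset(p, 0, sizeof(*p));
    strcpy(p->disk, on_disk);
    strcpy(p->cache, on_disk);
}

/* Write through a descriptor opened without O_SYNC: just dirty the page. */
void osync_buffered_write(struct osync_page *p, const char *data)
{
    assert(strlen(data) < OSYNC_PAGE_SIZE);
    memset(p->cache, 0, OSYNC_PAGE_SIZE);
    strcpy(p->cache, data);
    p->dirty = true;
}

/*
 * O_SYNC write via the writeback cache: dirty the page, then write it
 * back at once. `io_ok` stands in for the device's answer and
 * `invalidate_on_error` picks between the two policies discussed.
 */
int osync_write(struct osync_page *p, const char *data, bool io_ok,
                bool invalidate_on_error)
{
    osync_buffered_write(p, data);  /* concurrent readers may see `data` */
    if (io_ok) {
        memcpy(p->disk, p->cache, OSYNC_PAGE_SIZE);
        p->dirty = false;
        return 0;
    }
    if (invalidate_on_error) {
        /* Readers now see the old on-disk data, and any earlier dirty
         * data from a non-O_SYNC descriptor is discarded too. */
        memcpy(p->cache, p->disk, OSYNC_PAGE_SIZE);
        p->dirty = false;
    }
    return -EIO;                    /* otherwise the page stays dirty */
}
```

With invalidation, a reader racing with a failed O_SYNC write of "B" can
see "B" and then "old", and an earlier non-O_SYNC write of "A" is thrown
away along with it; leaving the page dirty avoids both, at the cost of
keeping unwritten data around.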
Do we even care enough to bother thinking about this now? (serious question)