From: "Theodore Ts'o" <tytso@mit.edu>
To: Junfeng Yang <yjf@stanford.edu>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
ext2-devl@stanford.edu
Subject: Re: [CHECKER] crash after fsync causing serious FS corruptions (ext2, 2.6.11)
Date: Tue, 8 Mar 2005 07:31:09 -0500
Message-ID: <20050308123109.GA7005@thunk.org>
In-Reply-To: <Pine.GSO.4.44.0503070124460.29202-100000@elaine24.Stanford.EDU>

On Mon, Mar 07, 2005 at 01:57:10AM -0800, Junfeng Yang wrote:
> FiSC (our FS checker) issues a warning on ext2, complaining that crash
> after fsync causes file system to corrupt. FS corrupts in two different
> ways: 1. file contains illegal blocks (such as block # -2) 2. one block
> owned by two different files.
>
> I diagnosed the warning a little bit and it appears that this warning can
> be triggered by the following steps:
>
> 1. a file is truncated, so several blocks are freed
> 2. a new file is created, and the blocks freed in step 1 are reused
> 3. fsync on the new file
> 4. crash and run fsck to recover.
>
> fsync should guarantee that a specific file is persistent on disk.
> Presumably, operations on other files should not mess up the file we
> just fsynced (true?). However, I also understand that ext2 by default
> relies on e2fsck to provide file system consistency. Do you guys consider
> the above warning as a bug or not? Any clarification on this will be very
> helpful.

Whether or not it is a bug is debatable. (Talking to certain *BSD
folks on this subject will cause them to jump up and down and froth at
the mouth.) It is *expected* behaviour, yes, and it is mitigated by
two factors. (1) Metadata for ext2 is synced out every 5 seconds,
while data is synced out every 60, so the max window for the race
described above is 5 seconds, and in practice rarely shows up if you
are not using fsync. (2) Unlike BSD's fsck, when a block is owned by
two different files, we offer an option to clone the affected files so
data isn't lost, while BSD's fsck shoots both files and asks questions
later.
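
To make the failure mode concrete, here is a toy model of the steps
quoted above. This is only a sketch, not ext2 code: the ToyFS class,
the block numbers 7 and 8, and the assumption that fsync writes out
only the named file's inode are all simplifications made up for
illustration.

```python
import copy


class ToyFS:
    """Toy model: 'mem' is in-core state, 'disk' is what survives a crash."""

    def __init__(self):
        self.mem = {}       # file name -> block numbers (in-core inodes)
        self.disk = {}      # the same mapping as last written to disk
        self.free = [7, 8]  # hypothetical free block numbers

    def create(self, name, nblocks):
        # Allocate from the front of the free list, in memory only.
        self.mem[name] = [self.free.pop(0) for _ in range(nblocks)]

    def truncate(self, name):
        # Freed blocks become reusable immediately, in memory only.
        self.free = self.mem[name] + self.free
        self.mem[name] = []

    def fsync(self, name):
        # Assumption: only this file's inode reaches the disk; other
        # dirty metadata (such as a truncated inode) stays in memory.
        self.disk[name] = list(self.mem[name])

    def sync(self):
        # Stands in for the periodic flush of all dirty metadata.
        self.disk = copy.deepcopy(self.mem)

    def crash(self):
        # Everything that was only in memory is lost.
        self.mem = copy.deepcopy(self.disk)


def doubly_owned(state):
    """Return {block: [owners]} for blocks claimed by more than one file."""
    owners = {}
    for name, blocks in state.items():
        for block in blocks:
            owners.setdefault(block, []).append(name)
    return {b: sorted(n) for b, n in owners.items() if len(n) > 1}


def run_scenario(sync_before_crash=False):
    fs = ToyFS()
    fs.create("old", 2)   # "old" gets blocks 7 and 8
    fs.sync()             # "old" is fully on disk
    fs.truncate("old")    # step 1: blocks freed, in memory only
    fs.create("new", 2)   # step 2: the freed blocks are reused
    fs.fsync("new")       # step 3: only "new" is written out
    if sync_before_crash:
        fs.sync()         # the periodic flush wins the race
    fs.crash()            # step 4
    return doubly_owned(fs.disk)
```

Under these assumptions run_scenario() reports blocks 7 and 8 as owned
by both files, because the on-disk copy of the old inode was never
updated. run_scenario(sync_before_crash=True), which stands in for the
periodic metadata flush happening before the crash, reports no shared
blocks.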

I believe the warning should go away if you mount -o sync (but then
the filesystem will perform very slowly :-).

As I had alluded to earlier, this has historically been a bone of
contention with the *BSD folks, since ext2 was much, much faster than
the BSD FFS. The main reason is that the FFS had all sorts of logic to
make sure metadata blocks would be synced in a certain order to make
life easier for their fsck. Given that most people only pay attention
to benchmarks, this upset them. Ext2's approach relied on a simpler
and more performant kernel implementation, and a more intelligent
fsck.

Also, as I had pointed out to the BSD folks, BSD 4.3/4.4's FFS also
had the property that they did just enough write ordering to guarantee
that fsck -p would run smoothly, but not enough to make any guarantees
about recently written files, and only a filesystem weenie would care
that fsck was clean when user data files were silently corrupted after
a crash. (This was all pre-soft updates, by the way.)

The tradeoff as far as ext2 was concerned was that if you got unlucky
and managed to trigger this warning, it required a manual fsck
instead of an automatic fsck. In actual practice, in the real world,
this happened extremely rarely.

Should we fix it today? Given that we have ext3, I'd probably answer
no. It's a known property of ext2; we've lived with it for over ten
years, and fixing it would just slow down ext2 (which often gets used
as a benchmark standard to aspire to) and make the ext2 codebase
more complicated.

- Ted