From: Pavel Machek <pavel@suse.cz>
To: Theodore Tso <tytso@mit.edu>, Chris Friesen <cfriesen@nortel.com>,
mikulas@artax.karlin.mff.cuni.cz, clock@atrey.karlin.mff.cuni.cz,
kernel list <linux-kernel@vger.kernel.org>,
aviro@redhat.com
Subject: Re: writing file to disk: not as easy as it looks
Date: Mon, 15 Dec 2008 12:03:10 +0100
Message-ID: <20081215110310.GA10274@elf.ucw.cz>
In-Reply-To: <20081202205558.GD20858@mit.edu>

On Tue 2008-12-02 15:55:58, Theodore Tso wrote:
> On Tue, Dec 02, 2008 at 11:22:58AM -0600, Chris Friesen wrote:
> > Theodore Tso wrote:
> >
> >> Even for ext3/ext4 which is doing physical journalling, it's still the
> >> case that the journal commits first, and it's only later when the
> >> write happens that we write out the change. If the disk fails some of
> >> the writes, it's possible to lose data, especially if the two blocks
> >> involved in the node split are far apart, and the write to the
> >> existing old btree block fails.
> >
> > Yikes. I was under the impression that once the journal hit the platter
> > then the data were safe (barring media corruption).
>
> Well, this is a case of media corruption (or a cosmic ray hitting
> a ribbon cable in the disk controller sending the write to the
> wrong location on disk, or someone bumping the server causing the disk
> head to lift up a little higher than normal while it was writing the
> disk sector, etc.). But it is a case of the hard drive misbehaving.
>
> Heck, if you have a hiccup while writing an inode table block out to
> disk (for example a power failure at just the wrong time), so the
...
> Ext3 tends to recover from this better than other filesystems, thanks
> to the fact that it does physical block journalling, but you do pay
> for this in terms of performance if you have a metadata-intensive
> workload, because you're writing more bytes to the journal for each
> metadata operation.
>
> > It seems like the more I learn about filesystems, the more failure modes
> > there are and the fewer guarantees can be made. It's amazing that
> > things work as well as they do...
>
> There are certainly things you can do. Put your fileservers on
> UPSes. Use RAID. Make backups. Do all three. :-)

Okay, so we pretty much know that ext3 journalling helps in the "user
hit the reset button" case. (And we are pretty sure ext2/ext3 works in
the "clean unmount" case.) Otherwise:

*) kernel bug -> journalling does not help.

*) sudden powerfail -> journalling helps on SGI high-end hardware. It
may or may not help on PC-class hardware.

We already do periodic checks, even on ext3. Maybe we should do fsck
more often if we see evidence of unclean shutdowns (because we know
PC hardware is crap...). I actually have a patch somewhere; should I
resurrect it?
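
To make the idea concrete, here is a minimal sketch of such a policy.
It is not the patch mentioned above, and it is not how e2fsprogs
decides anything today: the names FsState and should_force_fsck, the
30-mount limit, and the weight of 10 per unclean shutdown are all
hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FsState:
    """Hypothetical per-filesystem counters, reset after every full fsck."""
    mounts_since_check: int   # mounts since the last full fsck
    unclean_shutdowns: int    # unclean shutdowns seen since the last full fsck


def should_force_fsck(state: FsState, max_mounts: int = 30,
                      unclean_weight: int = 10) -> bool:
    """Return True when a full fsck should be forced at the next mount.

    Each unclean shutdown is counted as `unclean_weight` ordinary mounts,
    so a filesystem that keeps losing power reaches the limit sooner.
    """
    effective = state.mounts_since_check + unclean_weight * state.unclean_shutdowns
    return effective >= max_mounts


if __name__ == "__main__":
    # A filesystem that was always shut down cleanly waits for the mount limit.
    print(should_force_fsck(FsState(mounts_since_check=5, unclean_shutdowns=0)))
    # The same mount count plus three unclean shutdowns triggers a check.
    print(should_force_fsck(FsState(mounts_since_check=5, unclean_shutdowns=3)))
```

Weighting unclean shutdowns, rather than checking after every one of
them, keeps the common "user hit the reset button" case cheap while
still checking flaky machines more often.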
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html