From: Theodore Tso <tytso@mit.edu>
To: Chris Friesen <cfriesen@nortel.com>
Cc: Pavel Machek <pavel@suse.cz>,
mikulas@artax.karlin.mff.cuni.cz, clock@atrey.karlin.mff.cuni.cz,
kernel list <linux-kernel@vger.kernel.org>,
aviro@redhat.com
Subject: Re: writing file to disk: not as easy as it looks
Date: Tue, 2 Dec 2008 15:55:58 -0500
Message-ID: <20081202205558.GD20858@mit.edu>
In-Reply-To: <49356EF2.7060806@nortel.com>
On Tue, Dec 02, 2008 at 11:22:58AM -0600, Chris Friesen wrote:
> Theodore Tso wrote:
>
>> Even for ext3/ext4 which is doing physical journalling, it's still the
>> case that the journal commits first, and it's only later when the
>> write happens that we write out the change. If the disk fails some of
>> the writes, it's possible to lose data, especially if the two blocks
>> involved in the node split are far apart, and the write to the
>> existing old btree block fails.
>
> Yikes. I was under the impression that once the journal hit the platter
> then the data were safe (barring media corruption).
Well, this is a case of media corruption (or a cosmic ray hitting a
ribbon cable in the disk controller and sending the write to the
wrong location on disk, or someone bumping the server and causing the
disk head to lift up a little higher than normal while it was writing
the disk sector, etc.). But it is a case of the hard drive misbehaving.
Heck, if you have a hiccup while writing an inode table block out to
disk (for example, a power failure at just the wrong time), the
memory (which is more voltage-sensitive than hard drives) can end up
DMA'ing garbage to the drive, and you could lose a large number of
adjacent inodes when that garbage gets splatted over the inode table.
Ext3 tends to recover from this better than other filesystems, thanks
to the fact that it does physical block journalling, but you do pay
for this in terms of performance if you have a metadata-intensive
workload, because you're writing more bytes to the journal for each
metadata operation.
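To put rough numbers on that trade-off, here is a back-of-the-envelope
sketch. It is illustrative only and not taken from the ext3 or JBD code:
the 4096-byte block, the 128-byte on-disk inode, and the 32-byte record
header for a hypothetical logical journal are all assumptions, and
journal descriptor and commit blocks are ignored.

```python
# Back-of-the-envelope comparison; every size below is an assumption.
BLOCK_SIZE = 4096      # filesystem block size in bytes (assumed)
INODE_SIZE = 128       # on-disk inode size in bytes (assumed)
LOGICAL_HEADER = 32    # per-record header of a hypothetical logical journal


def physical_journal_bytes(blocks_touched: int) -> int:
    """Physical block journalling logs each modified block in full."""
    return blocks_touched * BLOCK_SIZE


def logical_journal_bytes(inodes_changed: int) -> int:
    """A logical journal would log only the changed records plus a header."""
    return inodes_changed * (INODE_SIZE + LOGICAL_HEADER)


if __name__ == "__main__":
    # One inode update dirties one inode table block.
    phys = physical_journal_bytes(1)
    logi = logical_journal_bytes(1)
    print(f"physical: {phys} bytes, logical: {logi} bytes, "
          f"ratio: {phys / logi:.1f}x")
```

Under these assumptions a single inode update costs a full block in
the journal. The flip side is that the journal then holds a complete
copy of the block, which is presumably what lets replay rewrite the
whole thing.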
> It seems like the more I learn about filesystems, the more failure modes
> there are and the fewer guarantees can be made. It's amazing that
> things work as well as they do...
There are certainly things you can do. Put your fileservers on
UPSes. Use RAID. Make backups. Do all three. :-)
- Ted