From: Pavel Machek <pavel@suse.cz>
To: Theodore Tso <tytso@mit.edu>, Chris Friesen <cfriesen@nortel.com>,
mikulas@artax.karlin.mff.cuni.cz, clock@atrey.karlin.mff.cuni.cz,
kernel list <linux-kernel@vger.kernel.org>,
aviro@redhat.com
Cc: Andrew Morton <akpm@osdl.org>
Subject: [patch] Re: writing file to disk: not as easy as it looks
Date: Mon, 15 Dec 2008 11:24:50 +0100
Message-ID: <20081215102450.GA9064@elf.ucw.cz>
In-Reply-To: <20081203050709.GL20858@mit.edu>
Hi!
> > > Heck, if you have a hiccup while writing an inode table block out to
> > > disk (for example a power failure at just the wrong time), so the
> > > memory (which is more voltage sensitive than hard drives) DMA's
> > > garbage which gets written to the inode table, you could lose a large
> > > number of adjacent inodes when garbage gets splatted over the inode
> > > table.
> >
> > Ok, "memory failed before disk" is ... bad hardware.
>
> It's PC class hardware. Live with it. Back when SGI made their own
> hardware, they noticed this problem, and so they wired up their SGI
> machines with powerfail interrupts, and extra big capacitors in
> their
Seems like bad hardware is very common indeed. Anyway, I guess it
would be fair to document what ext3 expects from the disk subsystem for
safe operation. Does the summary below sound correct/fair?
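For context, this is roughly what "successfully synced" looks like from the application side. It is only a minimal sketch: durable_write() is a hypothetical helper name, not anything from the kernel or from this patch, and it assumes a POSIX system. The point is that once fsync() returns 0 the application has no further way to learn about a later write failure, so the rest is up to the filesystem and the hardware.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write len bytes from buf to
 * path, then fsync. Returns 0 if every step reported success, -1 otherwise.
 */
int durable_write(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			close(fd);
			return -1;
		}
		p += n;			/* short write: advance and loop */
		len -= (size_t)n;
	}

	/* fsync() returning 0 is the "successfully synced" point. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}

	return close(fd) == 0 ? 0 : -1;
}
```

A caller would treat a 0 return as "the data should now survive a crash", which only holds if the storage stack meets the requirements being discussed here.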
Signed-off-by: Pavel Machek <pavel@suse.cz>
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 9dd2a3b..3855fbd 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -188,6 +188,34 @@ mke2fs: create a ext3 partition with th
debugfs: ext2 and ext3 file system debugger.
ext2online: online (mounted) ext2 and ext3 filesystem resizer
+Requirements
+============
+
+Ext3 expects the disk/storage subsystem to behave sanely. On a sanely
+behaving disk subsystem, data that has been successfully synced will
+stay on the disk. Sane means:
+
+* writes to media never fail. Even if the disk reports an error during
+ a write, ext3 can't handle that correctly, because success on fsync was
+ already returned when the data hit the journal.
+
+ (Fortunately, failed writes are very uncommon on disks, as they
+ have spare sectors that they use when a write fails.)
+
+* either the whole sector is correctly written or nothing is written
+ during a power failure.
+
+ (Unfortunately, none of the cheap USB/SD flash cards I have seen behave
+ like this, so they are unsuitable for ext3. Because RAM tends to fail
+ faster than the rest of the system during a power failure, special hw
+ that kills DMA transfers may be necessary. Not sure how common that
+ problem is on generic PC machines.)
+
+* either write caching is disabled, or the hw can do barriers and they are enabled.
+
+ (Note that barriers are disabled by default; use the "barrier=1"
+ mount option after making sure the hw supports them.)
+
References
==========
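As an aside to the barrier requirement in the patch above, a tool could check whether a given mount option string asks for barriers. The following is a minimal sketch, not part of the patch: opts_request_barrier() is a hypothetical name, it only recognizes the "barrier=1" and "barrier=0" spellings mentioned in this thread, and it assumes the last occurrence wins when the option is repeated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): return 1 if the comma-separated
 * mount option string opts requests barriers via "barrier=1", 0 otherwise.
 * Assumption: if the option is repeated, the last occurrence wins.
 */
int opts_request_barrier(const char *opts)
{
	int enabled = 0;	/* the patch text says barriers default to off */

	while (*opts) {
		size_t n = strcspn(opts, ",");	/* length of the current token */

		if (n == 9 && strncmp(opts, "barrier=1", 9) == 0)
			enabled = 1;
		else if (n == 9 && strncmp(opts, "barrier=0", 9) == 0)
			enabled = 0;

		opts += n;
		if (*opts == ',')
			opts++;		/* skip the separator */
	}
	return enabled;
}
```

The option string could come from the options field of a /proc/mounts line, but how (and whether) the kernel reports barrier state there is an assumption that would need checking for the kernel in use. A positive result also says nothing about whether the hardware actually honors barriers.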
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html