public inbox for linux-kernel@vger.kernel.org
From: Pavel Machek <pavel@suse.cz>
To: Theodore Tso <tytso@mit.edu>, Chris Friesen <cfriesen@nortel.com>,
	mikulas@artax.karlin.mff.cuni.cz, clock@atrey.karlin.mff.cuni.cz,
	kernel list <linux-kernel@vger.kernel.org>,
	aviro@redhat.com
Cc: Andrew Morton <akpm@osdl.org>
Subject: [patch] Re: writing file to disk: not as easy as it looks
Date: Mon, 15 Dec 2008 11:24:50 +0100
Message-ID: <20081215102450.GA9064@elf.ucw.cz>
In-Reply-To: <20081203050709.GL20858@mit.edu>

Hi!

> > > Heck, if you have a hiccup while writing an inode table block out to
> > > disk (for example a power failure at just the wrong time), so the
> > > memory (which is more voltage sensitive than hard drives) DMA's
> > > garbage which gets written to the inode table, you could lose a large
> > > number of adjacent inodes when garbage gets splatted over the inode
> > > table.
> > 
> > Ok, "memory failed before disk" is ... bad hardware.
> 
> It's PC class hardware.  Live with it.  Back when SGI made their own
> hardware, they noticed this problem, and so they wired up their SGI
> machines with powerfail interrupts, and extra big capacitors in
> their

Seems like bad hardware is very common indeed. Anyway, I guess it
would be fair to document what ext3 expects from the disk subsystem
for safe operation. Does this summary sound correct/fair?

Signed-off-by: Pavel Machek <pavel@suse.cz>

diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 9dd2a3b..3855fbd 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -188,6 +188,34 @@ mke2fs: 	create a ext3 partition with th
 debugfs: 	ext2 and ext3 file system debugger.
 ext2online:	online (mounted) ext2 and ext3 filesystem resizer
 
+Requirements
+============
+
+Ext3 expects the disk/storage subsystem to behave sanely. On a sanely
+behaving disk subsystem, data that have been successfully synced will
+stay on the disk. Sane means:
+
+* writes to media never fail. Even if the disk returns an error condition
+  during a write, ext3 can't handle that correctly, because success on fsync
+  was already returned when the data hit the journal.
+
+	   (Fortunately, failing writes are very uncommon on disks, as they
+	   have spare sectors they use when a write fails.)
+
+* either the whole sector is correctly written or nothing is written during
+  powerfail.
+
+	   (Unfortunately, none of the cheap USB/SD flash cards I have seen
+	   behave like this, so they are unsuitable for ext3. Because RAM tends
+	   to fail faster than the rest of the system during powerfail, special
+	   hw killing DMA transfers may be necessary. Not sure how common that
+	   problem is on generic PC machines.)
+
+* either write caching is disabled, or the hw can do barriers and they are
+  enabled.
+	   (Note that barriers are disabled by default; use the "barrier=1"
+	   mount option after making sure the hw can support them.)
+
 
 References
 ==========
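
To make "successfully synced" in the patch above concrete, here is a
minimal sketch of the application side, assuming a POSIX system. The
helper name durable_write is hypothetical and not part of the patch. The
program treats its data as durable once fsync() returns without error,
and that is exactly the promise ext3 can keep only if the storage below
it meets the listed requirements.

```python
import os


def durable_write(path, data):
    """Write data to path and return only after fsync() reported success.

    Hypothetical helper for illustration; raises OSError if any step fails.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's data to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Also sync the containing directory so the new name is recorded.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as durable_write("/tmp/example", b"hello") returns normally
only if every write and both fsync() calls succeeded. Whether the data
then survives a power failure still depends on the hardware behaving as
described in the requirements.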

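The last requirement depends on a mount option, so a quick check of an
option string may help. This is a rough sketch: the function name
barriers_enabled is hypothetical, it only looks at the "barrier=0" and
"barrier=1" forms, and it assumes the options arrive as a
comma-separated string of the kind passed to mount -o. A missing option
is treated as disabled, matching the default stated in the patch.

```python
def barriers_enabled(mount_options):
    """Return True if an ext3 option string requests barriers.

    mount_options is a comma-separated string such as "rw,barrier=1".
    Only the barrier=0 / barrier=1 forms are recognised (an assumption
    of this sketch); when the option is repeated, the last one wins.
    """
    enabled = False  # barriers are off unless explicitly requested
    for opt in mount_options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "barrier":
            enabled = value == "1"
    return enabled
```

For example, barriers_enabled("rw,barrier=1") is true, while
barriers_enabled("rw,data=ordered") is false. This says nothing about
whether the hardware actually honours barriers.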




-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
