From: Pavel Machek <pavel@suse.cz>
To: Theodore Tso <tytso@mit.edu>, Rob Landley <rob@landley.net>,
kernel list <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
mtk.manpages@gmail.com, rdunlap@xenotime.net,
linux-doc@vger.kernel.org
Subject: [patch] Re: document ext3 requirements
Date: Mon, 5 Jan 2009 00:00:53 +0100
Message-ID: <20090104230053.GG1913@elf.ucw.cz>
In-Reply-To: <20090104220634.GD22958@mit.edu>
On Sun 2009-01-04 17:06:34, Theodore Tso wrote:
> On Sun, Jan 04, 2009 at 01:49:49PM -0600, Rob Landley wrote:
> >
> > Want to document the granularity issues with flash, while you're at it?
> >
> > An inherent problem with using flash as a normal block device is that the
> > flash erase size is bigger than most filesystem sector sizes. So when you
> > request a write, it may erase and rewrite the next 64k, 128k, or even a couple
> > megabytes on the really _big_ ones.
> >
> > If you lose power in the middle of that, ext3 won't notice that data in the
> > "sectors" _after_ the one your were trying to write to got trashed.
>
> True enough, although the newer SSD's will have this problem addressed
> (although at least initially, they are **far** more costly than the
> el-cheapo 32GB SD cards you can find at the checkout counter at Fry's
> alongside battery-powered shavers and trashy ipod speakers).
>
> I will stress again, that most of this doesn't belong in
> Documentation/filesystems/ext3.txt, as most of this is *not*
> ext3-specific.
Agreed... So what about this one?
---
Document Linux filesystem expectations. Ext3 can't handle write errors
of any kind, and can't handle non-atomic sector writes. Other
filesystems are probably even worse...
Signed-off-by: Pavel Machek <pavel@suse.cz>
diff --git a/Documentation/filesystems/expectations.txt b/Documentation/filesystems/expectations.txt
new file mode 100644
index 0000000..7817a9c
--- /dev/null
+++ b/Documentation/filesystems/expectations.txt
@@ -0,0 +1,44 @@
+Linux filesystems can only work correctly when several conditions are
+met in the block layer and below (disks, flash cards). Some of them
+are obvious ("data on media should not change randomly"), some are
+less so.
+
+Write errors not allowed (NO-WRITE-ERRORS)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Writes to media never fail. Even if the disk returns an error condition
+during a write, filesystems can't handle that correctly, because success
+on fsync was already returned when the data hit the journal.
+
+ Fortunately, failing writes are very uncommon on traditional
+ spinning disks, as they have spare sectors they use when a write
+ fails.
+
+Sector writes are atomic (ATOMIC-SECTORS)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Either the whole sector is correctly written or nothing is written
+during a power failure.
+
+ Unfortunately, none of the cheap USB/SD flash cards I have seen
+ behave like this, so they are unsuitable for all Linux filesystems
+ I know.
+
+ An inherent problem with using flash as a normal block
+ device is that the flash erase size is bigger than
+ most filesystem sector sizes. So when you request a
+ write, it may erase and rewrite the next 64k, 128k, or
+ even a couple megabytes on the really _big_ ones.
+
+ If you lose power in the middle of that, the filesystem
+ won't notice that data in the "sectors" _after_ the
+ one you were trying to write to got trashed.
+
+ Because RAM tends to fail faster than the rest of the system during
+ powerfail, special hardware that kills DMA transfers may be necessary;
+ otherwise, disks may write garbage during powerfail.
+ Not sure how common that problem is on generic PC machines.
+
+
+
+
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 9dd2a3b..8cb64b0 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -188,6 +197,25 @@ mke2fs: create a ext3 partition with th
debugfs: ext2 and ext3 file system debugger.
ext2online: online (mounted) ext2 and ext3 filesystem resizer
+Requirements
+============
+
+Ext3 expects the disk/storage subsystem to behave sanely. On a sanely
+behaving disk subsystem, data that have been successfully synced will
+stay on the disk. Sane means:
+
+* write errors not allowed
+
+* sector writes are atomic
+
+(see expectations.txt; note that most/all Linux filesystems have similar
+expectations)
+
+* either write caching is disabled, or the hardware can do barriers and
+  they are enabled.
+
+ (Note that barriers are disabled by default; use the "barrier=1"
+ mount option after making sure the hardware can support them.)
+
References
==========
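
To make the erase-block granularity point from expectations.txt a bit
more concrete, here is a rough sketch of the arithmetic. It is only an
illustration, not a description of how any particular card behaves: it
assumes 512-byte sectors, a 128k erase block, and that the card erases
whole aligned blocks with no remapping layer in between. The function
name sectors_at_risk is made up for this example.

```python
def sectors_at_risk(sector, sector_size=512, erase_size=128 * 1024):
    """Return (first, last) sector numbers that share an erase block
    with `sector`.

    Assumption: the device erases whole, aligned erase blocks and maps
    sectors to flash linearly. Real cards may remap, so treat this as
    a model of the problem rather than of any specific hardware.
    """
    per_block = erase_size // sector_size       # sectors per erase block
    first = (sector // per_block) * per_block   # round down to block start
    return first, first + per_block - 1


if __name__ == "__main__":
    first, last = sectors_at_risk(1000)
    print(f"writing sector 1000 may disturb sectors {first}..{last}")
```

Under those assumptions a single 512-byte write puts 256 sectors at
risk during powerfail, and the filesystem only knows about one of them.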
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html