From: Pavel Machek <pavel@suse.cz>
To: Theodore Tso <tytso@mit.edu>, Rob Landley <rob@landley.net>,
	kernel list <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>,
	mtk.manpages@gmail.com, rdunlap@xenotime.net,
	linux-doc@vger.kernel.org
Subject: [patch] Re: document ext3 requirements
Date: Mon, 5 Jan 2009 00:00:53 +0100
Message-ID: <20090104230053.GG1913@elf.ucw.cz>
In-Reply-To: <20090104220634.GD22958@mit.edu>

On Sun 2009-01-04 17:06:34, Theodore Tso wrote:
> On Sun, Jan 04, 2009 at 01:49:49PM -0600, Rob Landley wrote:
> > 
> > Want to document the granularity issues with flash, while you're at it?
> > 
> > An inherent problem with using flash as a normal block device is that the 
> > flash erase size is bigger than most filesystem sector sizes.  So when you 
> > request a write, it may erase and rewrite the next 64k, 128k, or even a couple 
> > megabytes on the really _big_ ones.
> > 
> > If you lose power in the middle of that, ext3 won't notice that data in the 
> > "sectors" _after_ the one your were trying to write to got trashed.
> 
> True enough, although the newer SSD's will have this problem addressed
> (although at least initially, they are **far** more costly than the
> el-cheapo 32GB SD cards you can find at the checkout counter at Fry's
> alongside battery-powered shavers and trashy ipod speakers).
> 
> I will stress again, that most of this doesn't belong in
> Documentation/filesystems/ext3.txt, as most of this is *not*
> ext3-specific.

Agreed... So what about this one?

---

Document Linux filesystem expectations. Ext3 can't handle write errors
of any kind, and can't handle non-atomic sector writes. Other
filesystems are probably even worse...

Signed-off-by: Pavel Machek <pavel@suse.cz>

diff --git a/Documentation/filesystems/expectations.txt b/Documentation/filesystems/expectations.txt
new file mode 100644
index 0000000..7817a9c
--- /dev/null
+++ b/Documentation/filesystems/expectations.txt
@@ -0,0 +1,44 @@
+Linux filesystems can only work correctly when several conditions are
+met in the block layer and below (disks, flash cards). Some of them
+are obvious ("data on media should not change randomly"), some are
+less so.
+
+Write errors not allowed (NO-WRITE-ERRORS)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Writes to media never fail. Even if the disk returns an error condition
+during a write, filesystems can't handle that correctly, because success
+on fsync was already returned when the data hit the journal.
+
+	Fortunately, failing writes are very uncommon on traditional
+	spinning disks, as they have spare sectors they use when a write
+	fails.
+
+Sector writes are atomic (ATOMIC-SECTORS)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Either the whole sector is correctly written or nothing is written
+during powerfail.
+
+	Unfortunately, none of the cheap USB/SD flash cards I have seen
+	behave like this, so they are unsuitable for all Linux filesystems
+	I know of.
+
+		An inherent problem with using flash as a normal block
+		device is that the flash erase size is bigger than
+		most filesystem sector sizes.  So when you request a
+		write, it may erase and rewrite the next 64k, 128k, or
+		even a couple megabytes on the really _big_ ones.
+
+		If you lose power in the middle of that, the filesystem
+		won't notice that data in the "sectors" _after_ the
+		one you were trying to write to got trashed.
+
+	Because RAM tends to fail faster than the rest of the system during
+	powerfail, special hw killing DMA transfers may be necessary;
+	otherwise, disks may write garbage during powerfail.
+	Not sure how common that problem is on generic PC machines.
+
+
+
+
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 9dd2a3b..8cb64b0 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -188,6 +197,25 @@ mke2fs: 	create a ext3 partition with th
 debugfs: 	ext2 and ext3 file system debugger.
 ext2online:	online (mounted) ext2 and ext3 filesystem resizer
 
+Requirements
+============
+
+Ext3 expects the disk/storage subsystem to behave sanely. On a sanely
+behaving disk subsystem, data that have been successfully synced will
+stay on the disk. Sane means:
+
+* write errors not allowed
+
+* sector writes are atomic
+
+(see expectations.txt; note that most/all linux filesystems have similar
+expectations)
+
+* either write caching is disabled, or hw can do barriers and they are enabled.
+
+	   (Note that barriers are disabled by default; use the "barrier=1"
+	   mount option after making sure hw can support them).
+
 
 References
 ==========

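To make the erase-size point in the ATOMIC-SECTORS section concrete, here is
a minimal sketch of the arithmetic. It assumes 512-byte sectors and erase
blocks aligned to a multiple of their own size; real flash translation
layers may behave differently. The function name sectors_at_risk is made up
for illustration.

```python
SECTOR_SIZE = 512  # bytes; assumed sector size, not taken from any device


def sectors_at_risk(sector, erase_block_size, sector_size=SECTOR_SIZE):
    """Return the sector numbers that share an erase block with `sector`.

    Assumes erase blocks are aligned, so a write to one sector may force an
    erase and rewrite of every sector in the same block.
    """
    if erase_block_size % sector_size != 0:
        raise ValueError("erase block size must be a multiple of the sector size")
    per_block = erase_block_size // sector_size
    first = (sector // per_block) * per_block
    return range(first, first + per_block)


# Writing sector 1000 on a card with 128k erase blocks:
at_risk = sectors_at_risk(1000, 128 * 1024)
print(at_risk.start, at_risk.stop - 1, len(at_risk))  # 768 1023 256
```

Under these assumptions a single-sector write puts 256 sectors in play, 255
of which the filesystem never asked to touch.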

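On the NO-WRITE-ERRORS side, the most an application can do is check the
result of every write and of fsync. The sketch below shows that pattern;
write_durably is a hypothetical helper, not an existing API. It does not
protect against the case described in the patch, where the device fails
only after fsync has already reported success.

```python
import os


def write_durably(path, data):
    """Write `data` to `path` and fsync it.

    Raises OSError if the kernel reports a failure from write or fsync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the data to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)
```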
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
