public inbox for linux-kernel@vger.kernel.org
From: Pavel Machek <pavel@suse.cz>
To: Theodore Tso <tytso@mit.edu>, Rob Landley <rob@landley.net>,
	kernel list <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>,
	mtk.manpages@gmail.com, rdunlap@xenotime.net,
	linux-doc@vger.kernel.org
Subject: [patch] Re: document ext3 requirements
Date: Mon, 5 Jan 2009 00:00:53 +0100
Message-ID: <20090104230053.GG1913@elf.ucw.cz>
In-Reply-To: <20090104220634.GD22958@mit.edu>

On Sun 2009-01-04 17:06:34, Theodore Tso wrote:
> On Sun, Jan 04, 2009 at 01:49:49PM -0600, Rob Landley wrote:
> > 
> > Want to document the granularity issues with flash, while you're at it?
> > 
> > An inherent problem with using flash as a normal block device is that the 
> > flash erase size is bigger than most filesystem sector sizes.  So when you 
> > request a write, it may erase and rewrite the next 64k, 128k, or even a couple 
> > megabytes on the really _big_ ones.
> > 
> > If you lose power in the middle of that, ext3 won't notice that data in the 
> > "sectors" _after_ the one you were trying to write to got trashed.
> 
> True enough, although the newer SSD's will have this problem addressed
> (although at least initially, they are **far** more costly than the
> el-cheapo 32GB SD cards you can find at the checkout counter at Fry's
> alongside battery-powered shavers and trashy ipod speakers).
> 
> I will stress again, that most of this doesn't belong in
> Documentation/filesystems/ext3.txt, as most of this is *not*
> ext3-specific.

Agreed... So what about this one?

---

Document linux filesystem expectations. Ext3 can't handle write errors
of any kind, and can't handle non-atomic sector writes. Other
filesystems are probably even worse...

Signed-off-by: Pavel Machek <pavel@suse.cz>

diff --git a/Documentation/filesystems/expectations.txt b/Documentation/filesystems/expectations.txt
new file mode 100644
index 0000000..7817a9c
--- /dev/null
+++ b/Documentation/filesystems/expectations.txt
@@ -0,0 +1,44 @@
+Linux filesystems can only work correctly when several conditions are
+met in the block layer and below (disks, flash cards). Some of them
+are obvious ("data on media should not change randomly"), some are
+less so.
+
+Write errors not allowed (NO-WRITE-ERRORS)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Writes to media never fail. If the disk does return an error during a
+write, filesystems can't handle it correctly, because fsync has already
+reported success once the data hit the journal.
+
+	Fortunately, failed writes are very uncommon on traditional
+	spinning disks, as they have spare sectors they use when a write
+	fails.
+
+Sector writes are atomic (ATOMIC-SECTORS)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Either the whole sector is correctly written or nothing is written
+during a power failure.
+
+	Unfortunately, none of the cheap USB/SD flash cards I have seen
+	behave like this, so they are unsuitable for all Linux filesystems
+	I know of.
+
+		An inherent problem with using flash as a normal block
+		device is that the flash erase size is bigger than
+		most filesystem sector sizes.  So when you request a
+		write, it may erase and rewrite the next 64k, 128k, or
+		even a couple megabytes on the really _big_ ones.
+
+		If you lose power in the middle of that, the filesystem
+		won't notice that data in the "sectors" _after_ the
+		one you were trying to write to got trashed.
+
+	Because RAM tends to fail faster than the rest of the system during
+	a power failure, special hw killing DMA transfers may be necessary;
+	otherwise, disks may write garbage during the power failure.
+	Not sure how common that problem is on generic PC machines.
+
+
+
+
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 9dd2a3b..8cb64b0 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -188,6 +197,25 @@ mke2fs: 	create a ext3 partition with th
 debugfs: 	ext2 and ext3 file system debugger.
 ext2online:	online (mounted) ext2 and ext3 filesystem resizer
 
+Requirements
+============
+
+Ext3 expects the disk/storage subsystem to behave sanely. On a sanely
+behaving disk subsystem, data that have been successfully synced will
+stay on the disk. Sane means:
+
+* write errors not allowed
+
+* sector writes are atomic
+
+(see expectations.txt; note that most/all Linux filesystems have similar
+expectations)
+
+* either write caching is disabled, or hw can do barriers and they are enabled.
+
+	   (Note that barriers are disabled by default; use the "barrier=1"
+	   mount option after making sure the hw can support them.)
+
 
 References
 ==========
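
To put rough numbers on the erase-size problem described in
expectations.txt, here is a minimal sketch (not part of the patch). It
assumes 512-byte sectors and erase blocks that are aligned to multiples
of their own size, with the card rewriting the whole erase block that
contains the target sector. Real cards hide their geometry, so the
function name, the constants and the alignment model are illustrative
assumptions only.

```python
SECTOR_SIZE = 512  # bytes; assumed logical sector size


def sectors_at_risk(sector, erase_block_size, sector_size=SECTOR_SIZE):
    """Return (first, last) sector numbers that share an erase block
    with `sector`, assuming erase blocks are aligned to their size."""
    per_block = erase_block_size // sector_size
    first = (sector // per_block) * per_block
    return first, first + per_block - 1


if __name__ == "__main__":
    # Erase sizes mentioned in the thread: 64k, 128k, "a couple megabytes".
    for size_kib in (64, 128, 2048):
        first, last = sectors_at_risk(1000, size_kib * 1024)
        count = last - first + 1
        print(f"{size_kib:5d} KiB erase block: writing sector 1000 "
              f"touches sectors {first}..{last} ({count} sectors)")
```

Under these assumptions, a single 512-byte write with a 128 KiB erase
block puts 256 sectors at risk during a power failure, and the
filesystem only knows about one of them.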


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
