linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Tso <tytso@mit.edu>
To: Pavel Machek <pavel@ucw.cz>
Cc: Rob Landley <rob@landley.net>,
	kernel list <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>,
	mtk.manpages@gmail.com, rdunlap@xenotime.net,
	linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: ext2/3: document conditions when reliable operation is possible
Date: Mon, 16 Mar 2009 15:03:08 -0400	[thread overview]
Message-ID: <20090316190308.GE6357@mit.edu> (raw)
In-Reply-To: <20090316123051.GJ2405@elf.ucw.cz>

On Mon, Mar 16, 2009 at 01:30:51PM +0100, Pavel Machek wrote:
> Updated version here.
> 
> On Thu 2009-03-12 14:13:03, Rob Landley wrote:
> > On Thursday 12 March 2009 04:21:14 Pavel Machek wrote:
> > > Not all block devices are suitable for all filesystems. In fact, some
> > > block devices are so broken that reliable operation is pretty much
> > > impossible. Document stuff ext2/ext3 needs for reliable operation.

Some of what is here are bugs, and some are legitimate long-term
interfaces (for example, the question of losing I/O errors when two
processes are writing to the same file, or to a directory entry, and
errors aren't or in some cases, can't, be reflected back to userspace).

I'm a little concerned that some of this reads a bit too much like a
rant (and I know Pavel was very frustrated when he tried to use a
flash card with a sucky flash card socket) and it will get used the
wrong way by apoligists, because it mixes areas where "we suck, we
should do better", which a re bug reports, and "Posix or the
underlying block device layer makes it hard", and simply states them
as fundamental design requirements, when that's probably not true.

There's a lot of work that we could do to make I/O errors get better
reflected to userspace by fsync().  So state things as bald
requirements I think goes a little too far IMHO.  We can surely do
better.


> diff --git a/Documentation/filesystems/expectations.txt b/Documentation/filesystems/expectations.txt
> new file mode 100644
> index 0000000..710d119
> --- /dev/null
> +++ b/Documentation/filesystems/expectations.txt
> +
> +Write errors not allowed (NO-WRITE-ERRORS)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Writes to media never fail. Even if disk returns error condition
> +during write, filesystems can't handle that correctly, because success
> +on fsync was already returned when data hit the journal.

The last half of this sentence "because success on fsync was already
returned when data hit the journal", obviously doesn't apply to all
filesystems, since some filesystems, like ext2, don't journal data.
Even for ext3, it only applies in the case of data=journal mode.  

There are other issues here, such as fsync() only reports an I/O
problem to one caller, and in some cases I/O errors aren't propagated
up the storage stack.  The latter is clearly just a bug that should be
fixed; the former is more of an interface limitation.  But you don't
talk about in this section, and I think it would be good to have a
more extended discussion about I/O errors when writing data blocks,
and I/O errors writing metadata blocks, etc.


> +
> +Sector writes are atomic (ATOMIC-SECTORS)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Either whole sector is correctly written or nothing is written during
> +powerfail.

This requirement is not quite the same as what you discuss below.

> +
> +	Unfortunately, none of the cheap USB/SD flash cards I've seen
> +	do behave like this, and are thus unsuitable for all Linux
> +	filesystems I know.
> +
> +		An inherent problem with using flash as a normal block
> +		device is that the flash erase size is bigger than
> +		most filesystem sector sizes.  So when you request a
> +		write, it may erase and rewrite some 64k, 128k, or
> +		even a couple megabytes on the really _big_ ones.
> +
> +		If you lose power in the middle of that, filesystem
> +		won't notice that data in the "sectors" _around_ the
> +		one your were trying to write to got trashed.

The characteristic you descrive here is not an issue about whether
the whole sector is either written or nothing happens to the data ---
but rather, or at least in addition to that, there is also the issue
that when a there is a flash card failure --- particularly one caused
by a sucky flash card reader design causing the SD card to disconnect
from the laptop in the middle of a write --- there may be "collateral
damange"; that is, in addition to corrupting sector being writen,
adjacent sectors might also end up getting list as an unfortunate side
effect.

So there are actually two desirable properties for a storage system to
have; one is "don't damage the old data on a failed write"; and the
other is "don't cause collateral damage to adjacent sectors on a
failed write".

> +	Because RAM tends to fail faster than rest of system during 
> +	powerfail, special hw killing DMA transfers may be necessary;
> +	otherwise, disks may write garbage during powerfail.
> +	Not sure how common that problem is on generic PC machines.

This problem is still relatively common, from what I can tell.  And
ext3 handles this surprisingly well at least in the catastrophic case
of garbage getting written into the inode table, since the journal
replay often will "repair" the garbage that was written into the
filesystem metadata blocks.  It won't do a bit of good for the data
blocks, of course (unless you are using data=journal mode).  But this
means that in fact, ext3 is more resistant to suriving failures to the
first problem (powerfail while writing can damage old data on a failed
write) but fortunately, hard drives generally don't cause collateral
damage on a failed write.  Of course, there are some spectaular
exemption to this rule --- a physical shock which causes the head to
slam into a surface moving at 7200rpm can throw a lot of debris into
the hard drive enclosure, causing loss to adjacent sectors.

    	       		  	       	  - Ted

  reply	other threads:[~2009-03-16 19:04 UTC|newest]

Thread overview: 269+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-12  9:21 ext2/3: document conditions when reliable operation is possible Pavel Machek
2009-03-12 11:40 ` Jochen Voß
2009-03-21 11:24   ` Pavel Machek
2009-03-12 19:13 ` Rob Landley
2009-03-16 12:28   ` Pavel Machek
2009-03-16 19:26     ` Rob Landley
2009-03-23 10:45       ` Pavel Machek
2009-03-30 15:06         ` Goswin von Brederlow
     [not found]           ` <20090824093143.GD25591@elf.ucw.cz>
2009-08-24 11:19             ` [patch] " Florian Weimer
2009-08-24 13:01               ` Theodore Tso
2009-08-24 14:55                 ` Artem Bityutskiy
2009-08-24 22:30                   ` Rob Landley
2009-08-24 19:52                 ` Pavel Machek
2009-08-24 20:24                   ` Ric Wheeler
2009-08-24 20:52                     ` Pavel Machek
2009-08-24 21:08                       ` Ric Wheeler
2009-08-24 21:25                         ` Pavel Machek
2009-08-24 22:05                           ` Ric Wheeler
2009-08-24 22:22                             ` Zan Lynx
2009-08-24 22:44                               ` Pavel Machek
2009-08-25  0:34                                 ` Ric Wheeler
2009-08-24 23:42                               ` david
2009-08-24 22:41                             ` Pavel Machek
2009-08-24 22:39                           ` Theodore Tso
2009-08-24 23:00                             ` Pavel Machek
     [not found]                             ` <20090824230036.GK29763@elf.ucw.cz>
2009-08-25  0:02                               ` david
2009-08-25  9:32                                 ` Pavel Machek
2009-08-25  0:06                               ` Ric Wheeler
2009-08-25  9:34                                 ` Pavel Machek
2009-08-25 15:34                                   ` david
2009-08-26  3:32                                   ` Rik van Riel
2009-08-26 11:17                                     ` Pavel Machek
2009-08-26 11:29                                       ` david
2009-08-26 13:10                                         ` Pavel Machek
2009-08-26 13:43                                           ` david
2009-08-26 18:02                                             ` Theodore Tso
2009-08-27  6:28                                               ` Eric Sandeen
2009-11-09  8:53                                               ` periodic fsck was " Pavel Machek
     [not found]                                               ` <20091109085318.GE4818@elf.ucw.cz>
2009-11-09 14:05                                                 ` Theodore Tso
2009-11-09 15:58                                                   ` Andreas Dilger
2009-08-30  7:03                                             ` Pavel Machek
2009-08-26 12:28                                       ` Theodore Tso
2009-08-27  6:06                                         ` Rob Landley
2009-08-27  6:54                                           ` david
2009-08-27  7:34                                             ` Rob Landley
2009-08-28 14:37                                               ` david
2009-08-30  7:19                                             ` Pavel Machek
2009-08-30 12:48                                               ` david
2009-08-27  5:27                                     ` Rob Landley
2009-08-25  0:08                               ` Theodore Tso
2009-08-25  9:42                                 ` Pavel Machek
     [not found]                                 ` <20090825094244.GC15563@elf.ucw.cz>
2009-08-25 13:37                                   ` Ric Wheeler
2009-08-25 13:42                                     ` Alan Cox
2009-08-27  3:16                                       ` Rob Landley
2009-08-25 21:15                                     ` Pavel Machek
2009-08-25 22:42                                       ` Ric Wheeler
2009-08-25 22:51                                         ` Pavel Machek
2009-08-25 23:03                                           ` david
2009-08-25 23:29                                             ` Pavel Machek
2009-08-25 23:03                                           ` Ric Wheeler
2009-08-25 23:26                                             ` Pavel Machek
2009-08-25 23:40                                               ` Ric Wheeler
2009-08-25 23:48                                                 ` david
2009-08-25 23:53                                                 ` Pavel Machek
2009-08-26  0:11                                                   ` Ric Wheeler
2009-08-26  0:16                                                     ` Pavel Machek
2009-08-26  0:31                                                       ` Ric Wheeler
2009-08-26  1:00                                                         ` Theodore Tso
2009-08-26  1:15                                                           ` Ric Wheeler
2009-08-26  1:16                                                           ` Pavel Machek
2009-08-26  2:53                                                           ` Henrique de Moraes Holschuh
     [not found]                                                           ` <20090826011605.GS4300@elf.ucw.cz>
2009-08-26  2:55                                                             ` Theodore Tso
2009-08-26 13:37                                                               ` Ric Wheeler
     [not found]                                                           ` <4A948C94.7040103@redhat.com>
2009-08-26  2:58                                                             ` Theodore Tso
2009-08-26 10:39                                                               ` Ric Wheeler
     [not found]                                                               ` <4A9510D2.1090704@redhat.com>
2009-08-26 11:12                                                                 ` Pavel Machek
2009-08-26 11:28                                                                   ` david
2009-08-29  9:49                                                                     ` [testcase] test your fs/storage stack (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Pavel Machek
2009-08-29 11:28                                                                       ` Ric Wheeler
2009-09-02 20:12                                                                         ` Pavel Machek
2009-09-02 20:42                                                                           ` Ric Wheeler
2009-09-02 23:00                                                                             ` Rob Landley
2009-09-02 23:09                                                                               ` david
2009-09-03  8:55                                                                                 ` Pavel Machek
2009-09-03  0:36                                                                               ` jim owens
2009-09-03  2:41                                                                                 ` Rob Landley
2009-09-03 14:14                                                                                   ` jim owens
2009-09-04  7:44                                                                                     ` Rob Landley
2009-09-04 11:49                                                                                       ` Ric Wheeler
2009-09-05 10:28                                                                                         ` Pavel Machek
2009-09-05 12:20                                                                                           ` Ric Wheeler
2009-09-05 13:54                                                                                           ` Jonathan Corbet
2009-09-05 21:27                                                                                             ` Pavel Machek
2009-09-05 21:56                                                                                               ` Theodore Tso
2009-09-02 22:45                                                                           ` Rob Landley
2009-09-02 22:49                                                                           ` [PATCH] Update Documentation/md.txt to mention journaling won't help dirty+degraded case Rob Landley
2009-09-03  9:08                                                                             ` Pavel Machek
2009-09-03 12:05                                                                             ` Ric Wheeler
2009-09-03 12:31                                                                               ` Pavel Machek
2009-08-29 16:35                                                                       ` [testcase] test your fs/storage stack (was Re: [patch] ext2/3: document conditions when reliable operation is possible) david
2009-08-30  7:07                                                                         ` Pavel Machek
2009-08-26 12:01                                                                   ` [patch] ext2/3: document conditions when reliable operation is possible Ric Wheeler
2009-08-26 12:23                                                                   ` Theodore Tso
2009-08-30  7:01                                                                     ` Pavel Machek
2009-08-27  5:19                                                               ` Rob Landley
2009-08-27 12:24                                                                 ` Theodore Tso
2009-08-27 13:10                                                                   ` Ric Wheeler
     [not found]                                                                   ` <4A9685D4.2070906@redhat.com>
2009-08-27 16:54                                                                     ` MD/DM and barriers (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Jeff Garzik
2009-08-27 18:09                                                                       ` Alasdair G Kergon
2009-09-01 14:01                                                                       ` Pavel Machek
2009-09-02 16:17                                                                         ` Michael Tokarev
2009-08-29 10:02                                                                   ` [patch] ext2/3: document conditions when reliable operation is possible Pavel Machek
2009-09-03  9:47                                                           ` Pavel Machek
2009-08-26  3:50                                                   ` Rik van Riel
2009-08-27  3:53                                                 ` Rob Landley
2009-08-27 11:43                                                   ` Ric Wheeler
2009-08-27 20:51                                                     ` Rob Landley
2009-08-27 22:00                                                       ` Ric Wheeler
2009-08-28 14:49                                                       ` david
2009-08-29 10:05                                                         ` Pavel Machek
2009-08-29 20:22                                                           ` Rob Landley
2009-08-29 21:34                                                             ` Pavel Machek
2009-09-03 16:56                                                             ` what fsck can (and can't) do was " david
2009-09-03 19:27                                                               ` Theodore Tso
2009-08-27 22:13                                                     ` raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Pavel Machek
2009-08-28  1:32                                                       ` Ric Wheeler
2009-08-28  6:44                                                         ` Pavel Machek
2009-08-28  7:31                                                           ` NeilBrown
2009-11-09 10:50                                                             ` Pavel Machek
2009-08-28 11:16                                                           ` Ric Wheeler
2009-09-01 13:58                                                             ` Pavel Machek
2009-08-28  7:11                                                         ` raid is dangerous but that's secret Florian Weimer
2009-08-28  7:23                                                           ` NeilBrown
2009-08-28 12:08                                                         ` raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Theodore Tso
2009-08-30  7:51                                                           ` Pavel Machek
     [not found]                                                           ` <20090830075135.GA1874@ucw.cz>
2009-08-30  9:01                                                             ` Christian Kujau
2009-09-02 20:55                                                               ` Pavel Machek
2009-08-30 12:55                                                             ` david
2009-08-30 14:12                                                               ` Ric Wheeler
2009-08-30 14:44                                                                 ` Michael Tokarev
2009-08-30 16:10                                                                   ` Ric Wheeler
2009-08-30 16:35                                                                   ` Christoph Hellwig
2009-08-31 13:15                                                                     ` Ric Wheeler
2009-08-31 13:16                                                                       ` Christoph Hellwig
2009-08-31 13:19                                                                         ` Mark Lord
2009-08-31 13:21                                                                           ` Christoph Hellwig
2009-08-31 15:14                                                                             ` jim owens
2009-09-03  1:59                                                                             ` Ric Wheeler
2009-09-03 11:12                                                                               ` Krzysztof Halasa
2009-09-03 11:18                                                                                 ` Ric Wheeler
2009-09-03 13:34                                                                                   ` Krzysztof Halasa
2009-09-03 13:50                                                                                     ` Ric Wheeler
2009-09-03 13:59                                                                                       ` Krzysztof Halasa
2009-09-03 14:15                                                                                         ` wishful thinking about atomic, multi-sector or full MD stripe width, writes in storage Ric Wheeler
2009-09-03 14:26                                                                                           ` Florian Weimer
2009-09-03 15:09                                                                                             ` Ric Wheeler
2009-09-03 23:50                                                                                           ` Krzysztof Halasa
2009-09-04  0:39                                                                                             ` Ric Wheeler
2009-09-04 21:21                                                                                           ` Mark Lord
2009-09-04 21:29                                                                                             ` Ric Wheeler
2009-09-05 12:57                                                                                               ` Mark Lord
2009-09-05 13:40                                                                                                 ` Ric Wheeler
2009-09-05 21:43                                                                                                 ` NeilBrown
2009-09-07 11:45                                                                                           ` Pavel Machek
2009-09-07 13:10                                                                                             ` Theodore Tso
2010-04-04 13:47                                                                                               ` fsck more often when powerfail is detected (was Re: wishful thinking about atomic, multi-sector or full MD stripe width, writes in storage) Pavel Machek
2010-04-04 17:39                                                                                                 ` tytso
2010-04-04 17:59                                                                                                 ` Rob Landley
2010-04-04 18:45                                                                                                   ` Pavel Machek
2010-04-04 19:35                                                                                                     ` tytso
2010-04-04 19:29                                                                                                   ` tytso
2010-04-04 23:58                                                                                                     ` Rob Landley
2009-09-03 14:35                                                                                     ` raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible) david
2009-08-31 13:22                                                                         ` Ric Wheeler
2009-08-31 15:50                                                                           ` david
2009-08-31 16:21                                                                             ` Ric Wheeler
2009-08-31 18:31                                                                             ` Christoph Hellwig
2009-08-31 19:11                                                                               ` david
2009-08-30 15:05                                                               ` Pavel Machek
2009-08-30 15:20                                                             ` Theodore Tso
2009-08-31 17:49                                                               ` Jesse Brandeburg
     [not found]                                                               ` <4807377b0908311049id9a2167r937bc8447c2b3546@mail.gmail.com>
2009-08-31 18:01                                                                 ` Ric Wheeler
2009-08-31 21:01                                                                   ` MD5/6? (was Re: raid is dangerous but that's secret ...) Ron Johnson
2009-08-31 18:07                                                                 ` raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible) martin f krafft
2009-08-31 22:26                                                                   ` Jesse Brandeburg
2009-08-31 23:19                                                                     ` Ron Johnson
2009-09-01  5:45                                                                     ` martin f krafft
2009-09-05 10:34                                                               ` Pavel Machek
2009-08-25 23:46                                               ` [patch] ext2/3: document conditions when reliable operation is possible david
2009-08-25 23:08                                       ` Neil Brown
2009-08-25 23:44                                         ` Pavel Machek
2009-08-26  4:08                                           ` Rik van Riel
2009-08-26 11:15                                             ` Pavel Machek
2009-08-27  3:29                                               ` Rik van Riel
2009-08-25 16:11                                   ` Theodore Tso
2009-08-25 22:21                                     ` [patch] document flash/RAID dangers Pavel Machek
2009-08-25 22:33                                       ` david
2009-08-25 22:40                                         ` Pavel Machek
2009-08-25 22:59                                           ` david
2009-08-25 23:37                                             ` Pavel Machek
2009-08-25 23:48                                               ` Ric Wheeler
2009-08-26  0:06                                                 ` Pavel Machek
2009-08-26  0:12                                                   ` Ric Wheeler
2009-08-26  0:20                                                     ` Pavel Machek
2009-08-26  0:26                                                       ` david
2009-08-26  0:28                                                       ` Ric Wheeler
2009-08-26  0:38                                                         ` Pavel Machek
2009-08-26  0:45                                                           ` Ric Wheeler
2009-08-26 11:21                                                             ` Pavel Machek
2009-08-26 11:58                                                               ` Ric Wheeler
2009-08-26 12:40                                                                 ` Theodore Tso
2009-08-26 13:11                                                                   ` Ric Wheeler
     [not found]                                                                   ` <4A95349E.7010101@redhat.com>
2009-08-26 13:44                                                                     ` david
2009-08-29  9:38                                                                 ` Pavel Machek
2009-08-26  4:24                                                       ` Rik van Riel
2009-08-26 11:22                                                         ` Pavel Machek
2009-08-26 14:45                                                           ` Rik van Riel
2009-08-29  9:39                                                             ` Pavel Machek
2009-08-29 11:47                                                               ` Ron Johnson
2009-08-29 16:12                                                                 ` jim owens
2009-08-25 23:56                                               ` david
2009-08-26  0:12                                                 ` Pavel Machek
2009-08-26  0:20                                                   ` david
2009-08-26  0:39                                                     ` Pavel Machek
2009-08-26  1:17                                                       ` david
2009-08-26  0:26                                                   ` Ric Wheeler
2009-08-26  0:44                                                     ` Pavel Machek
2009-08-26  0:50                                                       ` Ric Wheeler
2009-08-26  1:19                                                       ` david
2009-08-26 11:25                                                         ` Pavel Machek
2009-08-26 12:37                                                           ` Theodore Tso
2009-08-30  6:49                                                             ` Pavel Machek
2009-08-26  4:20                                           ` Rik van Riel
2009-08-25 22:27                                     ` [patch] document that ext2 can't handle barriers Pavel Machek
2009-08-27  3:34                                 ` [patch] ext2/3: document conditions when reliable operation is possible Rob Landley
2009-08-27  8:46                                 ` David Woodhouse
2009-08-28 14:46                                   ` david
2009-08-29 10:09                                     ` Pavel Machek
2009-08-29 16:27                                       ` david
2009-08-29 21:33                                         ` Pavel Machek
2009-08-25 22:58                             ` Neil Brown
2009-08-25 23:10                               ` Ric Wheeler
2009-08-25 23:32                                 ` NeilBrown
2009-08-24 21:11                       ` Greg Freemyer
2009-08-25 20:56                         ` Rob Landley
2009-08-25 21:08                           ` david
2009-08-25 18:52                     ` Rob Landley
2009-08-25 14:43                 ` Florian Weimer
2009-08-24 13:50               ` Theodore Tso
2009-08-24 18:48                 ` Pavel Machek
2009-08-24 18:39               ` Pavel Machek
2009-08-24 13:21             ` Greg Freemyer
2009-08-24 18:44               ` Pavel Machek
2009-08-25 23:28               ` Neil Brown
2009-08-26  1:34                 ` david
2009-08-24 21:11             ` Rob Landley
2009-08-24 21:33               ` Pavel Machek
2009-08-25 18:45                 ` Jan Kara
2009-03-16 12:30   ` Pavel Machek
2009-03-16 19:03     ` Theodore Tso [this message]
2009-03-23 18:23       ` Pavel Machek
2009-03-16 19:40     ` Sitsofe Wheeler
2009-03-16 21:43       ` Rob Landley
2009-03-17  4:55         ` Kyle Moffett
2009-03-23 11:00       ` Pavel Machek
2009-08-29  1:33   ` Robert Hancock
2009-08-29 13:04     ` Alan Cox
2009-03-16 19:45 ` Greg Freemyer
2009-03-16 21:48   ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090316190308.GE6357@mit.edu \
    --to=tytso@mit.edu \
    --cc=akpm@osdl.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=pavel@ucw.cz \
    --cc=rdunlap@xenotime.net \
    --cc=rob@landley.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).