From: Theodore Tso <tytso@mit.edu>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Pavel Machek <pavel@suse.cz>, Rob Landley <rob@landley.net>,
kernel list <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
mtk.manpages@gmail.com, rdunlap@xenotime.net,
linux-doc@vger.kernel.org
Subject: Re: document ext3 requirements
Date: Mon, 5 Jan 2009 15:19:28 -0500
Message-ID: <20090105201928.GD8939@mit.edu>
In-Reply-To: <yq1sknx1s4f.fsf@sermon.lab.mkp.net>
On Mon, Jan 05, 2009 at 02:15:44PM -0500, Martin K. Petersen wrote:
>
> It works some of the time. But in reality if you yank power halfway
> during a write operation the end result is undefined.
>
> The saving grace for normal users is that the potential corruption is
> limited to a couple of sectors.
A few years ago it was asserted to me that the internal block size for
spinning magnetic media was around 32k. So if the hard drive doesn't
have enough of a capacitor or other energy reserve to complete its
internal read-modify-write cycle, attempts to read the 32k chunk of
disk could result in hard ECC failures that would cause the blocks in
question to all return uncorrectable read errors when they are
accessed.
Of course, if the memory goes south first, and you're in the middle of
streaming a 128k update to the inode table of the filesystem, and the power
fails, and the memory starts returning garbage during the DMA
operation, you may have much bigger problems. :-)
So it's probably more than "a couple of sectors"....
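As a rough illustration of the 32k figure above, here is a minimal
sketch of how many logical sectors share an internal block with a small
write. The 512-byte sector size, the alignment of internal blocks to
multiples of their own size, and the helper name sectors_at_risk are
all assumptions made for the example, not anything a drive vendor
documents.

```python
# Assumptions (not from the original mail): 512-byte logical sectors, and
# internal blocks aligned to multiples of their own size.
SECTOR_SIZE = 512
INTERNAL_BLOCK_SIZE = 32 * 1024  # the "around 32k" figure quoted above


def sectors_at_risk(first_sector, sector_count):
    """Return the range of logical sectors that share an internal block
    with a write of sector_count sectors starting at first_sector."""
    per_block = INTERNAL_BLOCK_SIZE // SECTOR_SIZE
    last_sector = first_sector + sector_count - 1
    start = (first_sector // per_block) * per_block
    end = (last_sector // per_block + 1) * per_block
    return range(start, end)


if __name__ == "__main__":
    # A two-sector write that sits inside a single internal block.
    at_risk = sectors_at_risk(first_sector=1000, sector_count=2)
    print(len(at_risk), "sectors at risk:", at_risk.start, "to", at_risk.stop - 1)
```

Under these assumptions a two-sector write puts 64 sectors at risk, and
128 if it happens to straddle an internal block boundary.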
> The current suck of flash SSDs is that the erase block size amplifies
> this problem by at least one order of magnitude, often two. I have a
> couple of SSDs here that will leave my filesystem in shambles every time
> the machine crashes. I quickly got tired of reinstalling Fedora several
> times per week so now my main machine is back to spinning media.
The erase block size is typically 1 to 4 megabytes, from my
understanding. So yeah, that's easily 1-2 orders of magnitude. Worse
yet, flash's sequential streaming write speeds are much slower than
hard drive's (anywhere from a factor of 3 to 12 depending on how
cheap/trashy the flash drive happens to be), so that opens the time
window even further, by possibly as much as another order of magnitude.
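The arithmetic behind these estimates can be sketched as follows. The
baseline is an assumption: the mail does not say what the erase block
is being compared against, so the sketch uses the 32k internal HDD
block mentioned earlier. Multiplying the size factor by the write-speed
factor is also a simplification, used only to show where a figure near
three orders of magnitude could come from.

```python
import math

# Figures taken from the mail.
INTERNAL_BLOCK = 32 * 1024                          # asserted HDD internal block
ERASE_BLOCKS = (1 * 1024 * 1024, 4 * 1024 * 1024)   # "1 to 4 megabytes"
SLOWDOWNS = (3, 12)                                 # flash write speed penalty

# Assumption: compare erase blocks against the 32k HDD internal block.
size_ratios = [erase / INTERNAL_BLOCK for erase in ERASE_BLOCKS]

# Simplification: treat the exposure as size factor times slowdown factor.
combined = [size * slow for size, slow in zip(size_ratios, SLOWDOWNS)]


def orders_of_magnitude(ratio):
    """Express a ratio as a base-10 order of magnitude."""
    return math.log10(ratio)


if __name__ == "__main__":
    for label, values in (("size only", size_ratios), ("size x time", combined)):
        low, high = values
        print(f"{label}: {low:.0f}x to {high:.0f}x "
              f"({orders_of_magnitude(low):.1f} to "
              f"{orders_of_magnitude(high):.1f} orders of magnitude)")
```

With these inputs the size ratio alone comes to 32x to 128x, and the
combined best and worst cases come to 96x and 1536x.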
I also suspect that HDD manufacturers have learned various tricks (due
to enterprise storage/database vendors leaning on them) to make the
drives appear more atomic in the face of hard drive errors. Also, in
Pavel's case, as I recall he was using the card in a laptop where the
SD card protruded slightly from the laptop case, and it was very easy
for it to get dislodged, meaning that power failures during writes
were even more likely than you would expect with a fixed HDD or SSD
which is secured into place using screws or other more reliable
mounting hardware.
Put all of this together: given that Pavel's Really Trashy 32GB SD was
probably the full 3 orders of magnitude worse than traditional HDD,
and he was having many more failures due to physical mounting issues,
it's not surprising that most people haven't seen problems with
traditional HDDs, even though none of this is guaranteed by the hard
drive vendors.
> The people that truly and deeply care about this type of write atomicity
> (i.e. enterprises) deploy disk arrays that will do the right thing in
> face of an error. This involves NVRAM, mirrored caches, uninterruptible
> power supplies, etc. Brute force if you will.
Don't forget non-cheesy mounting options so an accidental brush
against the side of the unit doesn't cause the hard drive to become
disconnected from the system and suffer a power drop. I guess that gets
filed under "Brute force" as well. :-)
- Ted
P.S. I feel obliged to point out that in my Lenovo X61s, the SD card
is flush with the laptop case when inserted, and I've never had a
problem with the SD card being prematurely ejected during operation. :-)