From: Ric Wheeler <rwheeler@redhat.com>
To: Theodore Tso <tytso@mit.edu>, Pavel Machek <pavel@ucw.cz>,
david@lang.hm, Florian Weimer <fweimer@bfk.de>,
Goswin von Brederlow <goswin-v-b@web.de>,
Rob Landley <rob@landley.net>,
kernel list <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
mtk.manpages@gmail.com, rdunlap@xenotime.net,
linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org,
corbet@lwn.net
Subject: Re: [patch] document flash/RAID dangers
Date: Wed, 26 Aug 2009 09:11:58 -0400 [thread overview]
Message-ID: <4A95349E.7010101@redhat.com> (raw)
In-Reply-To: <20090826124058.GK32712@mit.edu>
On 08/26/2009 08:40 AM, Theodore Tso wrote:
> On Wed, Aug 26, 2009 at 07:58:40AM -0400, Ric Wheeler wrote:
>>> Drive in raid 5 failed; hot spare was available (no idea about
>>> UPS). System apparently locked up trying to talk to the failed drive,
>>> or maybe admin just was not patient enough, so he just powercycled the
>>> array. He lost the array.
>>>
>>> So while most people will not agressively powercycle the RAID array,
>>> drive failure still provokes little tested error paths, and getting
>>> unclean shutdown is quite easy in such case.
>>
>> Then what we need to document is do not power cycle an array during a
>> rebuild, right?
>
> Well, the softwar raid layer could be improved so that it implements
> scrubbing by default (i.e., have the md package install a cron job to
> implement a periodict scrub pass automatically). The MD code could
> also regularly check to make sure the hot spare is OK; the other
> possibility is that hot spare, which hadn't been used in a long time,
> had silently failed.
Actually, MD does this scan already (not automatically, but you can set up a
simple cron job to kick off a periodic "check"). It is a delicate balance to get
the frequency of the scrubbing correct.
On one hand, you want to make sure that you detect errors in a timely fashion,
certainly detection of single sector errors before you might develop a second
sector level error on another drive.
On the other hand, running scans/scrubs continually impacts the performance of
your real workload and can potentially impact your components' life span by
subjecting them to a heavy workload.
Rule of thumb seems from my experience is that most people settle in with a scan
once a week or two (done at a throttled rate).
>
>> In the end, there are cascading failures that will defeat any data
>> protection scheme, but that does not mean that the value of that scheme
>> is zero. We need to be get more people to use RAID (including MD5) and
>> try to enhance it as we go. Just using a single disk is not a good
>> thing...
>
> Yep; the solution is to improve the storage devices. It is *not* to
> encourage people to think RAID is not worth it, or that somehow ext2
> is better than ext3 because it runs fsck's all the time at boot up.
> That's just crazy talk.
>
> - Ted
Agreed....
ric
next prev parent reply other threads:[~2009-08-26 13:11 UTC|newest]
Thread overview: 309+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-12 9:21 ext2/3: document conditions when reliable operation is possible Pavel Machek
2009-03-12 11:40 ` Jochen Voß
2009-03-21 11:24 ` Pavel Machek
2009-03-21 11:24 ` Pavel Machek
2009-03-12 19:13 ` Rob Landley
2009-03-16 12:28 ` Pavel Machek
2009-03-16 19:26 ` Rob Landley
2009-03-23 10:45 ` Pavel Machek
2009-03-30 15:06 ` Goswin von Brederlow
2009-08-24 9:26 ` Pavel Machek
2009-08-24 9:31 ` [patch] " Pavel Machek
2009-08-24 11:19 ` Florian Weimer
2009-08-24 13:01 ` Theodore Tso
2009-08-24 14:55 ` Artem Bityutskiy
2009-08-24 22:30 ` Rob Landley
2009-08-24 19:52 ` Pavel Machek
2009-08-24 19:52 ` Pavel Machek
2009-08-24 20:24 ` Ric Wheeler
2009-08-24 20:52 ` Pavel Machek
2009-08-24 21:08 ` Ric Wheeler
2009-08-24 21:25 ` Pavel Machek
2009-08-24 22:05 ` Ric Wheeler
2009-08-24 22:22 ` Zan Lynx
2009-08-24 22:44 ` Pavel Machek
2009-08-25 0:34 ` Ric Wheeler
2009-08-24 23:42 ` david
2009-08-24 22:41 ` Pavel Machek
2009-08-24 22:39 ` Theodore Tso
2009-08-24 23:00 ` Pavel Machek
2009-08-25 0:02 ` david
2009-08-25 9:32 ` Pavel Machek
2009-08-25 0:06 ` Ric Wheeler
2009-08-25 9:34 ` Pavel Machek
2009-08-25 15:34 ` david
2009-08-26 3:32 ` Rik van Riel
2009-08-26 11:17 ` Pavel Machek
2009-08-26 11:29 ` david
2009-08-26 13:10 ` Pavel Machek
2009-08-26 13:43 ` david
2009-08-26 18:02 ` Theodore Tso
2009-08-27 6:28 ` Eric Sandeen
2009-08-27 6:28 ` Eric Sandeen
2009-11-09 8:53 ` periodic fsck was " Pavel Machek
2009-11-09 14:05 ` Theodore Tso
2009-11-09 15:58 ` Andreas Dilger
2009-11-09 8:53 ` Pavel Machek
2009-08-30 7:03 ` Pavel Machek
2009-08-26 12:28 ` Theodore Tso
2009-08-27 6:06 ` Rob Landley
2009-08-27 6:54 ` david
2009-08-27 7:34 ` Rob Landley
2009-08-28 14:37 ` david
2009-08-30 7:19 ` Pavel Machek
2009-08-30 12:48 ` david
2009-08-27 5:27 ` Rob Landley
2009-08-25 0:08 ` Theodore Tso
2009-08-25 9:42 ` Pavel Machek
2009-08-25 9:42 ` Pavel Machek
2009-08-25 13:37 ` Ric Wheeler
2009-08-25 13:42 ` Alan Cox
2009-08-27 3:16 ` Rob Landley
2009-08-25 21:15 ` Pavel Machek
2009-08-25 22:42 ` Ric Wheeler
2009-08-25 22:51 ` Pavel Machek
2009-08-25 23:03 ` david
2009-08-25 23:29 ` Pavel Machek
2009-08-25 23:03 ` Ric Wheeler
2009-08-25 23:26 ` Pavel Machek
2009-08-25 23:40 ` Ric Wheeler
2009-08-25 23:48 ` david
2009-08-25 23:53 ` Pavel Machek
2009-08-26 0:11 ` Ric Wheeler
2009-08-26 0:16 ` Pavel Machek
2009-08-26 0:31 ` Ric Wheeler
2009-08-26 1:00 ` Theodore Tso
2009-08-26 1:15 ` Ric Wheeler
2009-08-26 2:58 ` Theodore Tso
2009-08-26 10:39 ` Ric Wheeler
2009-08-26 10:39 ` Ric Wheeler
2009-08-26 11:12 ` Pavel Machek
2009-08-26 11:28 ` david
2009-08-29 9:49 ` [testcase] test your fs/storage stack (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Pavel Machek
2009-08-29 11:28 ` Ric Wheeler
2009-09-02 20:12 ` Pavel Machek
2009-09-02 20:42 ` Ric Wheeler
2009-09-02 23:00 ` Rob Landley
2009-09-02 23:09 ` david
2009-09-03 8:55 ` Pavel Machek
2009-09-03 0:36 ` jim owens
2009-09-03 2:41 ` Rob Landley
2009-09-03 14:14 ` jim owens
2009-09-04 7:44 ` Rob Landley
2009-09-04 11:49 ` Ric Wheeler
2009-09-05 10:28 ` Pavel Machek
2009-09-05 12:20 ` Ric Wheeler
2009-09-05 13:54 ` Jonathan Corbet
2009-09-05 21:27 ` Pavel Machek
2009-09-05 21:56 ` Theodore Tso
2009-09-02 22:45 ` Rob Landley
2009-09-02 22:49 ` [PATCH] Update Documentation/md.txt to mention journaling won't help dirty+degraded case Rob Landley
2009-09-03 9:08 ` Pavel Machek
2009-09-03 12:05 ` Ric Wheeler
2009-09-03 12:31 ` Pavel Machek
2009-08-29 16:35 ` [testcase] test your fs/storage stack (was Re: [patch] ext2/3: document conditions when reliable operation is possible) david
2009-08-29 16:35 ` david
2009-08-30 7:07 ` Pavel Machek
2009-08-26 12:01 ` [patch] ext2/3: document conditions when reliable operation is possible Ric Wheeler
2009-08-26 12:23 ` Theodore Tso
2009-08-30 7:01 ` Pavel Machek
2009-08-30 7:01 ` Pavel Machek
2009-08-27 5:19 ` Rob Landley
2009-08-27 12:24 ` Theodore Tso
2009-08-27 13:10 ` Ric Wheeler
2009-08-27 16:54 ` MD/DM and barriers (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Jeff Garzik
2009-08-27 18:09 ` Alasdair G Kergon
2009-09-01 14:01 ` Pavel Machek
2009-09-02 16:17 ` Michael Tokarev
2009-08-27 13:10 ` [patch] ext2/3: document conditions when reliable operation is possible Ric Wheeler
2009-08-29 10:02 ` Pavel Machek
2009-08-29 10:02 ` Pavel Machek
2009-08-26 1:15 ` Ric Wheeler
2009-08-26 1:16 ` Pavel Machek
2009-08-26 1:16 ` Pavel Machek
2009-08-26 2:55 ` Theodore Tso
2009-08-26 13:37 ` Ric Wheeler
2009-08-26 13:37 ` Ric Wheeler
2009-08-26 2:53 ` Henrique de Moraes Holschuh
2009-08-26 2:53 ` Henrique de Moraes Holschuh
2009-09-03 9:47 ` Pavel Machek
2009-09-03 9:47 ` Pavel Machek
2009-08-26 3:50 ` Rik van Riel
2009-08-27 3:53 ` Rob Landley
2009-08-27 11:43 ` Ric Wheeler
2009-08-27 20:51 ` Rob Landley
2009-08-27 22:00 ` Ric Wheeler
2009-08-28 14:49 ` david
2009-08-29 10:05 ` Pavel Machek
2009-08-29 20:22 ` Rob Landley
2009-08-29 21:34 ` Pavel Machek
2009-09-03 16:56 ` what fsck can (and can't) do was " david
2009-09-03 19:27 ` Theodore Tso
2009-08-27 22:13 ` raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Pavel Machek
2009-08-28 1:32 ` Ric Wheeler
2009-08-28 6:44 ` Pavel Machek
2009-08-28 7:31 ` NeilBrown
2009-08-28 7:31 ` NeilBrown
2009-11-09 10:50 ` Pavel Machek
2009-08-28 11:16 ` Ric Wheeler
2009-09-01 13:58 ` Pavel Machek
2009-08-28 12:08 ` Theodore Tso
2009-08-30 7:51 ` Pavel Machek
2009-08-30 7:51 ` Pavel Machek
2009-08-30 9:01 ` Christian Kujau
2009-09-02 20:55 ` Pavel Machek
2009-08-30 12:55 ` david
2009-08-30 14:12 ` Ric Wheeler
2009-08-30 14:44 ` Michael Tokarev
2009-08-30 16:10 ` Ric Wheeler
2009-08-30 16:35 ` Christoph Hellwig
2009-08-31 13:15 ` Ric Wheeler
2009-08-31 13:16 ` Christoph Hellwig
2009-08-31 13:19 ` Mark Lord
2009-08-31 13:21 ` Christoph Hellwig
2009-08-31 15:14 ` jim owens
2009-09-03 1:59 ` Ric Wheeler
2009-09-03 11:12 ` Krzysztof Halasa
2009-09-03 11:18 ` Ric Wheeler
2009-09-03 13:34 ` Krzysztof Halasa
2009-09-03 13:50 ` Ric Wheeler
2009-09-03 13:59 ` Krzysztof Halasa
2009-09-03 14:15 ` wishful thinking about atomic, multi-sector or full MD stripe width, writes in storage Ric Wheeler
2009-09-03 14:26 ` Florian Weimer
2009-09-03 15:09 ` Ric Wheeler
2009-09-03 23:50 ` Krzysztof Halasa
2009-09-04 0:39 ` Ric Wheeler
2009-09-04 21:21 ` Mark Lord
2009-09-04 21:29 ` Ric Wheeler
2009-09-05 12:57 ` Mark Lord
2009-09-05 13:40 ` Ric Wheeler
2009-09-05 21:43 ` NeilBrown
2009-09-07 11:45 ` Pavel Machek
2009-09-07 13:10 ` Theodore Tso
2010-04-04 13:47 ` fsck more often when powerfail is detected (was Re: wishful thinking about atomic, multi-sector or full MD stripe width, writes in storage) Pavel Machek
2010-04-04 13:47 ` Pavel Machek
2010-04-04 17:39 ` tytso
2010-04-04 17:59 ` Rob Landley
2010-04-04 18:45 ` Pavel Machek
2010-04-04 19:35 ` tytso
2010-04-04 19:29 ` tytso
2010-04-04 23:58 ` Rob Landley
2009-09-03 14:35 ` raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible) david
2009-08-31 13:22 ` Ric Wheeler
2009-08-31 15:50 ` david
2009-08-31 16:21 ` Ric Wheeler
2009-08-31 18:31 ` Christoph Hellwig
2009-08-31 19:11 ` david
2009-08-30 15:05 ` Pavel Machek
2009-08-30 15:20 ` Theodore Tso
2009-08-31 17:49 ` Jesse Brandeburg
2009-08-31 18:01 ` Ric Wheeler
2009-08-31 21:01 ` MD5/6? (was Re: raid is dangerous but that's secret ...) Ron Johnson
2009-08-31 18:07 ` raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible) martin f krafft
2009-08-31 22:26 ` Jesse Brandeburg
2009-08-31 22:26 ` Jesse Brandeburg
2009-08-31 23:19 ` Ron Johnson
2009-09-01 5:45 ` martin f krafft
2009-08-31 17:49 ` Jesse Brandeburg
2009-09-05 10:34 ` Pavel Machek
2009-09-05 10:34 ` Pavel Machek
2009-08-28 7:11 ` raid is dangerous but that's secret Florian Weimer
2009-08-28 7:23 ` NeilBrown
2009-08-25 23:46 ` [patch] ext2/3: document conditions when reliable operation is possible david
2009-08-25 23:08 ` Neil Brown
2009-08-25 23:44 ` Pavel Machek
2009-08-26 4:08 ` Rik van Riel
2009-08-26 11:15 ` Pavel Machek
2009-08-27 3:29 ` Rik van Riel
2009-08-25 16:11 ` Theodore Tso
2009-08-25 22:21 ` [patch] document flash/RAID dangers Pavel Machek
2009-08-25 22:21 ` Pavel Machek
2009-08-25 22:33 ` david
2009-08-25 22:40 ` Pavel Machek
2009-08-25 22:59 ` david
2009-08-25 23:37 ` Pavel Machek
2009-08-25 23:48 ` Ric Wheeler
2009-08-26 0:06 ` Pavel Machek
2009-08-26 0:12 ` Ric Wheeler
2009-08-26 0:20 ` Pavel Machek
2009-08-26 0:26 ` david
2009-08-26 0:28 ` Ric Wheeler
2009-08-26 0:38 ` Pavel Machek
2009-08-26 0:45 ` Ric Wheeler
2009-08-26 11:21 ` Pavel Machek
2009-08-26 11:58 ` Ric Wheeler
2009-08-26 12:40 ` Theodore Tso
2009-08-26 13:11 ` Ric Wheeler
2009-08-26 13:11 ` Ric Wheeler [this message]
2009-08-26 13:44 ` david
2009-08-26 13:40 ` Chris Adams
2009-08-26 13:47 ` Alan Cox
2009-08-26 14:11 ` Chris Adams
2009-08-27 21:50 ` Pavel Machek
2009-08-29 9:38 ` Pavel Machek
2009-08-26 4:24 ` Rik van Riel
2009-08-26 11:22 ` Pavel Machek
2009-08-26 14:45 ` Rik van Riel
2009-08-29 9:39 ` Pavel Machek
2009-08-29 11:47 ` Ron Johnson
2009-08-29 16:12 ` jim owens
2009-08-25 23:56 ` david
2009-08-26 0:12 ` Pavel Machek
2009-08-26 0:20 ` david
2009-08-26 0:39 ` Pavel Machek
2009-08-26 1:17 ` david
2009-08-26 0:26 ` Ric Wheeler
2009-08-26 0:44 ` Pavel Machek
2009-08-26 0:50 ` Ric Wheeler
2009-08-26 1:19 ` david
2009-08-26 11:25 ` Pavel Machek
2009-08-26 12:37 ` Theodore Tso
2009-08-30 6:49 ` Pavel Machek
2009-08-30 6:49 ` Pavel Machek
2009-08-26 4:20 ` Rik van Riel
2009-08-25 22:27 ` [patch] document that ext2 can't handle barriers Pavel Machek
2009-08-25 22:27 ` Pavel Machek
2009-08-27 3:34 ` [patch] ext2/3: document conditions when reliable operation is possible Rob Landley
2009-08-27 8:46 ` David Woodhouse
2009-08-28 14:46 ` david
2009-08-29 10:09 ` Pavel Machek
2009-08-29 16:27 ` david
2009-08-29 21:33 ` Pavel Machek
2009-08-24 23:00 ` Pavel Machek
2009-08-25 13:57 ` Chris Adams
2009-08-25 22:58 ` Neil Brown
2009-08-25 23:10 ` Ric Wheeler
2009-08-25 23:32 ` NeilBrown
2009-08-25 23:32 ` NeilBrown
2009-08-24 21:11 ` Greg Freemyer
2009-08-24 21:11 ` Greg Freemyer
2009-08-25 20:56 ` Rob Landley
2009-08-25 21:08 ` david
2009-08-25 18:52 ` Rob Landley
2009-08-25 14:43 ` Florian Weimer
2009-08-24 13:50 ` Theodore Tso
2009-08-24 18:48 ` Pavel Machek
2009-08-24 18:48 ` Pavel Machek
2009-08-24 18:39 ` Pavel Machek
2009-08-24 13:21 ` Greg Freemyer
2009-08-24 13:21 ` Greg Freemyer
2009-08-24 18:44 ` Pavel Machek
2009-08-25 23:28 ` Neil Brown
2009-08-25 23:28 ` Neil Brown
2009-08-26 1:34 ` david
2009-08-24 21:11 ` Rob Landley
2009-08-24 21:33 ` Pavel Machek
2009-08-25 18:45 ` Jan Kara
2009-03-16 12:30 ` Pavel Machek
2009-03-16 19:03 ` Theodore Tso
2009-03-23 18:23 ` Pavel Machek
2009-03-23 18:23 ` Pavel Machek
2009-03-16 19:40 ` Sitsofe Wheeler
2009-03-16 21:43 ` Rob Landley
2009-03-17 4:55 ` Kyle Moffett
2009-03-23 11:00 ` Pavel Machek
2009-08-29 1:33 ` Robert Hancock
2009-08-29 13:04 ` Alan Cox
2009-03-16 19:45 ` Greg Freemyer
2009-03-16 19:45 ` Greg Freemyer
2009-03-16 21:48 ` Pavel Machek
[not found] <ciPTu-7f2-47@gated-at.bofh.it>
[not found] ` <clrhX-3R1-13@gated-at.bofh.it>
[not found] ` <dcEc1-33Z-17@gated-at.bofh.it>
[not found] ` <dcFUz-4yN-9@gated-at.bofh.it>
[not found] ` <dcHtf-5XC-17@gated-at.bofh.it>
[not found] ` <dcNSc-2DU-11@gated-at.bofh.it>
[not found] ` <dcOl9-3aC-19@gated-at.bofh.it>
[not found] ` <dcOOd-3qM-33@gated-at.bofh.it>
[not found] ` <dcOXN-3M5-15@gated-at.bofh.it>
[not found] <dddMw-7Xt-57@gated-at.bofh.it>
[not found] ` <dddWc-83W-51@gated-at.bofh.it>
[not found] ` <ddefx-5R-37@gated-at.bofh.it>
[not found] ` <ddep7-bI-7@gated-at.bofh.it>
[not found] ` <ddeyR-iy-11@gated-at.bofh.it>
[not found] ` <ddeyR-iy-9@gated-at.bofh.it>
[not found] ` <ddeIu-n9-9@gated-at.bofh.it>
[not found] ` <ddeSb-th-29@gated-at.bofh.it>
[not found] ` <ddoRI-11a-39@gated-at.bofh.it>
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A95349E.7010101@redhat.com \
--to=rwheeler@redhat.com \
--cc=akpm@osdl.org \
--cc=corbet@lwn.net \
--cc=david@lang.hm \
--cc=fweimer@bfk.de \
--cc=goswin-v-b@web.de \
--cc=linux-doc@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtk.manpages@gmail.com \
--cc=pavel@ucw.cz \
--cc=rdunlap@xenotime.net \
--cc=rob@landley.net \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.