From: Theodore Tso <tytso@mit.edu>
To: Rob Landley <rob@landley.net>
Cc: Ric Wheeler <rwheeler@redhat.com>, Pavel Machek <pavel@ucw.cz>,
Florian Weimer <fweimer@bfk.de>,
Goswin von Brederlow <goswin-v-b@web.de>,
kernel list <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
mtk.manpages@gmail.com, rdunlap@xenotime.net,
linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org,
corbet@lwn.net
Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible
Date: Thu, 27 Aug 2009 08:24:23 -0400 [thread overview]
Message-ID: <20090827122423.GO6997@mit.edu> (raw)
In-Reply-To: <200908270019.04231.rob@landley.net>
On Thu, Aug 27, 2009 at 12:19:02AM -0500, Rob Landley wrote:
> > To me, this isn't a particularly interesting or newsworthy point,
> > since a competent system administrator
>
> I'm a bit concerned by the argument that we don't need to document
> serious pitfalls because every Linux system has a sufficiently
> competent administrator they already know stuff that didn't even
> come up until the second or third day it was discussed on lkml.
I'm not convinced that information which needs to be known by System
Administrators is best documented in the kernel Documentation
directory. Should there be a HOWTO document on stuff like that?
Sure, if someone wants to put something like that together, having
free documentation about ways to set up your storage stack in a sane
way is not a bad thing.
It should be noted that these sorts of issues are discussed in various
books targetted at System Administrators, and in Usenix's System
Administration tutorials. The computer industry is highly
specialized, and so just because an OS kernel hacker might not be
familiar with these issues, doesn't mean that professionals whose job
it is to run data centers don't know about these things! Similarly,
you could be a whiz at Linux's networking stack, but you might not
know about certain pitfalls in configuring a Cisco router using IOS;
does that mean we should have an IOS tutorial in the kernel
documentation directory? I'm not so sure about that!
> "You're documenting it wrong" != "you shouldn't document it".
Sure, but the fact that we don't currently say much about storage
stacks doesn't mean we should accept a patch that might actively
mislead people. I'm NACK'ing the patch on that basis.
> > who cares about his data and/or
> > his hardware will (a) have a UPS,
>
> I worked at a company that retested their UPSes a year after
> installing them and found that _none_ of them supplied more than 15
> seconds charge, and when they dismantled them the batteries had
> physically bloated inside their little plastic cases. (Same company
> as the dead air conditioner, possibly overheating was involved but
> the little _lights_ said everything was ok.)
>
> That was by no means the first UPS I'd seen die, the suckers have a
> higher failure rate than hard drives in my experience. This is a
> device where the batteries get constantly charged and almost never
> tested because if it _does_ fail you just rebooted your production
> server, so a lot of smaller companies think they have one but
> actually don't.
Sounds like they were using really cheap UPS's; certainly not the kind
I would expect to find in a data center. And if company's system
administrator is using the cheapest possible consumer-grade UPS's,
then yes, they might have a problem. Even an educational institution
like MIT, where I was an network administrator some 15 years ago, had
proper UPS's, *and* we had a diesel generator which kicked in after 15
seconds --- and we tested the diesel generator every Friday morning,
to make sure it worked properly.
> > , and (b) be running with a hot spare
> > and/or will imediately replace a failed drive in a RAID array.
>
> Here's hoping they shut the system down properly to install the new
> drive in the raid then, eh? Not accidentally pull the plug before
> it's finished running the ~7 minutes of shutdown scripts in the last
> Red Hat Enterprise I messed with...
Even my home RAID array uses hot-plug SATA disks, so I can replace a
failed disk without shutting down my system. (And yes, I have a
backup battery for the hardware RAID, and the firmware runs periodic
tests on it; the hardware RAID card also will send me e-mail if a RAID
array drive fails and it needs to use my hot-spare. At that point, I
order a new hard drive, secure in the knowledge that the system can
still suffer another drive failure before falling into degraded mode.
And no, this isn't some expensive enterprise RAID setup; this is just
a mid-range Areca RAID card.)
> If "degraded array" just means "don't have a replacement disk yet",
> then it sounds like what Pavel wants to document is "don't write to
> a degraded array at all, because power failures can cost you data
> due to write granularity being larger than filesystem block size".
> (Which still comes as news to some of us, and you need a way to
> remount mount the degraded array read only until the sysadmin can
> fix it.)
If you want to document that as a property of RAID arrays, sure. But
it's not something that should live in Documentation/filesystems/ext2.txt
and Documentation/filesystems/ext3.txt. The MD RAID howto might be a
better place, since it's far more likely more users will read it. How
many system administrators read what's in the kernel's Documentation
directory, after all, and this is basic information about how RAID
works; it's not necessarily something that someone would *expect* to
be in kernel documentation, nor would necessarily go looking for it
there. And the reality is that it's not like most people go reading
Documentation/* for pleasure. :-)
BTW, the RAID write atomicity issue and the possibility of failures
cause data loss *is* documented in the Wikipedia article on RAID.
It's not as written as direct practical advice to a system
administrator (you'd have to go to a book that is really targetted at
system administrators to find that sort of thing).
- Ted
next prev parent reply other threads:[~2009-08-27 12:25 UTC|newest]
Thread overview: 272+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-12 9:21 ext2/3: document conditions when reliable operation is possible Pavel Machek
2009-03-12 11:40 ` Jochen Voß
2009-03-21 11:24 ` Pavel Machek
2009-03-12 19:13 ` Rob Landley
2009-03-16 12:28 ` Pavel Machek
2009-03-16 19:26 ` Rob Landley
2009-03-23 10:45 ` Pavel Machek
2009-03-30 15:06 ` Goswin von Brederlow
2009-08-24 9:26 ` Pavel Machek
2009-08-24 9:31 ` [patch] " Pavel Machek
2009-08-24 11:19 ` Florian Weimer
2009-08-24 13:01 ` Theodore Tso
2009-08-24 14:55 ` Artem Bityutskiy
2009-08-24 22:30 ` Rob Landley
2009-08-24 19:52 ` Pavel Machek
2009-08-24 20:24 ` Ric Wheeler
2009-08-24 20:52 ` Pavel Machek
2009-08-24 21:08 ` Ric Wheeler
2009-08-24 21:25 ` Pavel Machek
2009-08-24 22:05 ` Ric Wheeler
2009-08-24 22:22 ` Zan Lynx
2009-08-24 22:44 ` Pavel Machek
2009-08-25 0:34 ` Ric Wheeler
2009-08-24 23:42 ` david
2009-08-24 22:41 ` Pavel Machek
2009-08-24 22:39 ` Theodore Tso
2009-08-24 23:00 ` Pavel Machek
2009-08-25 0:02 ` david
2009-08-25 9:32 ` Pavel Machek
2009-08-25 0:06 ` Ric Wheeler
2009-08-25 9:34 ` Pavel Machek
2009-08-25 15:34 ` david
2009-08-26 3:32 ` Rik van Riel
2009-08-26 11:17 ` Pavel Machek
2009-08-26 11:29 ` david
2009-08-26 13:10 ` Pavel Machek
2009-08-26 13:43 ` david
2009-08-26 18:02 ` Theodore Tso
2009-08-27 6:28 ` Eric Sandeen
2009-11-09 8:53 ` periodic fsck was " Pavel Machek
2009-11-09 14:05 ` Theodore Tso
2009-11-09 15:58 ` Andreas Dilger
2009-08-30 7:03 ` Pavel Machek
2009-08-26 12:28 ` Theodore Tso
2009-08-27 6:06 ` Rob Landley
2009-08-27 6:54 ` david
2009-08-27 7:34 ` Rob Landley
2009-08-28 14:37 ` david
2009-08-30 7:19 ` Pavel Machek
2009-08-30 12:48 ` david
2009-08-27 5:27 ` Rob Landley
2009-08-25 0:08 ` Theodore Tso
2009-08-25 9:42 ` Pavel Machek
2009-08-25 13:37 ` Ric Wheeler
2009-08-25 13:42 ` Alan Cox
2009-08-27 3:16 ` Rob Landley
2009-08-25 21:15 ` Pavel Machek
2009-08-25 22:42 ` Ric Wheeler
2009-08-25 22:51 ` Pavel Machek
2009-08-25 23:03 ` david
2009-08-25 23:29 ` Pavel Machek
2009-08-25 23:03 ` Ric Wheeler
2009-08-25 23:26 ` Pavel Machek
2009-08-25 23:40 ` Ric Wheeler
2009-08-25 23:48 ` david
2009-08-25 23:53 ` Pavel Machek
2009-08-26 0:11 ` Ric Wheeler
2009-08-26 0:16 ` Pavel Machek
2009-08-26 0:31 ` Ric Wheeler
2009-08-26 1:00 ` Theodore Tso
2009-08-26 1:15 ` Ric Wheeler
2009-08-26 2:58 ` Theodore Tso
2009-08-26 10:39 ` Ric Wheeler
2009-08-26 11:12 ` Pavel Machek
2009-08-26 11:28 ` david
2009-08-29 9:49 ` [testcase] test your fs/storage stack (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Pavel Machek
2009-08-29 11:28 ` Ric Wheeler
2009-09-02 20:12 ` Pavel Machek
2009-09-02 20:42 ` Ric Wheeler
2009-09-02 23:00 ` Rob Landley
2009-09-02 23:09 ` david
2009-09-03 8:55 ` Pavel Machek
2009-09-03 0:36 ` jim owens
2009-09-03 2:41 ` Rob Landley
2009-09-03 14:14 ` jim owens
2009-09-04 7:44 ` Rob Landley
2009-09-04 11:49 ` Ric Wheeler
2009-09-05 10:28 ` Pavel Machek
2009-09-05 12:20 ` Ric Wheeler
2009-09-05 13:54 ` Jonathan Corbet
2009-09-05 21:27 ` Pavel Machek
2009-09-05 21:56 ` Theodore Tso
2009-09-02 22:45 ` Rob Landley
2009-09-02 22:49 ` [PATCH] Update Documentation/md.txt to mention journaling won't help dirty+degraded case Rob Landley
2009-09-03 9:08 ` Pavel Machek
2009-09-03 12:05 ` Ric Wheeler
2009-09-03 12:31 ` Pavel Machek
2009-08-29 16:35 ` [testcase] test your fs/storage stack (was Re: [patch] ext2/3: document conditions when reliable operation is possible) david
2009-08-30 7:07 ` Pavel Machek
2009-08-26 12:01 ` [patch] ext2/3: document conditions when reliable operation is possible Ric Wheeler
2009-08-26 12:23 ` Theodore Tso
2009-08-30 7:01 ` Pavel Machek
2009-08-27 5:19 ` Rob Landley
2009-08-27 12:24 ` Theodore Tso [this message]
2009-08-27 13:10 ` Ric Wheeler
2009-08-27 16:54 ` MD/DM and barriers (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Jeff Garzik
2009-08-27 18:09 ` Alasdair G Kergon
2009-09-01 14:01 ` Pavel Machek
2009-09-02 16:17 ` Michael Tokarev
2009-08-29 10:02 ` [patch] ext2/3: document conditions when reliable operation is possible Pavel Machek
2009-08-26 1:16 ` Pavel Machek
2009-08-26 2:55 ` Theodore Tso
2009-08-26 13:37 ` Ric Wheeler
2009-08-26 2:53 ` Henrique de Moraes Holschuh
2009-09-03 9:47 ` Pavel Machek
2009-08-26 3:50 ` Rik van Riel
2009-08-27 3:53 ` Rob Landley
2009-08-27 11:43 ` Ric Wheeler
2009-08-27 20:51 ` Rob Landley
2009-08-27 22:00 ` Ric Wheeler
2009-08-28 14:49 ` david
2009-08-29 10:05 ` Pavel Machek
2009-08-29 20:22 ` Rob Landley
2009-08-29 21:34 ` Pavel Machek
2009-09-03 16:56 ` what fsck can (and can't) do was " david
2009-09-03 19:27 ` Theodore Tso
2009-08-27 22:13 ` raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible) Pavel Machek
2009-08-28 1:32 ` Ric Wheeler
2009-08-28 6:44 ` Pavel Machek
2009-08-28 7:31 ` NeilBrown
2009-11-09 10:50 ` Pavel Machek
2009-08-28 11:16 ` Ric Wheeler
2009-09-01 13:58 ` Pavel Machek
2009-08-28 12:08 ` Theodore Tso
2009-08-30 7:51 ` Pavel Machek
2009-08-30 9:01 ` Christian Kujau
2009-09-02 20:55 ` Pavel Machek
2009-08-30 12:55 ` david
2009-08-30 14:12 ` Ric Wheeler
2009-08-30 14:44 ` Michael Tokarev
2009-08-30 16:10 ` Ric Wheeler
2009-08-30 16:35 ` Christoph Hellwig
2009-08-31 13:15 ` Ric Wheeler
2009-08-31 13:16 ` Christoph Hellwig
2009-08-31 13:19 ` Mark Lord
2009-08-31 13:21 ` Christoph Hellwig
2009-08-31 15:14 ` jim owens
2009-09-03 1:59 ` Ric Wheeler
2009-09-03 11:12 ` Krzysztof Halasa
2009-09-03 11:18 ` Ric Wheeler
2009-09-03 13:34 ` Krzysztof Halasa
2009-09-03 13:50 ` Ric Wheeler
2009-09-03 13:59 ` Krzysztof Halasa
2009-09-03 14:15 ` wishful thinking about atomic, multi-sector or full MD stripe width, writes in storage Ric Wheeler
2009-09-03 14:26 ` Florian Weimer
2009-09-03 15:09 ` Ric Wheeler
2009-09-03 23:50 ` Krzysztof Halasa
2009-09-04 0:39 ` Ric Wheeler
2009-09-04 21:21 ` Mark Lord
2009-09-04 21:29 ` Ric Wheeler
2009-09-05 12:57 ` Mark Lord
2009-09-05 13:40 ` Ric Wheeler
2009-09-05 21:43 ` NeilBrown
2009-09-07 11:45 ` Pavel Machek
2009-09-07 13:10 ` Theodore Tso
2010-04-04 13:47 ` fsck more often when powerfail is detected (was Re: wishful thinking about atomic, multi-sector or full MD stripe width, writes in storage) Pavel Machek
2010-04-04 17:39 ` tytso
2010-04-04 17:59 ` Rob Landley
2010-04-04 18:45 ` Pavel Machek
2010-04-04 19:35 ` tytso
2010-04-04 19:29 ` tytso
2010-04-04 23:58 ` Rob Landley
2009-09-03 14:35 ` raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible) david
2009-08-31 13:22 ` Ric Wheeler
2009-08-31 15:50 ` david
2009-08-31 16:21 ` Ric Wheeler
2009-08-31 18:31 ` Christoph Hellwig
2009-08-31 19:11 ` david
2009-08-30 15:05 ` Pavel Machek
2009-08-30 15:20 ` Theodore Tso
2009-08-31 17:49 ` Jesse Brandeburg
2009-08-31 18:01 ` Ric Wheeler
2009-08-31 18:07 ` martin f krafft
2009-08-31 22:26 ` Jesse Brandeburg
2009-09-01 5:45 ` martin f krafft
2009-09-05 10:34 ` Pavel Machek
2009-08-28 7:11 ` raid is dangerous but that's secret Florian Weimer
2009-08-28 7:23 ` NeilBrown
2009-08-25 23:46 ` [patch] ext2/3: document conditions when reliable operation is possible david
2009-08-25 23:08 ` Neil Brown
2009-08-25 23:44 ` Pavel Machek
2009-08-26 4:08 ` Rik van Riel
2009-08-26 11:15 ` Pavel Machek
2009-08-27 3:29 ` Rik van Riel
2009-08-25 16:11 ` Theodore Tso
2009-08-25 22:21 ` [patch] document flash/RAID dangers Pavel Machek
2009-08-25 22:33 ` david
2009-08-25 22:40 ` Pavel Machek
2009-08-25 22:59 ` david
2009-08-25 23:37 ` Pavel Machek
2009-08-25 23:48 ` Ric Wheeler
2009-08-26 0:06 ` Pavel Machek
2009-08-26 0:12 ` Ric Wheeler
2009-08-26 0:20 ` Pavel Machek
2009-08-26 0:26 ` david
2009-08-26 0:28 ` Ric Wheeler
2009-08-26 0:38 ` Pavel Machek
2009-08-26 0:45 ` Ric Wheeler
2009-08-26 11:21 ` Pavel Machek
2009-08-26 11:58 ` Ric Wheeler
2009-08-26 12:40 ` Theodore Tso
2009-08-26 13:11 ` Ric Wheeler
2009-08-26 13:44 ` david
2009-08-26 13:40 ` Chris Adams
2009-08-26 13:47 ` Alan Cox
2009-08-26 14:11 ` Chris Adams
2009-08-27 21:50 ` Pavel Machek
2009-08-29 9:38 ` Pavel Machek
2009-08-26 4:24 ` Rik van Riel
2009-08-26 11:22 ` Pavel Machek
2009-08-26 14:45 ` Rik van Riel
2009-08-29 9:39 ` Pavel Machek
2009-08-25 23:56 ` david
2009-08-26 0:12 ` Pavel Machek
2009-08-26 0:20 ` david
2009-08-26 0:39 ` Pavel Machek
2009-08-26 1:17 ` david
2009-08-26 0:26 ` Ric Wheeler
2009-08-26 0:44 ` Pavel Machek
2009-08-26 0:50 ` Ric Wheeler
2009-08-26 1:19 ` david
2009-08-26 11:25 ` Pavel Machek
2009-08-26 12:37 ` Theodore Tso
2009-08-30 6:49 ` Pavel Machek
2009-08-26 4:20 ` Rik van Riel
2009-08-25 22:27 ` [patch] document that ext2 can't handle barriers Pavel Machek
2009-08-27 3:34 ` [patch] ext2/3: document conditions when reliable operation is possible Rob Landley
2009-08-27 8:46 ` David Woodhouse
2009-08-28 14:46 ` david
2009-08-29 10:09 ` Pavel Machek
2009-08-29 16:27 ` david
2009-08-29 21:33 ` Pavel Machek
2009-08-25 13:57 ` Chris Adams
2009-08-25 22:58 ` Neil Brown
2009-08-25 23:10 ` Ric Wheeler
2009-08-25 23:32 ` NeilBrown
2009-08-24 21:11 ` Greg Freemyer
2009-08-25 20:56 ` Rob Landley
2009-08-25 21:08 ` david
2009-08-25 18:52 ` Rob Landley
2009-08-25 14:43 ` Florian Weimer
2009-08-24 13:50 ` Theodore Tso
2009-08-24 18:48 ` Pavel Machek
2009-08-24 18:39 ` Pavel Machek
2009-08-24 13:21 ` Greg Freemyer
2009-08-24 18:44 ` Pavel Machek
2009-08-25 23:28 ` Neil Brown
2009-08-26 1:34 ` david
2009-08-24 21:11 ` Rob Landley
2009-08-24 21:33 ` Pavel Machek
2009-08-25 18:45 ` Jan Kara
2009-03-16 12:30 ` Pavel Machek
2009-03-16 19:03 ` Theodore Tso
2009-03-23 18:23 ` Pavel Machek
2009-03-16 19:40 ` Sitsofe Wheeler
2009-03-16 21:43 ` Rob Landley
2009-03-17 4:55 ` Kyle Moffett
2009-03-23 11:00 ` Pavel Machek
2009-08-29 1:33 ` Robert Hancock
2009-08-29 13:04 ` Alan Cox
2009-03-16 19:45 ` Greg Freemyer
2009-03-16 21:48 ` Pavel Machek
[not found] <dddMw-7Xt-57@gated-at.bofh.it>
[not found] ` <dddWc-83W-51@gated-at.bofh.it>
[not found] ` <ddefx-5R-37@gated-at.bofh.it>
[not found] ` <ddep7-bI-7@gated-at.bofh.it>
[not found] ` <ddeyR-iy-11@gated-at.bofh.it>
[not found] ` <ddeyR-iy-9@gated-at.bofh.it>
[not found] ` <ddeIu-n9-9@gated-at.bofh.it>
[not found] ` <ddeSb-th-29@gated-at.bofh.it>
[not found] ` <ddoRI-11a-39@gated-at.bofh.it>
[not found] <ciPTu-7f2-47@gated-at.bofh.it>
[not found] ` <clrhX-3R1-13@gated-at.bofh.it>
[not found] ` <dcEc1-33Z-17@gated-at.bofh.it>
[not found] ` <dcFUz-4yN-9@gated-at.bofh.it>
[not found] ` <dcHtf-5XC-17@gated-at.bofh.it>
[not found] ` <dcNSc-2DU-11@gated-at.bofh.it>
[not found] ` <dcOl9-3aC-19@gated-at.bofh.it>
[not found] ` <dcOOd-3qM-33@gated-at.bofh.it>
[not found] ` <dcOXN-3M5-15@gated-at.bofh.it>
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090827122423.GO6997@mit.edu \
--to=tytso@mit.edu \
--cc=akpm@osdl.org \
--cc=corbet@lwn.net \
--cc=fweimer@bfk.de \
--cc=goswin-v-b@web.de \
--cc=linux-doc@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtk.manpages@gmail.com \
--cc=pavel@ucw.cz \
--cc=rdunlap@xenotime.net \
--cc=rob@landley.net \
--cc=rwheeler@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).