From: David Greaves <david@dgreaves.com>
To: "Peter T. Breuer" <ptb@lab.it.uc3m.es>
Cc: linux-raid@vger.kernel.org
Subject: Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)
Date: Tue, 04 Jan 2005 19:12:54 +0000
Message-ID: <41DAEAB6.8000101@dgreaves.com>
In-Reply-To: <751ra2-amt.ln1@news.it.uc3m.es>
Peter T. Breuer wrote:
>A journalled file system is always _consistent_. That does not mean it
>is correct!
>
>
To my knowledge no computers have the philosophical wherewithal to
provide that service ;)
If one is rude enough to stab a journalling filesystem in the back as it
tries to save your data, it promises only to be consistent when it is
revived - it won't provide application correctness.
I think we agree on that.
>>The md driver (somehow) gets to decide which half of the mirror is 'best'.
>>
>>
>Yep - and which is correct?
>
>
Both are 'correct' - they simply represent different points in the
series of system calls made before the power went.
>Which is correct?
>
>
<grumble> ditto
>And the question remains - which outcome is correct?
>
>
same answer I'm afraid.
>Well, I'll answer that. Assuming that the fs layer is only notified
>when BOTH journal writes have happened, and tcp signals can be sent
>off-machine or something like that, then the correct result is the
>rollback, not the completion, as the world does not expect there to
>have been a completion given the data it has got.
>
>It's as I said. One always wants to rollback. So one doesn't want the
>journal to bother with data at all.
>
<cough>bullshit</cough> ;)
I write a, b, c and d to the filesystem.
We begin our story when a, b and c all live on the fs device (raid or
not), all synced up and consistent.
  I start to write d
  it hits journal mirror A
  it hits journal mirror B
  it finalises on journal mirror B
  I yank the plug
The mirrors are inconsistent.
The filesystem is consistent.
I reboot.
scenario 1) the md device comes back using A
  the journal isn't finalised - it's ignored
  the filesystem contains a, b and c
  Is that correct?
scenario 2) the md device comes back using B
  the journal is finalised - it's rolled forward
  the filesystem contains a, b, c and d
  Is that correct?
Both are correct.
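The two reboot outcomes can be sketched in a few lines of Python. This is
only a toy model under stated assumptions: JournalCopy and recover() are
hypothetical names invented for the illustration, and nothing here reflects
ext3's real on-disk journal format or how md actually picks a mirror.

```python
from dataclasses import dataclass, field


@dataclass
class JournalCopy:
    """One mirror's copy of the journal (hypothetical toy model)."""
    pending: list = field(default_factory=list)  # blocks logged but not yet in the fs
    committed: bool = False                      # True once the commit record landed


def recover(fs_blocks, journal):
    """Replay the journal only if its transaction was finalised."""
    if journal.committed:
        return list(fs_blocks) + list(journal.pending)  # roll forward
    return list(fs_blocks)                              # unfinished: ignore it


fs = ["a", "b", "c"]
mirror_a = JournalCopy(pending=["d"], committed=False)  # plug pulled before commit
mirror_b = JournalCopy(pending=["d"], committed=True)   # commit record made it

print(recover(fs, mirror_a))  # scenario 1: md comes back using A
print(recover(fs, mirror_b))  # scenario 2: md comes back using B
```

Either result is a state the filesystem could legitimately have been in at
some point in the sequence of writes, which is all that consistency promises.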
So, I think that deals with correctness and journalling - now on to
errors...
>>>No. I made no such assumption. I don't know or care what you do with a
>>>detectable error. I only say that whatever your test is, it detects it!
>>>IF it looks at the right spot, of course. And on raid the chances of
>>>doing that are halved, because it has to choose which disk to read.
>>>
>>>
>>I did when I defined detectable.... tentative definitions:
>>detectable = noticed by normal OS I/O. ie CRC sector failure etc
>>undetectable = noticed by special analysis (fsck, md5sum verification etc)
>>
>>
>
>A detectable error is one you detect with whatever your test is. If
>your test is fsck, then that's the kind of error that is detected by the
>detection that you do ... the only condition I imposed for the analysis
>was that the test be conducted on the raid array, not on its underlying
>components.
>
>
Well, if we're going to get anywhere here we need to be clear about things.
There are all kinds of errors - raid and redundancy will help with some
and not others.
An md device does have underlying components, and by refusing to allow
tests to compare them you remove one of the benefits of raid -
redundancy. It may make it easier to model mathematically - but then the
model is wrong.
We need to make sure we're talking about bits on a device:
md reads devices and it writes them.
We need to understand what an error is - stop talking bollocks about
"whatever the test is". This is *not* a math problem - it's simply not
well enough defined yet. Let's get back to reality to decide what to model.
I proposed definitions and tests (the ones used in the real world where
we don't run fsck) and you've ignored them.
I'll repeat them:
detectable = noticed by normal OS I/O. ie CRC sector failure etc
undetectable = noticed by special analysis (fsck, md5sum verification etc)
I'll add 'component device comparison' to the special analysis list.
No error is truly undetectable - if it were then it wouldn't matter
would it?
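A 'component device comparison' could look roughly like the sketch below.
It is a minimal illustration, assuming the two halves of a mirror are
available as equal-sized byte strings; mismatched_blocks() and the 4096-byte
block size are assumptions made for the example, not anything md provides.

```python
def mismatched_blocks(component_a: bytes, component_b: bytes, block_size: int = 4096):
    """Return the indices of blocks that differ between two mirror components."""
    if len(component_a) != len(component_b):
        raise ValueError("components differ in size")
    bad = []
    for offset in range(0, len(component_a), block_size):
        # Compare the same block on each half of the mirror.
        if component_a[offset:offset + block_size] != component_b[offset:offset + block_size]:
            bad.append(offset // block_size)
    return bad


# Two identical 16 KiB "devices", then one bit flipped on the second.
disk_a = bytes(16384)
disk_b = bytearray(disk_a)
disk_b[5000] ^= 0x01  # byte 5000 lives in block 1

print(mismatched_blocks(disk_a, bytes(disk_b)))
```

Note that with only two components this says that a block differs, not which
copy is the good one.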
>>- nothing's broken but a bit flipped during the write/store process (or
>>the power went before it hit the media). Detectable errors are more
>>likely to be permanent (since most detection algorithms probably have a
>>retry).
>>
>>
>
>I think that for some reason you are considering that a test (a
>detection test) is carried out at every moment of time. No. Only ONE
>test is ever carried out. It is the test you apply when you do the
>observation: the experiment you run decides at that single point whether
>the disk (the raid array) has errors or not. In practical terms, you do
>it usually when you boot the raid array, and run fsck on its file system.
>
>OK?
>You simply leave an experiment running for a while (leave the array up,
>let monkeys play on it, etc.) and then you test it. That test detects
>some errors. However, there are two types of errors - those you can
>detect with your test, and those you cannot detect. My analysis simply
>gave the probabilities for those on the array, in terms of basic
>parameters for the probabilities per an individual disk.
>
>I really do not see why people make such a fuss about this!
>
>
We care about our data and raid has some vulnerabilities to corruption.
We need to understand these to fix them - your analysis is woolly and
unhelpful and, although it may have certain elements that are
mathematically correct, your model has flaws that mean that the
conclusions are not applicable.
>>>>However, we need to carry out risk analysis to decide if the increase in
>>>>susceptibility to certain kinds of corruption (cosmic rays) is
>>>>
>>>>
>>>>
>>>Ahh. Yes you do. No I don't! This is your own invention, and I said no
>>>such thing. By "errors", I meant anything at all that you consider to be
>>>an error. It's up to you. And I see no reason to restrict the term to
>>>what is produced by something like "cosmic rays". "People hitting the
>>>off switch at the wrong time" counts just as much, as far as I know.
>>>
>>>
>>>
>>>
>>You're talking about causes - I'm talking about classes of error.
>>
>>
>
>No, I'm talking about classes of error! You're talking about causes. :)
>
>
No, by comparing the risk between classes of error (detectable and not)
I'm talking about classes of error - by arguing about cosmic rays and
power switches you _are_ talking about causes.
Personally I think there is a massive difference between the risk of
detectable errors and undetectable ones. Many orders of magnitude.
>>Hitting the power off switch doesn't cause a physical failure - it
>>causes inconsistency in the data.
>>
>>
>I don't understand you - it causes errors just like cosmic rays do (and
>we can even set out and describe the mechanisms involved). The word
>"failure" is meaningless to me here.
>
>
Yes, you appear to have selectively quoted and ignored what I said a
line earlier:
> (I live in telco-land so most datacentres I know have more chance of
> suffering cosmic ray damage than Joe Random user pulling the plug - but
> conceptually these events are the same).
When that happens I begin to think that further discussion is meaningless.
>>>I would guess that you are trying to classify errors by the way their
>>>probabilities scale with number of disks.
>>>
>>>
>>>
>>Nope - detectable vs undetectable.
>>
>>
>
>Then what's the problem? An undetectable error is one you cannot detect
>via your test. Those scale with real estate. A detectible error is one
>you can spot with your test (on the array, not its components). The
>missed detectible errors scale as n-1, where n is the number of disks in
>the array.
>
>Thus a single disk suffers from no missed detectible errors, and a
>2-disk raid array does.
>
>That's all.
>
>No fuss, no muss!
>
>
and so obviously wrong!
An md device does have underlying components, and by refusing to allow
tests to compare them you remove one of the benefits of raid - redundancy.
>>Also, it strikes me that raid can actually find undetectable errors by
>>doing a bit-comparison scan.
>>
>>
>
>No, it can't, by definition. Undetectible errors are undetectible. If
>you change your test, you change the class of errors that are
>undetectible.
>
>That's all.
>
>
>
>>Non-resilient devices with only one copy of each bit can't do that.
>>raid 6 could even fix undetectable errors.
>>
>>
>
>Then they are not "undetectible".
>
>
They are. Read my definition. They are not detected in normal operation
with some kind of event notification/error return code; hence undetectable.
However bit comparison with known good or md5 sums or with a mirror can
spot such bit flips.
They are still 'undetectable' in normal operation.
Be consistent in your terminology.
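As a small illustration of the 'special analysis' side of that definition,
the sketch below checks data against a previously recorded md5 sum. It
assumes a checksum was stored while the data was known good; md5_of() and
verify() are hypothetical helper names, and only the standard hashlib
module is used.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Hex md5 digest of a block of data."""
    return hashlib.md5(data).hexdigest()


def verify(data: bytes, recorded_sum: str) -> bool:
    """True if the data still matches the checksum recorded earlier."""
    return md5_of(data) == recorded_sum


known_good = b"a, b, c and d"
recorded = md5_of(known_good)  # stored while the data was known to be good

# Later a single bit flips. The read itself still succeeds with no error
# return, so normal operation notices nothing.
flipped = bytearray(known_good)
flipped[0] ^= 0x01

print(verify(known_good, recorded))      # data unchanged
print(verify(bytes(flipped), recorded))  # bit flip caught by the checksum
```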
>The analysis is not affected by your changing the definition of what is
>in the undetectible class of error and what is not. It stands. I have
>made no assumption at all on what they are. I simply pointed out how
>the probabilities scale for a raid array.
>
>
What analysis? You are waving vague and changing definitions about and
talking about grandma's favourite colour.
David
PS any dangling sentences are because I just found so many
inconsistencies that I gave up.