Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

From: David Greaves <david@dgreaves.com>
To: "Peter T. Breuer" <ptb@lab.it.uc3m.es>
Cc: linux-raid@vger.kernel.org
Subject: Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)
Date: Tue, 04 Jan 2005 16:54:17 +0000
Message-ID: <41DACA39.4020700@dgreaves.com>
In-Reply-To: <kboqa2-ana.ln1@news.it.uc3m.es>

Peter T. Breuer wrote:

>David Greaves <david@dgreaves.com> wrote:
>  
>
>>Disks suffer from random *detectable* corruption events on (or after) 
>>write (eg media or transient cache being hit by a cosmic ray, cpu 
>>fluctuations during write, e/m or thermal variations).
>>    
>>
>
>Well, and also people hitting the off switch (or the power going off)
>during a write sequence to a mirror, but after one of a pair of mirror
>writes has gone to disk, but before the other of the pair has.
>
>(If you want to say "but the fs is journalled", then consider what if 
>the write is to the journal ...).
>  
>
Hmm.
In neither case would a journalling filesystem be corrupted.

The md driver (somehow) gets to decide which half of the mirror is 'best'.

If the journal uses the fully written half of the mirror then it's replayed.
If the journal uses the partially written half of the mirror then it's 
not replayed.
It's just the same as powering off a normal non-resilient device.
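
A minimal sketch of the two cases above, as a toy model only. It assumes a
journal record is a payload followed by a CRC32 commit marker; that layout
and the names make_record, is_committed and recover are hypothetical and are
not ext3's on-disk format or md's selection logic.

```python
import zlib


def make_record(payload: bytes) -> bytes:
    """Toy journal record: payload followed by a 4-byte CRC32 commit marker."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_committed(record: bytes) -> bool:
    """A record counts as committed only if its trailing CRC matches the payload."""
    if len(record) < 4:
        return False
    payload, crc = record[:-4], record[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def recover(journal_half: bytes) -> str:
    """Decide what recovery does with whichever mirror half was chosen."""
    return "replay" if is_committed(journal_half) else "discard"


record = make_record(b"update inode 42")
full_half = record       # this half of the mirror received the whole write
torn_half = record[:7]   # power failed partway through the write to this half

print(recover(full_half))
print(recover(torn_half))
```

Whichever half is chosen, the outcome is one of the two a single plain disk
could have produced after the same power cut.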

(Is your point here back to the failure to guarantee write ordering? I 
thought Neil answered that?)


but let's carry on...

>>Disks suffer from random *undetectable* corruption events on (or after) 
>>write (eg media or transient cache being hit by a cosmic ray, cpu 
>>fluctuations during write, e/m or thermal variations)
>>    
>>
>
>Yes. This is not different from what I have said. I didn't have any
>particular scenario in mind.
>
>But I see that you are correct in pointing out that some error
>possibilities are _created_ by the presence of raid that would not
>ordinarily be present. So there is some scaling with the
>number of disks that needs clarification.
>
>  
>
>>Raid disks have more 'corruption-susceptible' data capacity per useable 
>>data capacity and so the probability of a corruption event is higher. 
>>    
>>
>
>Well, the probability is larger no matter what the nature of the event.
>In principle, and very approximately, there are simply more places (and
>times!) for it to happen TO.
>  
>
exactly what I meant.

>Yes, you may say but those errors that are produced by the cpu don't
>scale, nor do those that are produced by software.
>
No, I don't say that.

> I'd demur. If you
>think about each kind you have in mind you'll see that they do scale:
>for example, the cpu has to work twice as often to write to two raid
>disks as it does to have to write to one disk, so the opportunities for
>IT to get something wrong are doubled.  Ditto software.  And of course,
>since it is writing twice as often, the chances of being interrupted at
>an inopportune time by a power failure are also doubled.
>  
>
I agree - obvious really.
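
The "doubled" scaling can be put as rough arithmetic. This is a sketch that
assumes each disk independently suffers an event with the same small
probability p per write; the function name p_any_event and the value of p are
illustrative, not measurements.

```python
def p_any_event(p_per_disk: float, n_disks: int) -> float:
    """Probability that at least one of n independent disks is hit."""
    return 1.0 - (1.0 - p_per_disk) ** n_disks


p = 0.001  # illustrative per-disk probability, not a measured figure
for n in (1, 2, 4):
    print(n, p_any_event(p, n))
```

For small p the result is close to n * p, which is the "twice the disks,
twice the opportunities" approximation.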

>See?
>  
>
yes

>
>  
>
>>Since a detectable error is detected it can be retried and dealt with.
>>    
>>
>
>No. I made no such assumption. I don't know or care what you do with a
>detectable error. I only say that whatever your test is, it detects it!
>IF it looks at the right spot, of course. And on raid the chances of
>doing that are halved, because it has to choose which disk to read.
>  
>
I did when I defined detectable.... tentative definitions:
detectable = noticed by normal OS I/O. ie CRC sector failure etc
undetectable = noticed by special analysis (fsck, md5sum verification etc)
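
A small sketch of these two tentative definitions, assuming a sector is
stored together with a CRC32 that normal I/O checks on read, and that an md5
of the intended data is kept elsewhere for special analysis. The names
write_sector, read_sector and classify are hypothetical.

```python
import hashlib
import zlib


class SectorReadError(IOError):
    """Raised when the stored CRC does not match the stored data."""


def write_sector(data: bytes):
    """Store the data together with a CRC32 computed at write time."""
    return bytearray(data), zlib.crc32(data)


def read_sector(stored: bytearray, crc: int) -> bytes:
    """Normal I/O path: fail the read if the CRC check fails."""
    if zlib.crc32(bytes(stored)) != crc:
        raise SectorReadError("CRC mismatch")
    return bytes(stored)


def classify(intended: bytes, stored: bytearray, crc: int) -> str:
    try:
        data = read_sector(stored, crc)
    except SectorReadError:
        return "detectable"      # noticed by normal I/O
    if hashlib.md5(data).digest() != hashlib.md5(intended).digest():
        return "undetectable"    # only special analysis notices
    return "clean"


intended = b"payload"

stored, crc = write_sector(intended)
stored[0] ^= 1                   # bit flips on the media after the write
print(classify(intended, stored, crc))

stored, crc = write_sector(b"pazload")   # corrupted before the CRC was computed
print(classify(intended, stored, crc))
```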

And a detectable error occurs on the underlying non-raid device - so the 
chances are not halved since we're talking about write errors which go 
to both disks. Detectable read errors are retried until they succeed - 
if they fail then I submit that a "write (or after)" corruption occurred.

Hmm.
It also occurs to me that undetectable errors are likely to be temporary 
- nothing's broken but a bit flipped during the write/store process (or 
the power went before it hit the media). Detectable errors are more 
likely to be permanent (since most detection algorithms probably have a 
retry).

>>This leaves the fact that essentially, raid disks are less reliable than 
>>non-raid disks wrt undetectable corruption events.
>>    
>>
>
>Well, that too. There is more real estate.
>
>But this "corruption"  word seems to me to imply that you think I was
>imagining errors produced by cosmic rays. I made no such restriction.
>  
>
No, I was attempting to convey "random, undetectable, small, non 
systematic" (ie I can't spot cosmic rays hitting the disk - and even if 
I could, only a very few would cause damage) vs significant physical 
failure "drive smoking and horrid graunching noise" (smoke and noise 
being valid detection methods!).

They're only the same if you have no process for dealing with errors.

>>However, we need to carry out risk analysis to decide if the increase in 
>>susceptibility to certain kinds of corruption (cosmic rays) is 
>>    
>>
>
>Ahh. Yes you do. No I don't! This is your own invention, and I said no
>such thing. By "errors", I meant anything at all that you consider to be
>an error. It's up to you.  And I see no reason to restrict the term to
>what is produced by something like "cosmic rays". "People hitting the
>off switch at the wrong time" counts just as much, as far as I know.
>  
>
You're talking about causes - I'm talking about classes of error.

(I live in telco-land so most datacentres I know have more chance of 
suffering cosmic ray damage than Joe Random user pulling the plug - but 
conceptually these events are the same).

Hitting the power off switch doesn't cause a physical failure - it 
causes inconsistency in the data.

I introduce risk analysis to justify accepting the 'real estate 
undetectable corruption vulnerability' risk increase of raid versus the 
ability to cope with detectable errors.

>I would guess that you are trying to classify errors by the way their
>probabilities scale with number of disks.
>
Nope - detectable vs undetectable.

> I made no such distinction,
>in principle.  I simply classified errors according to whether you could
>(in principle, also) detect them or not, whatever your test is.
>  
>
Also, it strikes me that raid can actually find undetectable errors by 
doing a bit-comparison scan.
Non-resilient devices with only one copy of each bit can't do that.
raid 6 could even fix undetectable errors.
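
A minimal sketch of such a comparison scan over a two-way mirror, assuming
both halves can be read as equal-length byte strings. The function name
scrub_mirror and the 512-byte block size are illustrative and this is not
md's implementation.

```python
from typing import List


def scrub_mirror(half_a: bytes, half_b: bytes, block_size: int = 512) -> List[int]:
    """Return the block numbers at which the two mirror halves differ."""
    if len(half_a) != len(half_b):
        raise ValueError("mirror halves differ in length")
    mismatched = []
    for offset in range(0, len(half_a), block_size):
        if half_a[offset:offset + block_size] != half_b[offset:offset + block_size]:
            mismatched.append(offset // block_size)
    return mismatched


good = bytes(2048)           # four zero-filled 512-byte blocks
bad = bytearray(good)
bad[1030] ^= 1               # one silently flipped bit inside block 2
print(scrub_mirror(good, bytes(bad)))
```

With only two copies and no checksum the scan shows where the halves
disagree but not which half is right, which is why more redundancy is needed
before anything can be fixed.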

A detectable error on a non-resilient medium means you have no faith in 
the (possibly corrupt) data.
An undetectable error on a non-resilient medium means you have faith in 
the (possibly corrupt) data.

Raid ultimately uses non-resilient media and propagates and uses this 
faith to deliver data to you.


David
