public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Xuân Baldauf" <xuan--2004.08.01--linux-kernel--vger.kernel.org@baldauf.org>
To: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Software RAID 5 and crashes
Date: Sun, 01 Aug 2004 16:01:14 +0200	[thread overview]
Message-ID: <410CF7AA.2020604@baldauf.org> (raw)
In-Reply-To: <16652.55805.900685.826943@cse.unsw.edu.au>

Neil Brown wrote:

>On Sunday August 1, xuan--2004.08.01--linux-kernel--vger.kernel.org@baldauf.org wrote:
>  
>
>>Hello,
>>
>>I have been extensively searching for documentation and mailing lists, 
>>but was yet unable to answer this question:
>>
>>Does Linux software RAID 5 (or RAID 4) do ordered writes? (data stripes 
>>first, then parity stripes)
>>    
>>
>
>No, it doesn't impose any ordering between writes of parity and data
>in the same stripe, and it would not have any material effect on any
>outcomes if it did.
>  
>
I disagree. :-)

>  
>
>>Because if the writes are not ordered, parity stripes could be written 
>>before data stripes. If the system crashes at this time, reconstruction 
>>will  try to reconstruct the parity stripes by using the wrong (old) 
>>data stripes.
>>
>>If the writes are ordered, crashes after the write of the data stripe 
>>but before the write to the parity stripe do not harm.
>>    
>>
>
>When the system crashes, the RAID5 manager assume that all data blocks
>are correct and all parity blocks are suspect.  It checks all parity
>blocks against corresponding data and corrects  those that don't
>match.
>If a write is "in progress" - i.e. it has started but not all data and
>parity has been written, then either the "old" data or the "new" data
>are equally correct.  The only thing that needs to be guaranteed after
>a crash, and the only thing that can be guaranteed, is that any data
>that has been reported as "safe-in-storage" really is safe.  That is
>all journalling filesystems, or anything else, assume.
>
>Hope that helps.
>  
>
Yes, thank you, the clear statement, that writes are not ordered, helps. 
:-) It is also relieving to read that data blocks are always preferred 
to parity blocks, so that data blocks never can become scrambled by 
unmatching other data blocks and parity blocks (at least in non-degraded 
mode).

Unfortunately, it still does not make me satisfied, because: The 
asymmetry of "all data blocks are correct, all parity blocks are 
suspect" should be exploited.
Consider 4 disks joined as RAID 5. There are 4 stripes (s0, s1, s2, s3), 
where s3 is the parity stripe.

    * <>Example 0: s3,s2,s1 are written to disk, while s0 is not written
      to disk for some reason. The system crashes. What happens at
      reconstruction? s3 gets replaced by s0 XOR s1 XOR s2. s0 contains
      old (read: wrong) data.
    * Example 1: s0,s1,s2 are written to disk, while s3 is not written
      to disk for some reason. The system crashes. What happens at
      reconstruction? s3 gets replaced by s0 XOR s1 XOR s2. s0 does not
      contain old data. The stripe which contains the old data (s3) is
      replaced anyway during reconstruction.

If we now consider, that for each disk (as member of a RAID 5), there 
are parity stripes and there are data stripes. Doesn't it make sense to 
prefer data blocks over partiy blocks when writing, just to get more 
cases of "example 1" against "example 0" than without this preference?

One could even imagine to intensionally postpone the parity block 
writing for some time in favour of peak throughput. The RAID 5 device 
looses its rendundancy for some bounded time at a bounded region of its 
space, but this may be acceptable for certain applications, I think.

>NeilBrown
>
>  
>
Xuân Baldauf.



  reply	other threads:[~2004-08-01 14:01 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-01 10:19 Software RAID 5 and crashes Xuân Baldauf
2004-08-01 11:54 ` Neil Brown
2004-08-01 14:01   ` Xuân Baldauf [this message]
2004-08-01 15:08     ` Bernd Eckenfels
2004-08-02  6:13     ` Neil Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=410CF7AA.2020604@baldauf.org \
    --to=xuan--2004.08.01--linux-kernel--vger.kernel.org@baldauf.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@cse.unsw.edu.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox