* Starting RAID 5
@ 2009-05-15  2:15 Leslie Rhorer
  2009-05-15  2:34 ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Leslie Rhorer @ 2009-05-15  2:15 UTC (permalink / raw)
  To: 'Linux RAID'


OK, I've torn down the LVM backup array and am rebuilding it as a RAID 5.
I've had problems with this before, and I'm having them again.  I created
the array with:

mdadm --create /dev/md0 --raid-devices=7 --metadata=1.2 --chunk=256
--level=5 /dev/sd[a-g]

whereupon it creates the array and then immediately removes /dev/sdg and
makes it a spare.  I think I may have read where this is normal behavior.
Mdadm reports:

Backup:/# mdadm -Dt /dev/md0
/dev/md0:
        Version : 01.02
  Creation Time : Thu May 14 21:08:39 2009
     Raid Level : raid5
     Array Size : 8790830592 (8383.59 GiB 9001.81 GB)
  Used Dev Size : 2930276864 (2794.53 GiB 3000.60 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu May 14 21:08:39 2009
          State : clean, degraded
 Active Devices : 6
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 256K

           Name : Backup:0  (local to host Backup)
           UUID : 7014c2f4:04c56e86:b453d0be:9c49d0e2
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd
       4       8       64        4      active sync   /dev/sde
       5       8       80        5      active sync   /dev/sdf
       6       0        0        6      removed

       7       8       96        -      spare   /dev/sdg

I can't get it to do an initial resync or promote the spare, however.  What
do I do?



* Re: Starting RAID 5
  2009-05-15  2:15 Starting RAID 5 Leslie Rhorer
@ 2009-05-15  2:34 ` NeilBrown
  2009-05-15  3:58   ` Leslie Rhorer
  2009-05-18 15:13   ` Bill Davidsen
  0 siblings, 2 replies; 8+ messages in thread
From: NeilBrown @ 2009-05-15  2:34 UTC (permalink / raw)
  To: lrhorer; +Cc: 'Linux RAID'

On Fri, May 15, 2009 12:15 pm, Leslie Rhorer wrote:
>
> OK, I've torn down the LVM backup array and am rebuilding it as a RAID 5.
> I've had problems with this before, and I'm having them, again.  I created
> the array with:
>
> mdadm --create /dev/md0 --raid-devices=7 --metadata=1.2 --chunk=256
> --level=5 /dev/sd[a-g]
>
> whereupon it creates the array and then immediately removes /dev/sdg and
> makes it a spare.  I think I may have read where this is normal behavior.

Correct. Maybe you read it in the mdadm man page.


> I can't get it to do an initial resync or promote the spare, however.

So it doesn't start recovery straight away? That is odd...
Maybe it is in read-auto mode.  I should probably get mdadm to poke
it out of that.

If it is, just start creating the filesystem or whatever you want to
do; that will prod it into action.
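
If you'd rather prod it without starting on the filesystem yet, something
like this should also work (just a sketch, assuming the array really is
sitting in read-auto):

  cat /proc/mdstat             # the md0 line shows "(auto-read-only)" if so
  mdadm --readwrite /dev/md0   # switch it to read-write; recovery should then start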

If not (check /proc/mdstat), what kernel log messages are there from the
time when you created the array?

NeilBrown



* RE: Starting RAID 5
  2009-05-15  2:34 ` NeilBrown
@ 2009-05-15  3:58   ` Leslie Rhorer
  2009-05-18 15:13   ` Bill Davidsen
  1 sibling, 0 replies; 8+ messages in thread
From: Leslie Rhorer @ 2009-05-15  3:58 UTC (permalink / raw)
  To: 'Linux RAID'


> > OK, I've torn down the LVM backup array and am rebuilding it as a RAID
> 5.
> > I've had problems with this before, and I'm having them, again.  I
> created
> > the array with:
> >
> > mdadm --create /dev/md0 --raid-devices=7 --metadata=1.2 --chunk=256
> > --level=5 /dev/sd[a-g]
> >
> > whereupon it creates the array and then immediately removes /dev/sdg and
> > makes it a spare.  I think I may have read where this is normal
> behavior.
> 
> Correct. Maybe you read it in the mdadm man page.

That's what I thought.  I breezed through the man page to try to find it
again, but one of my baby chicks (technicians who work for me) was having a
major meltdown over some minor problems he was experiencing in the network
while trying to prepare for a maintenance window, so I had to break off.

> > I can't get it to do an initial resync or promote the spare, however.
> 
> So it doesn't start recovery straight away? That is odd...

No, and it's happened before, on the other system (version 2.6.7.2-1).  It
didn't do it when I created the RAID 6 array.

> Maybe it is in read-auto mode.  I should probably get mdadm to poke
> it out of that.
> 
> If it is just start creating the filesystem or whatever you want to
> do, that will prod it into action.

Umph.  Well, that was easy.  I kept messing with mdadm, not mkfs.  Generally
speaking, when I'm working on something I don't move on to subsequent
processes until the current process is functioning properly, so I was loath
to move on.  Silly me, I guess, but I still say it's usually the best
practice.  Sometimes even good habits get you in trouble, I suppose.

> If not (check /proc/mdstat), what kernel log messages are there from the
> time when you created the array.

Yeah, that was the first thing I did.  There was nothing out of the
ordinary.  Mdadm reported to stdout that it was removing the 7th disk, but
put nothing about it in the log except to say it was active on 6 of 7
devices.



* Re: Starting RAID 5
  2009-05-15  2:34 ` NeilBrown
  2009-05-15  3:58   ` Leslie Rhorer
@ 2009-05-18 15:13   ` Bill Davidsen
  2009-05-18 21:36     ` NeilBrown
  1 sibling, 1 reply; 8+ messages in thread
From: Bill Davidsen @ 2009-05-18 15:13 UTC (permalink / raw)
  To: NeilBrown; +Cc: lrhorer, 'Linux RAID'

NeilBrown wrote:
> On Fri, May 15, 2009 12:15 pm, Leslie Rhorer wrote:
>   
>> OK, I've torn down the LVM backup array and am rebuilding it as a RAID 5.
>> I've had problems with this before, and I'm having them, again.  I created
>> the array with:
>>
>> mdadm --create /dev/md0 --raid-devices=7 --metadata=1.2 --chunk=256
>> --level=5 /dev/sd[a-g]
>>
>> whereupon it creates the array and then immediately removes /dev/sdg and
>> makes it a spare.  I think I may have read where this is normal behavior.
>>     
>
> Correct. Maybe you read it in the mdadm man page.
>
>
>   
While I know about that, I have never understood why that was desirable, 
or even acceptable, behavior. The array sits half created doing nothing 
until the system tries to use the array, at which time it's slow because 
it's finally getting around to actually getting the array into some 
sensible state. Is there some benefit to wasting time so the array can 
be slow when needed?
>> I can't get it to do an initial resync or promote the spare, however.
>>     
>
> So it doesn't start recovery straight away? That is odd...
> Maybe it is in read-auto mode.  I should probably get mdadm to poke
> it out of that.
>
> If it is just start creating the filesystem or whatever you want to
> do, that will prod it into action.
>
> If not (check /proc/mdstat), what kernel log messages are there from the
> time when you created the array.
>
> NeilBrown
>


-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.




* Re: Starting RAID 5
  2009-05-18 15:13   ` Bill Davidsen
@ 2009-05-18 21:36     ` NeilBrown
  2009-05-20 19:45       ` Bill Davidsen
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2009-05-18 21:36 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: lrhorer, 'Linux RAID'

On Tue, May 19, 2009 1:13 am, Bill Davidsen wrote:
> NeilBrown wrote:
>> On Fri, May 15, 2009 12:15 pm, Leslie Rhorer wrote:
>>
>>> OK, I've torn down the LVM backup array and am rebuilding it as a RAID
>>> 5.
>>> I've had problems with this before, and I'm having them, again.  I
>>> created
>>> the array with:
>>>
>>> mdadm --create /dev/md0 --raid-devices=7 --metadata=1.2 --chunk=256
>>> --level=5 /dev/sd[a-g]
>>>
>>> whereupon it creates the array and then immediately removes /dev/sdg
>>> and
>>> makes it a spare.  I think I may have read where this is normal
>>> behavior.
>>>
>>
>> Correct. Maybe you read it in the mdadm man page.
>>
>>
>>
> While I know about that, I have never understood why that was desirable,
> or even acceptable, behavior. The array sits half created doing nothing
> until the system tries to use the array, at which time it's slow because
> it's finally getting around to actually getting the array into some
> sensible state. Is there some benefit to wasting time so the array can
> be slow when needed?

Is the "that" which you refer to the content of the previous paragraph,
or the following paragraph.

The content of your comment suggests the following paragraph, which,
as I hinted, is a misfeature that should be fixed by having mdadm
"poke it out of that" (i.e. set the array to read-write if it is
read-mostly).

But the positioning of your comment makes it seem to refer to
the previous paragraph, which is totally unrelated to your complaint,
but I will explain anyway.

When a raid5 performs a 'resync' it reads every block, tests parity,
then if the parity is wrong, it writes out the correct parity block.
For an array with mostly correct parity, this involves sequential
reads across all devices in parallel and so is as fast as possible.
For an array with mostly incorrect parity (as is quite likely at
array creation) there will be many writes to parity blocks as well
as the reads, which will take a lot longer.

If we instead make one drive a spare then raid5 will perform recovery
which involves reading N-1 drives and writing to the Nth drive.
All sequential IOs.  This should be as fast as resync on a mostly-clean
array, and much faster than resync on a mostly-dirty array.
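
To put rough numbers on that for a seven-drive array like this one
(illustrative only, using the ~3TB member size from the mdadm --detail
output earlier in the thread):

  recovery: read 6 x 3TB sequentially and write 3TB to the seventh drive
            -- about 21TB of purely sequential I/O in total
  resync:   read 7 x 3TB, and on a brand-new array nearly every parity
            block is wrong, so roughly another 3TB of parity writes
            interleaved with those reads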

NeilBrown


>>> I can't get it to do an initial resync or promote the spare, however.
>>>
>>
>> So it doesn't start recovery straight away? That is odd...
>> Maybe it is in read-auto mode.  I should probably get mdadm to poke
>> it out of that.
>>
>> If it is just start creating the filesystem or whatever you want to
>> do, that will prod it into action.
>>
>> If not (check /proc/mdstat), what kernel log messages are there from the
>> time when you created the array.
>>
>> NeilBrown



* Re: Starting RAID 5
  2009-05-18 21:36     ` NeilBrown
@ 2009-05-20 19:45       ` Bill Davidsen
  2009-05-20 22:28         ` Neil Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Bill Davidsen @ 2009-05-20 19:45 UTC (permalink / raw)
  To: NeilBrown; +Cc: lrhorer, 'Linux RAID'

NeilBrown wrote:
> On Tue, May 19, 2009 1:13 am, Bill Davidsen wrote:
>   
>> NeilBrown wrote:
>>     
>>> On Fri, May 15, 2009 12:15 pm, Leslie Rhorer wrote:
>>>
>>>       
>>>> OK, I've torn down the LVM backup array and am rebuilding it as a RAID
>>>> 5.
>>>> I've had problems with this before, and I'm having them, again.  I
>>>> created
>>>> the array with:
>>>>
>>>> mdadm --create /dev/md0 --raid-devices=7 --metadata=1.2 --chunk=256
>>>> --level=5 /dev/sd[a-g]
>>>>
>>>> whereupon it creates the array and then immediately removes /dev/sdg
>>>> and
>>>> makes it a spare.  I think I may have read where this is normal
>>>> behavior.
>>>>
>>>>         
>>> Correct. Maybe you read it in the mdadm man page.
>>>
>>>
>>>
>>>       
>> While I know about that, I have never understood why that was desirable,
>> or even acceptable, behavior. The array sits half created doing nothing
>> until the system tries to use the array, at which time it's slow because
>> it's finally getting around to actually getting the array into some
>> sensible state. Is there some benefit to wasting time so the array can
>> be slow when needed?
>>     
>
> Is the "that" which you refer to the content of the previous paragraph,
> or the following paragraph.
>
>   
The problem in the following paragraph is caused by the behavior in the 
first. I don't understand what benefit there is to bringing up the array 
with a spare instead of N elements needing a rebuild. Is adding a spare 
in place of the failed device the best (or only) way to kick off a resync?
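
(For what it's worth, I believe a resync can also be requested by hand
through sysfs on reasonably recent kernels, something along the lines of

   echo repair > /sys/block/md0/md/sync_action

but that still doesn't explain why the spare shuffle is the default.)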

> The content of your comment suggests the following paragraph which,
> as I hint, is a misfeature that should be fixed by having mdadm
> "poke it out of that" (i.e. set the array to read-write if it is
> read-mostly).
>
> But the positioning of your comment makes it seem to refer to
> the previous paragraph which is totally unrelated to your complaint,
> but I will explain anyway.
>
> When a raid5 performs a 'resync' it reads every block, tests parity,
> then if the parity is wrong, it writes out the correct parity block.
> For an array with mostly correct parity, this involves sequential
> reads across all devices in parallel and so is as fast as possible.
> For an array with mostly incorrect parity (as is quite likely at
> array creation) there will be many writes to parity block as well
> as the reads, which will take a lot longer.
>
> If we instead make one drive a spare then raid5 will perform recovery
> which involves reading N-1 drives and writing to the Nth drive.
> All sequential IOs.  This should be as fast as resync on a mostly-clean
> array, and much faster than resync on a mostly-dirty array.
>   

It's not the process I question, just leaving the resync until the array 
is written by the user rather than starting it at once so the create 
actually results in a fully functional array. I have the feeling that 
raid6 did that, but I don't have the hardware to test with today.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.




* Re: Starting RAID 5
  2009-05-20 19:45       ` Bill Davidsen
@ 2009-05-20 22:28         ` Neil Brown
  2009-05-21 18:27           ` Bill Davidsen
  0 siblings, 1 reply; 8+ messages in thread
From: Neil Brown @ 2009-05-20 22:28 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: lrhorer, 'Linux RAID'

On Wednesday May 20, davidsen@tmr.com wrote:
> NeilBrown wrote:
> > On Tue, May 19, 2009 1:13 am, Bill Davidsen wrote:
> >   
> >> NeilBrown wrote:
> >>     
> >>> On Fri, May 15, 2009 12:15 pm, Leslie Rhorer wrote:
> >>>
> >>>       
> >>>> OK, I've torn down the LVM backup array and am rebuilding it as a RAID
> >>>> 5.
> >>>> I've had problems with this before, and I'm having them, again.  I
> >>>> created
> >>>> the array with:
> >>>>
> >>>> mdadm --create /dev/md0 --raid-devices=7 --metadata=1.2 --chunk=256
> >>>> --level=5 /dev/sd[a-g]
> >>>>
> >>>> whereupon it creates the array and then immediately removes /dev/sdg
> >>>> and
> >>>> makes it a spare.  I think I may have read where this is normal
> >>>> behavior.
> >>>>
> >>>>         
> >>> Correct. Maybe you read it in the mdadm man page.
> >>>
> >>>
> >>>
> >>>       
> >> While I know about that, I have never understood why that was desirable,
> >> or even acceptable, behavior. The array sits half created doing nothing
> >> until the system tries to use the array, at which time it's slow because
> >> it's finally getting around to actually getting the array into some
> >> sensible state. Is there some benefit to wasting time so the array can
> >> be slow when needed?
> >>     
> >
> > Is the "that" which you refer to the content of the previous paragraph,
> > or the following paragraph.
> >
> >   
> The problem in the following paragraph is caused by the behavior in the 
> first. I don't understand what benefit there is to bringing up the array 
> with a spare instead of N elements needing a rebuild. Is adding a spare 
> in place of the failed device the best (or only) way to kick off a resync?

Really, the two are independent.  The "wait until someone writes"
would affect resync as well as recover.

The "benefit" is, as I explained, that one is faster (in general) than
the other.
If you want to create a raid5 that has exactly the drives you specify
and not a spare, use the --force flag.
If it would have started read-auto without the --force, it would with
the --force too.  This is controlled by
   /sys/module/md_mod/parameters/start_ro 
If the drives had not been part of a raid5 before, the resync will be
slower than a recovery would have been.
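
For example (a sketch only, reusing the original device list):

   # exactly the listed drives, no spare, so md does a full resync instead
   mdadm --create /dev/md0 --level=5 --raid-devices=7 --metadata=1.2 \
         --chunk=256 --force /dev/sd[a-g]

   # stop newly started arrays from going into read-auto at all
   echo 0 > /sys/module/md_mod/parameters/start_ro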


> 
> > The content of your comment suggests the following paragraph which,
> > as I hint, is a misfeature that should be fixed by having mdadm
> > "poke it out of that" (i.e. set the array to read-write if it is
> > read-mostly).
> >
> > But the positioning of your comment makes it seem to refer to
> > the previous paragraph which is totally unrelated to your complaint,
> > but I will explain anyway.
> >
> > When a raid5 performs a 'resync' it reads every block, tests parity,
> > then if the parity is wrong, it writes out the correct parity block.
> > For an array with mostly correct parity, this involves sequential
> > reads across all devices in parallel and so is as fast as possible.
> > For an array with mostly incorrect parity (as is quite likely at
> > array creation) there will be many writes to parity block as well
> > as the reads, which will take a lot longer.
> >
> > If we instead make one drive a spare then raid5 will perform recovery
> > which involves reading N-1 drives and writing to the Nth drive.
> > All sequential IOs.  This should be as fast as resync on a mostly-clean
> > array, and much faster than resync on a mostly-dirty array.
> >   
> 
> It's not the process I question, just leaving the resync until the array 
> is written by the user rather than starting it at once so the create 
> actually results in a fully functional array. I have the feeling that 
> raid6 did that, but I haven't hardware to test today.

No.  You really need a resync first, or your data is not safe.
Just writing data does not set the parity correctly, unless the parity was
already correct beforehand (it might be, but there is no guarantee).
So if you get a drive failure before the initial resync or recovery is
complete, you have possibly lost data.
The difference is that in the default case (make a spare and force
recovery), you know that you have lost data.  In the other case (no
magic spares, just do a resync) you can believe that you haven't, but
you might be wrong.

RAID6 is different in that it always calculates new parity and Q.
So you don't need an initial resync to get the parity correct.
And I don't think mdadm fiddles with spares for RAID6.

Just to clarify:  it is perfectly OK to write data to an array before
the initial resync/recovery is finished.  But, on raid5, that data is
not safe from a single-drive-failure until the resync/recovery is
complete.
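
Until then you can keep an eye on the rebuild with the usual status checks
(nothing here is specific to your setup beyond the /dev/md0 name):

   cat /proc/mdstat          # progress bar, percentage and ETA while recovery runs
   mdadm --detail /dev/md0   # "State : clean", spare promoted to active, when done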

NeilBrown




* Re: Starting RAID 5
  2009-05-20 22:28         ` Neil Brown
@ 2009-05-21 18:27           ` Bill Davidsen
  0 siblings, 0 replies; 8+ messages in thread
From: Bill Davidsen @ 2009-05-21 18:27 UTC (permalink / raw)
  To: Neil Brown; +Cc: lrhorer, 'Linux RAID'

Neil Brown wrote:
> On Wednesday May 20, davidsen@tmr.com wrote:
>   
>> NeilBrown wrote:
>>     
>>> On Tue, May 19, 2009 1:13 am, Bill Davidsen wrote:
>>>   
>>>       
>>>> NeilBrown wrote:
>>>>     
>>>>         
>>>>> On Fri, May 15, 2009 12:15 pm, Leslie Rhorer wrote:
>>>>>
>>>>>       
>>>>>           
>>>>>> OK, I've torn down the LVM backup array and am rebuilding it as a RAID
>>>>>> 5.
>>>>>> I've had problems with this before, and I'm having them, again.  I
>>>>>> created
>>>>>> the array with:
>>>>>>
>>>>>> mdadm --create /dev/md0 --raid-devices=7 --metadata=1.2 --chunk=256
>>>>>> --level=5 /dev/sd[a-g]
>>>>>>
>>>>>> whereupon it creates the array and then immediately removes /dev/sdg
>>>>>> and
>>>>>> makes it a spare.  I think I may have read where this is normal
>>>>>> behavior.
>>>>>>
>>>>>>         
>>>>>>             
>>>>> Correct. Maybe you read it in the mdadm man page.
>>>>>
>>>>>
>>>>>
>>>>>       
>>>>>           
>>>> While I know about that, I have never understood why that was desirable,
>>>> or even acceptable, behavior. The array sits half created doing nothing
>>>> until the system tries to use the array, at which time it's slow because
>>>> it's finally getting around to actually getting the array into some
>>>> sensible state. Is there some benefit to wasting time so the array can
>>>> be slow when needed?
>>>>     
>>>>         
>>> Is the "that" which you refer to the content of the previous paragraph,
>>> or the following paragraph.
>>>
>>>   
>>>       
>> The problem in the following paragraph is caused by the behavior in the 
>> first. I don't understand what benefit there is to bringing up the array 
>> with a spare instead of N elements needing a rebuild. Is adding a spare 
>> in place of the failed device the best (or only) way to kick off a resync?
>>     
>
> Really, the two are independent.  The "wait until someone writes"
> would affect resync as well as recover.
>
> The "benefit" is, as I explained, that one is faster (in general) than
> the other.
> If you want to create a raid5 that has exactly the drives you specify
> and not a spare, use the --force flag.
> If it would have started read-auto without the --force, it would with
> the --force too.  This is controlled by
>    /sys/module/md_mod/parameters/start_ro 
> If the drives had not been part of a raid5 before, the resync will be
> slower than a recovery would have been.
>
>   
We are talking create here; that was the original example. So each 
stripe needs to have the data chunks read, parity calculated, and the parity 
rewritten. So the only question about how soon it ends is how soon it 
starts. Stop me here if somehow that's not the case: parallel reads of 
N-1 drives and a write to the parity drive, with no read and check of the 
parity needed because it's a create.

If the default value of start_ro is not as I would like it, it's my job 
to change it; that's fine. I thought it applied to starting arrays with 
failed drives, and that create was a special case.
>   
>>> The content of your comment suggests the following paragraph which,
>>> as I hint, is a misfeature that should be fixed by having mdadm
>>> "poke it out of that" (i.e. set the array to read-write if it is
>>> read-mostly).
>>>
>>> But the positioning of your comment makes it seem to refer to
>>> the previous paragraph which is totally unrelated to your complaint,
>>> but I will explain anyway.
>>>
>>> When a raid5 performs a 'resync' it reads every block, tests parity,
>>> then if the parity is wrong, it writes out the correct parity block.
>>> For an array with mostly correct parity, this involves sequential
>>> reads across all devices in parallel and so is as fast as possible.
>>> For an array with mostly incorrect parity (as is quite likely at
>>> array creation) there will be many writes to parity block as well
>>> as the reads, which will take a lot longer.
>>>
>>> If we instead make one drive a spare then raid5 will perform recovery
>>> which involves reading N-1 drives and writing to the Nth drive.
>>> All sequential IOs.  This should be as fast as resync on a mostly-clean
>>> array, and much faster than resync on a mostly-dirty array.
>>>   
>>>       
>> It's not the process I question, just leaving the resync until the array 
>> is written by the user rather than starting it at once so the create 
>> actually results in a fully functional array. I have the feeling that 
>> raid6 did that, but I haven't hardware to test today.
>>     
>
> No.  You really need a resync first, or your data is not safe.
>   

My point exactly; as Shakespeare said, "If it must be done 'tis better done 
quickly," so why would the resync not kick off when the array is 
started, rather than wait? (See below on that.)

> Just writing data does not set the parity correctly, unless it was
> already correct beforehand (it might be, but there is no guarantee).
> So if you get a drive failure before the initial resync or recovery is
> complete, you have possibly lost data.
>   

We are not getting a drive failure, other than the md code marking one drive 
as a spare to make it look as if we were. On create there's no reason to assume 
that parity is correct, or might be correct, so checking before 
recalculation is a waste of time.

> The difference is that in the default case (make a spare and force
> recovery), you know that you have lost data.  In the other case (no
> magic spares, just do a resync) you can believe that you haven't, but
> you might be wrong.
>
>   
When an array is created, unless the assume-clean option is used, why 
would any attention be paid to salvaging data? The user hasn't given any 
indication that the data is valid, or that there is even data there to be valid.

> RAID6 is different in that it always calculates new parity and Q.
> So you don't need an initial resync to get the parity correct.
> And I don't think mdadm fiddles with spares for RAID6.
>
> Just to clarify:  it is perfectly OK to write data to an array before
> the initial resync/recovery is finished.  But, on raid5, that data is
> not safe from a single-drive-failure until the resync/recovery is
> complete.
>   

And that's another reason to rethink the way this is done. The system is 
resyncing stripes which have not been written, generating meaningless 
parity from meaningless residual data, rather than doing a resync 
preferentially on stripes where data *has* been written. That seems to 
slow the system with meaningless work while delaying protection of 
valid data.

In other threads there has been discussion of tracking portions of the 
array which have not been used. That would seem to be the ideal solution 
here: a bitmap reflecting the portions of the array which have not been 
used, with the resync done on a demand basis, per stripe, on first use of 
the stripe. So creating a new array wouldn't kick off a resync unless 
you were really adding a spare or used an option to force it. I have to 
believe that the portion of users using create to rescue a 
partially munged array is far lower than the portion which uses create 
with no valid data in the members.

You also discussed bits for stripes not in use; perhaps this could all 
be rolled into a single enhancement to provide all of those benefits at 
once. It also has the possibility of speeding the 'migrate' 
functionality by moving only the portions of the drive which have been used.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.



