* misunderstanding of spare and raid devices?
From: Karsten Römke @ 2011-06-30 10:51 UTC
To: linux-raid
Hello,
I've spent some hours trying to create a RAID 5 device with 4 disks and 1 spare.
I first tried the openSUSE tool, but it didn't give me what I wanted, so I tried mdadm.
Try:
mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1 /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
leads to
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active (auto-read-only) raid5 sdd5[5](S) sde5[4](S) sdc5[2] sdb2[1] sda3[0]
13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
2 spares - I don't understand that.
kspace9:~ # mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5
leads to
md0 : active (auto-read-only) raid5 sdd5[4](S) sdc5[2] sdb2[1] sda3[0]
13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
1 spare - but why? I expect 4 active disks and 1 spare.
kspace9:~ # mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
leads to
md0 : active (auto-read-only) raid5 sde5[5](S) sdd5[3] sdc5[2] sdb2[1] sda3[0]
18345728 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
That's what I want, but I reached it more or less by trial and error.
Where is my mistake in thinking (my "Denkfehler", as we say in German)?
I use
kspace9:~ # mdadm --version
mdadm - v3.0.3 - 22nd October 2009
Any hints would be nice
karsten
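
For reference, a minimal sketch of the intended setup, using the same five partitions named above (as the replies below explain, the extra "spare" is only there until the initial rebuild finishes):

  # create a 4-member RAID5 plus one hot spare (same as the first attempt above)
  mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1 \
      /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5

  # watch the initial build; one "(S)" marker disappears once recovery completes
  cat /proc/mdstat
  mdadm --detail /dev/md0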
* Re: misunderstanding of spare and raid devices?
From: Robin Hill @ 2011-06-30 10:58 UTC
To: Karsten Römke; +Cc: linux-raid

On Thu Jun 30, 2011 at 12:51:37 +0200, Karsten Römke wrote:

> mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1
> /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
>
> leads to
> Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active (auto-read-only) raid5 sdd5[5](S) sde5[4](S) sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>
> 2 spares - I don't understand that.

That's perfectly normal. The RAID5 array is created in degraded mode,
then recovered onto the final disk. That way it becomes available for
use immediately, rather than requiring all the parity to be calculated
before the array is ready. As it's been started in auto-read-only mode
(not sure why though) then it hasn't started recovery yet. Running
"mdadm -w /dev/md0" or mounting the array will kick it into read-write
mode and start the recovery process.

HTH,
    Robin
--
Robin Hill <robin@robinhill.me.uk>
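
A minimal sketch of the step Robin describes, assuming the array has just been created and is still sitting in auto-read-only mode:

  # switch the array to read-write; this kicks off the pending recovery
  mdadm -w /dev/md0          # long form: mdadm --readwrite /dev/md0

  # follow the rebuild until the member count matches the intended layout
  watch -n 5 cat /proc/mdstat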
* Re: misunderstanding of spare and raid devices?
From: Karsten Römke @ 2011-06-30 13:09 UTC
To: linux-raid

Hello,
I suppose it works now. After mdadm -w /dev/md0 it starts syncing.

md0 : active raid5 sdd5[4] sde5[5](S) sdc5[2] sdb2[1] sda3[0]
      13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [=>...................]  recovery =  6.2% (286656/4586432) finish=0.9min speed=71664K/sec

Thanks to both of you.
Karsten

On 30.06.2011 12:58, Robin Hill wrote:
> That's perfectly normal. The RAID5 array is created in degraded mode,
> then recovered onto the final disk. [...] Running "mdadm -w /dev/md0"
> or mounting the array will kick it into read-write mode and start the
> recovery process.
* Re: misunderstanding of spare and raid devices?
From: John Robinson @ 2011-06-30 11:30 UTC
To: Karsten Römke; +Cc: linux-raid

On 30/06/2011 11:51, Karsten Römke wrote:
> mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1
> /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
>
> leads to
> md0 : active (auto-read-only) raid5 sdd5[5](S) sde5[4](S) sdc5[2]
> sdb2[1] sda3[0]
>       13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>
> 2 spares - I don't understand that.

When you create a RAID 5 array, it starts degraded, and a resync is
performed from the first N-1 drives to the last one. If you create a
5-drive RAID-5, this shows up as 4 drives and a spare, but once the
resync is finished it's 5 active drives.

Going back to your first attempt, it'll show as 3 drives and 2 spares,
but once the initial resync is finished, it'll be 4 drives and 1 spare.

mdadm --detail /dev/md0 will show more information to confirm that this
is what is happening.

Hope this helps.

Cheers,

John.
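
A minimal sketch of the check John suggests; the field names below come from mdadm's --detail output and may vary slightly between versions:

  mdadm --detail /dev/md0
  # look at the State line (e.g. "clean, degraded, recovering") and the
  # Active Devices / Working Devices / Spare Devices counters: while the
  # initial build is running, one member is still counted as a spare,
  # and the counts settle to the final layout once recovery completes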
* Re: misunderstanding of spare and raid devices?
From: Phil Turmel @ 2011-06-30 12:32 UTC
To: Karsten Römke; +Cc: John Robinson, linux-raid

Hi Karsten,

On 06/30/2011 07:30 AM, John Robinson wrote:
> On 30/06/2011 11:51, Karsten Römke wrote:
>> mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1
>> /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
>> [...]
>> 2 spares - I don't understand that.

Just to clarify for you, as your comment below suggests some confusion
as to the role of a spare:

When the resync finished on this, if you had let it, you would have had
three drives' capacity, with parity interspersed, on four drives. The
fifth drive would have been idle, but ready to replace any of the other
four without intervention from you.

>> kspace9:~ # mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sda3
>> /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
>> leads to
>> md0 : active (auto-read-only) raid5 sde5[5](S) sdd5[3] sdc5[2] sdb2[1]
>> sda3[0]
>>       18345728 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]

This will end up with four drives' capacity, with parity interspersed,
on five drives. No spare.

>> That's what I want, but I reached it more or less by trial and error.
>> Where is my mistake in thinking?

I hope this helps you decide which layout is the one you really want.
If you think you want the first layout, you should also consider raid6
(dual redundancy). There's a performance penalty, but your data would
be significantly safer.

Phil
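
A rough capacity check for the layouts Phil describes, using the per-member size visible in the /proc/mdstat output above (4586432 one-KiB blocks per partition); the numbers are only illustrative:

  member=4586432     # KiB per member partition, from /proc/mdstat above
  echo "raid5, 4 members + 1 spare: $(( 3 * member )) KiB usable"   # 13759296
  echo "raid5, 5 members, no spare: $(( 4 * member )) KiB usable"   # 18345728
  echo "raid6, 5 members:           $(( 3 * member )) KiB usable"   # 13759296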
* Re: misunderstanding of spare and raid devices? - and one question more
From: Karsten Römke @ 2011-06-30 12:52 UTC
To: linux-raid

Hi Phil,
your explanations are clear to me.

On 30.06.2011 14:32, Phil Turmel wrote:
> Just to clarify for you, as your comment below suggests some confusion
> as to the role of a spare:
>
> When the resync finished on this, if you had let it, you would have had
> three drives' capacity, with parity interspersed, on four drives. The
> fifth drive would have been idle, but ready to replace any of the other
> four without intervention from you.

Yes, I understand it exactly that way - but I was wondering why I saw
2 spares.

> This will end up with four drives' capacity, with parity interspersed,
> on five drives. No spare.
>
>>> That's what I want, but I reached it more or less by trial and error.
>>> Where is my mistake in thinking?

No - that's not what I want, but at first it seemed to be the right way.
After my previous posting, before putting the raid back under LVM, I ran
mdadm --detail and saw that the capacity can't match: I have around
16 GB, but I expected 12 GB - so I decided to stop my experiments until
I got a hint, which came very fast.

> I hope this helps you decide which layout is the one you really want.
> If you think you want the first layout, you should also consider raid6
> (dual redundancy). There's a performance penalty, but your data would
> be significantly safer.

I have to say I have looked at raid 6 only at a glance.
Is there any experience of what percentage of performance penalty to
expect?

Thanks
Karsten
* Re: misunderstanding of spare and raid devices? - and one question more
From: Phil Turmel @ 2011-06-30 13:34 UTC
To: Karsten Römke; +Cc: linux-raid

On 06/30/2011 08:52 AM, Karsten Römke wrote:
[...]
>> This will end up with four drives' capacity, with parity interspersed,
>> on five drives. No spare.
>
> No - that's not what I want, but at first it seemed to be the right way.
> After my previous posting, before putting the raid back under LVM, I ran
> mdadm --detail and saw that the capacity can't match: I have around
> 16 GB, but I expected 12 GB - so I decided to stop my experiments until
> I got a hint, which came very fast.

So the first layout is the one you wanted. Each drive is ~4GB? Or is
this just a test setup?

>> I hope this helps you decide which layout is the one you really want.
>> If you think you want the first layout, you should also consider raid6
>> (dual redundancy). There's a performance penalty, but your data would
>> be significantly safer.
>
> I have to say I have looked at raid 6 only at a glance.
> Is there any experience of what percentage of performance penalty to
> expect?

I don't have percentages to share, no. They would vary a lot based on
number of disks and type of CPU. As an estimate though, you can expect
raid6 to be about as fast as raid5 when reading from a non-degraded
array. Certain read workloads could even be faster, as the data is
spread over more spindles. It will be slower to write in all cases.
The extra "Q" parity for raid6 is quite complex to calculate.

In a single disk failure situation, both raid5 and raid6 will use the
"P" parity to reconstruct the missing information, so their
single-degraded read performance will be comparable. With two disk
failures, raid6 performance plummets, as every read requires a complete
inverse "Q" solution. Of course, two disk failures in raid5 stops your
system. So running at a crawl, with data intact, is better than no data.

You should also consider the odds of failure during rebuild, which is a
serious concern for large raid5 arrays. This was discussed recently on
this list:

http://marc.info/?l=linux-raid&m=130754284831666&w=2

If your CPU has free cycles, I suggest you run raid6 instead of
raid5+spare.

Phil
* Re: misunderstanding of spare and raid devices? - and one question more
From: Karsten Römke @ 2011-06-30 14:05 UTC
To: linux-raid

Hi Phil,

> So the first layout is the one you wanted. Each drive is ~4GB? Or is
> this just a test setup?

It's not a test setup. Historical reasons. I started with Linux around
1995 and have used software raid for a long time. So I have had these
4 GB partitions for a long time, and whenever I upgrade storage or a
disk says goodbye, I use a new 4 GB partition... Later I put several
raid arrays under LVM, so I have no trouble with space on a single
partition.

> [...] With two disk failures, raid6 performance plummets, as every
> read requires a complete inverse "Q" solution. Of course, two disk
> failures in raid5 stops your system. So running at a crawl, with data
> intact, is better than no data.

That's the reason to think about a spare disk.

> You should also consider the odds of failure during rebuild, which is
> a serious concern for large raid5 arrays. This was discussed recently
> on this list:
>
> http://marc.info/?l=linux-raid&m=130754284831666&w=2
>
> If your CPU has free cycles, I suggest you run raid6 instead of
> raid5+spare.

I think there are free cycles, so I should try it.

Thanks
Karsten
* Re: misunderstanding of spare and raid devices? - and one question more
From: Karsten Römke @ 2011-06-30 14:21 UTC
To: linux-raid

Hi Phil,

> If your CPU has free cycles, I suggest you run raid6 instead of
> raid5+spare.

I started the raid 6 array and got:

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sde5[4] sdd5[3] sdc5[2] sdb2[1] sda3[0]
      13759296 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=================>...]  resync = 87.4% (4013184/4586432) finish=0.4min speed=20180K/sec

When I started the raid 5 array I got:

md0 : active raid5 sdd5[4] sde5[5](S) sdc5[2] sdb2[1] sda3[0]
      13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [=>...................]  recovery =  6.2% (286656/4586432) finish=0.9min speed=71664K/sec

So should I expect about three times lower write speed - or is this
calculation too simple?

Karsten
* Re: misunderstanding of spare and raid devices? - and one question more
From: Phil Turmel @ 2011-06-30 14:44 UTC
To: Karsten Römke; +Cc: linux-raid

On 06/30/2011 10:21 AM, Karsten Römke wrote:
> I started the raid 6 array and got:
>
> md0 : active raid6 sde5[4] sdd5[3] sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [=================>...]  resync = 87.4% (4013184/4586432) finish=0.4min speed=20180K/sec
>
> When I started the raid 5 array I got:
>
> md0 : active raid5 sdd5[4] sde5[5](S) sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>       [=>...................]  recovery =  6.2% (286656/4586432) finish=0.9min speed=71664K/sec
>
> So should I expect about three times lower write speed - or is this
> calculation too simple?

That's a bigger difference than I would have expected for resync, which
works in full stripes. If you have a workload with many small random
writes, this slowdown is quite possible. Is your CPU maxed out while
writing to the raid6?

Can you run some speed tests? dd streaming read or write in one window,
with "iostat -xm 1" in another is a decent test of peak performance.
bonnie++, dbench, and iozone are good for more generic workload
simulation.

Phil
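
A minimal sketch of the dd/iostat test Phil suggests; the mount point and file size are placeholders:

  # terminal 1: streaming write (oflag=direct bypasses the page cache),
  # then a streaming read of the same file
  dd if=/dev/zero of=/raid6/ddtest bs=1M count=4096 oflag=direct
  dd if=/raid6/ddtest of=/dev/null bs=1M iflag=direct
  rm /raid6/ddtest

  # terminal 2: per-device utilisation and throughput, refreshed every second
  iostat -xm 1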
* Re: misunderstanding of spare and raid devices? - and one question more
From: Karsten Römke @ 2011-07-02 8:34 UTC
To: Phil Turmel; +Cc: linux-raid

Hi Phil,
I have done some tests and appended the results; maybe they are of
interest for somebody. As a conclusion I would say raid5 and raid6 make
nearly no difference in my situation.

Thanks to all for the hints and explanations
Karsten

--------------------------------------------------------------------------------
First - just copy a directory with a size of 2.9 GB. It was copied once
before, so I think the data may still have been cached.

kspace9:~ # date ; cp -a /home/roemke/HHertzTex/OLDER/* /raid5/ ; date
Fr  1. Jul 16:15:57 CEST 2011
Fr  1. Jul 16:16:26 CEST 2011

kspace9:~ # date ; cp -a /home/roemke/HHertzTex/OLDER/* /raid6/ ; date
Fr  1. Jul 16:17:27 CEST 2011
Fr  1. Jul 16:17:58 CEST 2011

--------------------------------------------------------------------------------
Now a test with bonnie++. I found this example online and the parameters
seem sensible to me (I've never done performance tests on hard disks
before, so I searched for an example):
  -n 0 : no file creation tests
  -u 0 : run as root
  -r   : memory in megabytes (calculated to 7999)
  -s   : file size (calculated to 15998)
  -f   : fast, skip per-char IO tests
  -b   : no write buffering
  -d   : set directory

kspace9:~ # bonnie++ -n 0 -u 0 -r `free -m | grep 'Mem:' | awk '{print $2}'` -s $(echo "scale=0;`free -m | grep 'Mem:' | awk '{print $2}'`*2" | bc -l) -f -b -d /raid5
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kspace9      15998M           96365  20 48302  12           149445  18 113.7   0
kspace9,15998M,,,96365,20,48302,12,,,149445,18,113.7,0,,,,,,,,,,,,,

kspace9:~ # bonnie++ -n 0 -u 0 -r `free -m | grep 'Mem:' | awk '{print $2}'` -s $(echo "scale=0;`free -m | grep 'Mem:' | awk '{print $2}'`*2" | bc -l) -f -b -d /raid6
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kspace9      15998M          100321  22 48617  13           131651  16 120.2   1
kspace9,15998M,,,100321,22,48617,13,,,131651,16,120.2,1,,,,,,,,,,,,,

================================================================================
Results for the old raid 1: a test I ran unintentionally, because I
forgot to mount the raid array :-)  Useful as comparison results :-)

kspace9:~ # date ; cp -r /home/roemke/HHertzTex/OLDER/ /raid5/ ; date
Fr  1. Jul 16:07:32 CEST 2011   <-- not raid 5, old raid 1, forgot to mount
Fr  1. Jul 16:08:39 CEST 2011
(similar to the copy tests above)

Similar test with bonnie++:
kspace9:~ # bonnie++ -n 0 -u 0 -r `free -m | grep 'Mem:' | awk '{print $2}'` -s $(echo "scale=0;`free -m | grep 'Mem:' | awk '{print $2}'`*2" | bc -l) -f -b -d /raid5   <-- not raid 5, still the old raid 1
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kspace9      15998M           62977   9 34410   9           101979  13  66.7   0
kspace9,15998M,,,62977,9,34410,9,,,101979,13,66.7,0,,,,,,,,,,,,,
* Re: misunderstanding of spare and raid devices? - and one question more
From: David Brown @ 2011-07-02 9:42 UTC
To: linux-raid

On 02/07/11 10:34, Karsten Römke wrote:
> Hi Phil,
> I have done some tests and appended the results; maybe they are of
> interest for somebody. As a conclusion I would say raid5 and raid6 make
> nearly no difference in my situation.
> Thanks to all for the hints and explanations
> Karsten

If raid6 doesn't have any noticeable performance costs compared to
raid5 for your usage, then you should definitely use raid6 rather than
raid5 + spare. Think of it as raid5 + spare with the rebuild done in
advance!

mvh.,

David
* Re: misunderstanding of spare and raid devices? - and one question more
From: NeilBrown @ 2011-06-30 21:28 UTC
To: Karsten Römke; +Cc: linux-raid

On Thu, 30 Jun 2011 16:21:57 +0200 Karsten Römke <k.roemke@gmx.de> wrote:

> I started the raid 6 array and got:
>
> md0 : active raid6 sde5[4] sdd5[3] sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [=================>...]  resync = 87.4% (4013184/4586432) finish=0.4min speed=20180K/sec
                                 ^^^^^^
                                 Note: resync
>
> When I started the raid 5 array I got:
>
> md0 : active raid5 sdd5[4] sde5[5](S) sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>       [=>...................]  recovery =  6.2% (286656/4586432) finish=0.9min speed=71664K/sec
                                 ^^^^^^^^
                                 Note: recovery.
>
> So should I expect about three times lower write speed - or is this
> calculation too simple?

You are comparing two different things, neither of which is write speed.
If you want to measure write speed, you should try writing and measure
that.

When you create a RAID5, mdadm deliberately triggers recovery rather
than resync, as it is likely to be faster. This is why you see a
missing device and an extra spare. I don't remember why it doesn't do
that with RAID6.

NeilBrown
* Re: misunderstanding of spare and raid devices? - and one question more
From: David Brown @ 2011-07-01 7:23 UTC
To: linux-raid

On 30/06/2011 23:28, NeilBrown wrote:
> You are comparing two different things, neither of which is write speed.
> If you want to measure write speed, you should try writing and measure
> that.
>
> When you create a RAID5, mdadm deliberately triggers recovery rather
> than resync, as it is likely to be faster. This is why you see a
> missing device and an extra spare. I don't remember why it doesn't do
> that with RAID6.

What's the difference between a "resync" and a "recovery"? Is it that a
"resync" will read the whole stripe, check if it is valid, and if it is
not it then generates the parity, while a "recovery" will always
generate the parity?

If that's the case, then one reason it might not do that with raid6 is
if the code is common with the raid5 to raid6 grow case. Then a
"resync" would leave the raid5 parity untouched, so that the set keeps
some redundancy, whereas a "recovery" would temporarily leave the
stripe unprotected.
* Re: misunderstanding of spare and raid devices? - and one question more
From: Robin Hill @ 2011-07-01 8:50 UTC
To: David Brown; +Cc: linux-raid

On Fri Jul 01, 2011 at 09:23:43 +0200, David Brown wrote:

> What's the difference between a "resync" and a "recovery"? Is it that a
> "resync" will read the whole stripe, check if it is valid, and if it is
> not it then generates the parity, while a "recovery" will always
> generate the parity?

From the names, recovery would mean that it's reading from N-1 disks,
and recreating data/parity to rebuild the final disk (as when it
recovers from a drive failure), whereas resync will be reading from all
N disks and checking/recreating the parity (as when you're running a
repair on the array).

The main reason I can see for doing a resync on RAID6 rather than a
recovery is if the data reconstruction from the Q parity is far slower
than the construction of the Q parity itself (I've no idea how the
mathematics works out for this).

Cheers,
    Robin
* Re: misunderstanding of spare and raid devices? - and one question more
From: David Brown @ 2011-07-01 10:18 UTC
To: linux-raid

On 01/07/2011 10:50, Robin Hill wrote:
> From the names, recovery would mean that it's reading from N-1 disks,
> and recreating data/parity to rebuild the final disk (as when it
> recovers from a drive failure), whereas resync will be reading from all
> N disks and checking/recreating the parity (as when you're running a
> repair on the array).
>
> The main reason I can see for doing a resync on RAID6 rather than a
> recovery is if the data reconstruction from the Q parity is far slower
> than the construction of the Q parity itself (I've no idea how the
> mathematics works out for this).

Well, data reconstruction from Q parity /is/ more demanding than
constructing the Q parity in the first place (the mathematics is the
part that I know about). That's why a two-disk degraded raid6 array is
significantly slower (or, more accurately, significantly more
cpu-intensive) than a one-disk degraded raid6 array.

But that doesn't make a difference here - you are rebuilding one or two
disks, so you have to use the data you've got whether you are doing a
resync or a recovery.
* Re: misunderstanding of spare and raid devices? - and one question more
From: Robin Hill @ 2011-07-01 11:29 UTC
To: David Brown; +Cc: linux-raid

On Fri Jul 01, 2011 at 12:18:22PM +0200, David Brown wrote:

> But that doesn't make a difference here - you are rebuilding one or two
> disks, so you have to use the data you've got whether you are doing a
> resync or a recovery.

Yes, but in a resync all the data you have available is the data
blocks, and you're reconstructing all the P and Q parity blocks. With a
recovery, the data you have available is some of the data blocks and
some of the P & Q parity blocks, so for some stripes you'll be
reconstructing the parity and for others you'll be regenerating the
data using the parity (and for some you'll be doing one of each).

Cheers,
    Robin
* Re: misunderstanding of spare and raid devices? - and one question more
From: David Brown @ 2011-07-01 12:45 UTC
To: linux-raid

On 01/07/2011 13:29, Robin Hill wrote:
> Yes, but in a resync all the data you have available is the data
> blocks, and you're reconstructing all the P and Q parity blocks. With a
> recovery, the data you have available is some of the data blocks and
> some of the P & Q parity blocks, so for some stripes you'll be
> reconstructing the parity and for others you'll be regenerating the
> data using the parity (and for some you'll be doing one of each).

If it were that simple, then the resync (as used by RAID6 creates)
would not be so much slower than the recovery used in a RAID5 build...

With a resync, you first check if the parity blocks are correct (by
generating them from the data blocks and comparing them to the read
parity blocks). If they are not correct, you write out the parity
blocks. With a recovery, you /know/ that one block is incorrect and
re-generate that (from the data blocks if it is a parity block, or
using the parities if it is a data block).

Consider the two cases raid5 and raid6 separately.

When you build your raid5 array, there is nothing worth keeping in the
data - the aim is simply to make the stripes consistent. There are two
possible routes - consider the data blocks to be "correct" and do a
resync to make sure the parity blocks match, or consider the first n-1
disks to be "correct" and do a recovery to make sure the n'th disk
matches. For recovery, that means reading n-1 blocks in a stripe, doing
a big xor, and writing out the remaining block (whether it is data or
parity). For resync, it means reading all n blocks, and checking the
xor. If there is no match (which will be the norm when building an
array), then the correct parity is calculated and written out. Thus a
resync takes longer than a recovery, and a recovery is used.

When you build your raid6 array, you have the same two choices. For a
resync, you have to read all n blocks, calculate P and Q, compare them,
then (as there will be no match) write out P and Q. In comparison to
the raid5 recovery, you've done a couple of unnecessary block reads and
compares, and the time-consuming Q calculation and write. But if you
chose recovery, then you'd be assuming the first n-2 blocks are correct
and re-calculating the last two blocks. This avoids the extra reads and
compares, but if the two parity blocks are within the first n-2 blocks
read, then the recovery calculations will be much slower. Hence a
resync is faster for raid6.

I suppose the raid6 build could be optimised a little by skipping the
extra reads when you know in advance that they will not match. But
either that is already being done, or it is considered a small issue
that is not worth changing (since it only has an effect during the
initial build).
* Re: misunderstanding of spare and raid devices? - and one question more
From: NeilBrown @ 2011-07-01 13:02 UTC
To: David Brown; +Cc: linux-raid

On Fri, 01 Jul 2011 14:45:00 +0200 David Brown <david@westcontrol.com> wrote:

> When you build your raid6 array, you have the same two choices. For a
> resync, you have to read all n blocks, calculate P and Q, compare them,
> then (as there will be no match) write out P and Q. In comparison to
> the raid5 recovery, you've done a couple of unnecessary block reads and
> compares, and the time-consuming Q calculation and write. But if you
> chose recovery, then you'd be assuming the first n-2 blocks are correct
> and re-calculating the last two blocks. This avoids the extra reads and
> compares, but if the two parity blocks are within the first n-2 blocks
> read, then the recovery calculations will be much slower. Hence a
> resync is faster for raid6.

Almost everything you say is correct.

However I'm not convinced that a raid6 resync is faster than a raid6
recovery (on devices where P and Q are not mostly correct). I suspect
it is just an historical oversight that RAID6 doesn't force a recovery
for the initial create.

In case anyone would like to test: it is easy to force a recovery by
specifying missing devices:

   mdadm -C /dev/md0 -l6 -n6 /dev/sd[abcd] missing missing -x2 /dev/sd[ef]

and easy to force a resync by using --force:

   mdadm -C /dev/md0 -l5 -n5 /dev/sd[abcde] --force

It is only really a valid test if you know that the P and Q that resync
will read are not going to be correct most of the time.

NeilBrown
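
One way to time the comparison Neil suggests - a sketch only, assuming throwaway devices; the second create below is a plain RAID6 create (which resyncs) rather than Neil's RAID5 example, so the two runs are directly comparable:

  # variant A: recovery-style build (two members created as "missing", rebuilt from spares)
  mdadm -C /dev/md0 -l6 -n6 /dev/sd[abcd] missing missing -x2 /dev/sd[ef]
  mdadm -w /dev/md0            # leave auto-read-only so the rebuild actually starts
  time mdadm --wait /dev/md0   # returns once recovery has finished

  # stop the array and wipe the member superblocks before the second run
  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sd[abcdef]

  # variant B: normal RAID6 create, which does a resync instead
  mdadm -C /dev/md0 -l6 -n6 /dev/sd[abcdef]
  mdadm -w /dev/md0
  time mdadm --wait /dev/md0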