Issue with growing RAID10

* Issue with growing RAID10
@ 2016-11-02 17:59 Robert LeBlanc
  2016-11-02 18:09 ` Wols Lists
  2016-11-02 18:19 ` keld
  0 siblings, 2 replies; 15+ messages in thread
From: Robert LeBlanc @ 2016-11-02 17:59 UTC
  To: linux-raid

We would like to add read performance to our RAID10 volume by adding
another drive (we don't care about space), so I did the following test
with poor results.

# mdadm --create /dev/md13 --level 10 --run --assume-clean -p n2 --raid-devices 2 /dev/loop{2..3}
mdadm: /dev/loop2 appears to be part of a raid array:
      level=raid10 devices=3 ctime=Wed Nov  2 11:25:22 2016
mdadm: /dev/loop3 appears to be part of a raid array:
      level=raid10 devices=3 ctime=Wed Nov  2 11:25:22 2016
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md13 started.

# mdadm --detail /dev/md13
/dev/md13:
       Version : 1.2
 Creation Time : Wed Nov  2 11:47:48 2016
    Raid Level : raid10
    Array Size : 10477568 (9.99 GiB 10.73 GB)
 Used Dev Size : 10477568 (9.99 GiB 10.73 GB)
  Raid Devices : 2
 Total Devices : 2
   Persistence : Superblock is persistent

   Update Time : Wed Nov  2 11:47:48 2016
         State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
 Spare Devices : 0

        Layout : near=2
    Chunk Size : 512K

          Name : rleblanc-pc:13  (local to host rleblanc-pc)
          UUID : 1eb66d7c:21308453:1e731c8b:1c43dd55
        Events : 0

   Number   Major   Minor   RaidDevice State
      0       7        2        0      active sync set-A   /dev/loop2
      1       7        3        1      active sync set-B   /dev/loop3

# mdadm /dev/md13 -a /dev/loop4
mdadm: added /dev/loop4

# mdadm --detail /dev/md13
/dev/md13:
       Version : 1.2
 Creation Time : Wed Nov  2 11:47:48 2016
    Raid Level : raid10
    Array Size : 10477568 (9.99 GiB 10.73 GB)
 Used Dev Size : 10477568 (9.99 GiB 10.73 GB)
  Raid Devices : 2
 Total Devices : 3
   Persistence : Superblock is persistent

   Update Time : Wed Nov  2 11:48:13 2016
         State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
 Spare Devices : 1

        Layout : near=2
    Chunk Size : 512K

          Name : rleblanc-pc:13  (local to host rleblanc-pc)
          UUID : 1eb66d7c:21308453:1e731c8b:1c43dd55
        Events : 1

   Number   Major   Minor   RaidDevice State
      0       7        2        0      active sync set-A   /dev/loop2
      1       7        3        1      active sync set-B   /dev/loop3

      2       7        4        -      spare   /dev/loop4

# mdadm --grow /dev/md13 -p n3 --raid-devices 3
mdadm: Cannot change number of copies when reshaping RAID10

I also tried adding the device, growing --raid-devices, letting the
reshape finish, and then changing the number of copies, and it didn't
like that either. It would be nice to be able to supply -p nX and
--raid-devices X at the same time, so that no reshape is needed and the
data is only copied over to the new drive (or a drive is dropped out
completely). I can see that changing -p separately, or by a different
amount than the number of drives added or removed, could be difficult,
but for lockstep changes it seems that it would be rather easy.

Any ideas?

Thanks,

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

* Re: Issue with growing RAID10
  2016-11-02 17:59 Issue with growing RAID10 Robert LeBlanc
@ 2016-11-02 18:09 ` Wols Lists
  2016-11-02 18:13   ` Robert LeBlanc
  2016-11-02 18:19 ` keld
  1 sibling, 1 reply; 15+ messages in thread
From: Wols Lists @ 2016-11-02 18:09 UTC
  To: Robert LeBlanc, linux-raid

On 02/11/16 17:59, Robert LeBlanc wrote:
> We would like to add read performance to our RAID10 volume by adding
> another drive (we don't care about space), so I did the following test
> with poor results.

Quick reply ...

I don't think you can change the number of raid-devices on a raid10. Are
you trying to replace a slow drive with a faster one? You can probably
use the --replace option.

If not that, what do you want to achieve?

Cheers,
Wol



* Re: Issue with growing RAID10
  2016-11-02 18:09 ` Wols Lists
@ 2016-11-02 18:13   ` Robert LeBlanc
  0 siblings, 0 replies; 15+ messages in thread
From: Robert LeBlanc @ 2016-11-02 18:13 UTC
  To: Wols Lists; +Cc: linux-raid

Grow on RAID10 does work. Here is my previous attempt at trying to
change --raid-devices and -p separately.
# mdadm --detail /dev/md13
/dev/md13:
       Version : 1.2
 Creation Time : Wed Nov  2 11:25:22 2016
    Raid Level : raid10
    Array Size : 10477568 (9.99 GiB 10.73 GB)
 Used Dev Size : 10477568 (9.99 GiB 10.73 GB)
  Raid Devices : 2
 Total Devices : 2
   Persistence : Superblock is persistent

   Update Time : Wed Nov  2 11:25:22 2016
         State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
 Spare Devices : 0

        Layout : near=2
    Chunk Size : 512K

          Name : rleblanc-pc:13  (local to host rleblanc-pc)
          UUID : 278c5e33:5ac1d25a:241a0cf7:66269542
        Events : 0

   Number   Major   Minor   RaidDevice State
      0       7        2        0      active sync set-A   /dev/loop2
      1       7        3        1      active sync set-B   /dev/loop3

# mdadm /dev/md13 -a /dev/loop4
mdadm: added /dev/loop4

# mdadm --detail /dev/md13
/dev/md13:
       Version : 1.2
 Creation Time : Wed Nov  2 11:25:22 2016
    Raid Level : raid10
    Array Size : 10477568 (9.99 GiB 10.73 GB)
 Used Dev Size : 10477568 (9.99 GiB 10.73 GB)
  Raid Devices : 2
 Total Devices : 3
   Persistence : Superblock is persistent

   Update Time : Wed Nov  2 11:27:33 2016
         State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
 Spare Devices : 1

        Layout : near=2
    Chunk Size : 512K

          Name : rleblanc-pc:13  (local to host rleblanc-pc)
          UUID : 278c5e33:5ac1d25a:241a0cf7:66269542
        Events : 1

   Number   Major   Minor   RaidDevice State
      0       7        2        0      active sync set-A   /dev/loop2
      1       7        3        1      active sync set-B   /dev/loop3

      2       7        4        -      spare   /dev/loop4

# mdadm --grow /dev/md13 --raid-devices 3

# mdadm --detail /dev/md13
/dev/md13:
       Version : 1.2
 Creation Time : Wed Nov  2 11:25:22 2016
    Raid Level : raid10
    Array Size : 10477568 (9.99 GiB 10.73 GB)
 Used Dev Size : 10477568 (9.99 GiB 10.73 GB)
  Raid Devices : 3
 Total Devices : 3
   Persistence : Superblock is persistent

   Update Time : Wed Nov  2 11:28:08 2016
         State : clean, reshaping
Active Devices : 3
Working Devices : 3
Failed Devices : 0
 Spare Devices : 0

        Layout : near=2
    Chunk Size : 512K

Reshape Status : 1% complete
 Delta Devices : 1, (2->3)

          Name : rleblanc-pc:13  (local to host rleblanc-pc)
          UUID : 278c5e33:5ac1d25a:241a0cf7:66269542
        Events : 12

   Number   Major   Minor   RaidDevice State
      0       7        2        0      active sync   /dev/loop2
      1       7        3        1      active sync   /dev/loop3
      2       7        4        2      active sync   /dev/loop4

----Wait for reshape to finish----
# mdadm --detail /dev/md13
/dev/md13:
       Version : 1.2
 Creation Time : Wed Nov  2 11:25:22 2016
    Raid Level : raid10
    Array Size : 15716352 (14.99 GiB 16.09 GB)
 Used Dev Size : 10477568 (9.99 GiB 10.73 GB)
  Raid Devices : 3
 Total Devices : 3
   Persistence : Superblock is persistent

   Update Time : Wed Nov  2 11:33:25 2016
         State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
 Spare Devices : 0

        Layout : near=2
    Chunk Size : 512K

          Name : rleblanc-pc:13  (local to host rleblanc-pc)
          UUID : 278c5e33:5ac1d25a:241a0cf7:66269542
        Events : 49

   Number   Major   Minor   RaidDevice State
      0       7        2        0      active sync   /dev/loop2
      1       7        3        1      active sync   /dev/loop3
      2       7        4        2      active sync   /dev/loop4

# mdadm --grow /dev/md13 -p n3
mdadm: Cannot change number of copies when reshaping RAID10
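
For reference, the jump in Array Size in the last --detail output above is consistent with the near layout keeping two copies of each chunk spread over three members. A minimal sketch of that arithmetic, assuming equal-sized members and ignoring any chunk rounding; the function name is made up for illustration and is not an mdadm command:

```shell
# Hypothetical helper (not part of mdadm): usable size in KiB of a RAID10
# "near" array with equal-sized members.
# Arguments: per-device size in KiB, number of raid devices, number of copies.
raid10_near_size_kib() {
    echo $(( $1 * $2 / $3 ))
}

raid10_near_size_kib 10477568 2 2   # 2 devices, near=2 (before the grow)
raid10_near_size_kib 10477568 3 2   # 3 devices, near=2 (after the reshape)
raid10_near_size_kib 10477568 3 3   # 3 devices, near=3 (the layout being asked for)
```

The second call gives 15716352, the Array Size reported after the reshape; the first and third both give 10477568, which is why a lockstep change of copies and devices would add no space.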
----------------
Robert LeBlanc

* Re: Issue with growing RAID10
  2016-11-02 17:59 Issue with growing RAID10 Robert LeBlanc
  2016-11-02 18:09 ` Wols Lists
@ 2016-11-02 18:19 ` keld
  2016-11-02 19:02   ` Robert LeBlanc
  1 sibling, 1 reply; 15+ messages in thread
From: keld @ 2016-11-02 18:19 UTC
  To: Robert LeBlanc; +Cc: linux-raid

There are some speed limits on raid10,n2, as also reported in
https://raid.wiki.kernel.org/index.php/Performance

If you want speed, I suggest you use raid10,f2.

Unfortunately you cannot grow "far" layouts; Neil says it is too complicated.

But in your case you should be able to remove one of your raid10,n2 drives,
then build a raid10,f2 array for 3 disks, but only with the disk you removed
from your n2 array plus your new disk. Then you can copy the contents of the
remaining old disk to the new "far" array, and when that is complete, add the
old raid10,n2 disk to the new far array, giving it 3 disks. This should give
you about 3 times the speed of your old raid10,n2 array.

Best regards
keld




* Re: Issue with growing RAID10
  2016-11-02 18:19 ` keld
@ 2016-11-02 19:02   ` Robert LeBlanc
  2016-11-02 19:48     ` keld
  0 siblings, 1 reply; 15+ messages in thread
From: Robert LeBlanc @ 2016-11-02 19:02 UTC
  To: keld; +Cc: linux-raid

My boss basically wants RAID1 with all drives able to be read from. He
has a requirement to have all the drives identical (minus the
superblock), hence the 'near' option being used. From my rudimentary
tests, sequential reads do seem to use all drives, but random reads
don't. I wonder what logic is preventing the spreading out of random
workloads for 'near'. 'far' is using all disks in random read and
getting better performance on both random and sequential. I'm testing
loopbacks on an NVMe drive, so seek latency should not be a major
concern.
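
As a rough illustration of why 'far' cannot satisfy the identical-drives requirement, here is a sketch of which member holds each copy of a chunk in a far layout, following the description in md(4) as I understand it (first copies striped RAID0-style, each further copy shifted by one device). The helper name is hypothetical, and it models only the device choice, not the on-disk offset:

```shell
# Hypothetical helper (not an mdadm command): devices holding each copy of a
# chunk in a RAID10 "far" layout. Copy j of chunk k is assumed to live on
# device (k mod devices + j) mod devices; the position in the printed list is
# the section of the drive (first entry = early part, second = later part).
far_devices_for_chunk() {
    chunk=$1 devices=$2 copies=$3
    out=""
    j=0
    while [ "$j" -lt "$copies" ]; do
        out="$out $(( (chunk % devices + j) % devices ))"
        j=$(( j + 1 ))
    done
    echo "${out# }"
}

for k in 0 1 2 3; do
    echo "f2, 2 devices, chunk $k -> devices $(far_devices_for_chunk "$k" 2 2)"
done
```

Under this model, with two devices chunk 0 sits in the early section of device 0 and the later section of device 1, while chunk 1 is the other way round, so the two members hold the same data in a different arrangement.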
----------------
Robert LeBlanc

* Re: Issue with growing RAID10
  2016-11-02 19:02   ` Robert LeBlanc
@ 2016-11-02 19:48     ` keld
  2016-11-02 19:56       ` Robert LeBlanc
  0 siblings, 1 reply; 15+ messages in thread
From: keld @ 2016-11-02 19:48 UTC
  To: Robert LeBlanc; +Cc: linux-raid

If you want all your disks to be identical, then you can only choose between
raid1 and raid10 near. I believe raid10 near is then the better layout, as some
stats say you will have better random performance. I don't know why; probably a
driver issue. I believe you can have raid1 in a 3-disk solution. You should try
it out and then please report the stats back to the list; I will then add them
to the wiki (it seems inaccessible at the moment, though).

best regards
Keld


* Re: Issue with growing RAID10
  2016-11-02 19:48     ` keld
@ 2016-11-02 19:56       ` Robert LeBlanc
  2016-11-02 20:16         ` keld
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Robert LeBlanc @ 2016-11-02 19:56 UTC
  To: keld; +Cc: linux-raid

Yes, we can have any number of disks in a RAID1 (we currently have
three), but reads only ever come from the first drive. We want to move
to RAID10 so that all drives can service reads and provide performance
as well. We just need the option to grow a RAID10 like we can with
RAID1. We don't need the "extra" space by growing a RAID10 without
changing '-p n'. Basically, we want to be super paranoid with several
identical copies of the data and get extra read performance. We know
that we will be limited in write performance, which is kind of
counterintuitive for RAID10, but our workload is OK with that.
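
To make the difference concrete, here is a small sketch of chunk placement in the near layout as described in md(4): the copies of each chunk occupy consecutive slots, and slots are dealt round-robin across the members. The function name is invented for this example and is not part of mdadm:

```shell
# Hypothetical helper (not an mdadm command): devices holding each copy of a
# chunk in a RAID10 "near" layout. Copy j of chunk k sits in slot
# k*copies + j, and slot s lands on device s mod devices.
near_devices_for_chunk() {
    chunk=$1 devices=$2 copies=$3
    out=""
    j=0
    while [ "$j" -lt "$copies" ]; do
        slot=$(( chunk * copies + j ))
        out="$out $(( slot % devices ))"
        j=$(( j + 1 ))
    done
    echo "${out# }"
}

for k in 0 1 2; do
    echo "n2, 3 devices, chunk $k -> devices $(near_devices_for_chunk "$k" 3 2)"
    echo "n3, 3 devices, chunk $k -> devices $(near_devices_for_chunk "$k" 3 3)"
done
```

With n2 on three devices the chunks rotate over pairs of members (0 1, then 2 0, then 1 2), so no member is a full copy; with n3 on three devices every chunk lands on all three members, which is the identical-copies arrangement described above.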

I hope that makes sense. I could provide some test data on n-disk
RAID1, but my experience says there is little value in it; it is very
similar to 2-disk RAID1. If I have time, I'll supply something.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Nov 2, 2016 at 1:48 PM,  <keld@keldix.com> wrote:
> If you want all your disks to be identical, then you only can chose between
> raid1 and raid10 near. I believe then the raid10  near is the better layout, as some
> stats say you will have better random performance. I don't know why. Probably a driver issue
> I believe you can have raid1 in a 3-disk solution. You should try it out, and then please report the
> stats back to the list, then I will add it to the wiki (it seems unacessibe at the moment, tho)
>
> best regards
> Keld
>
> On Wed, Nov 02, 2016 at 01:02:29PM -0600, Robert LeBlanc wrote:
>> My boss basically wants RAID1 with all drives able to be read from. He
>> has a requirement to have all the drives identical (minus the
>> superblock) hence the 'near' option being used. From my rudimentary
>> tests, sequential reds do seem to use all drives, but random reads
>> don't. I wonder what logic is preventing the spreading out of random
>> workloads for 'near'. 'far' is using all disks in random read and
>> getting better performance on both random and sequential. I'm testing
>> loopbacks on an NVME drive so seek latency should not be a major
>> concern.
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Nov 2, 2016 at 12:19 PM,  <keld@keldix.com> wrote:
>> > There is some speed limits om raid10,n2 as also reported in
>> > https://raid.wiki.kernel.org/index.php/Performance
>> >
>> > f you want speed, I suggest you use raid10,f2.
>> >
>> > Unfortunatlely you cannot grow "far" layouts, Neil says it is too complicated.
>> >
>> > But in your case you should be  able to disable one of your raid10,N2 drives,
>> > then build a raid10,n2 array for 3 disks, but only with the disk you removed from
>> > your N2 disk plus your new disk. Then you can copy the contents of the remaining
>> > old disk to the new "far" disk, and when complete, add the old raid10,n2 disk to the
>> > new Far raid, with 3 disks. This should give you about 3 times the speed
>> > of your old raid10,n2 array.
>> >
>> > Best regards
>> > keld
>> >
>> >
>> >
>> > On Wed, Nov 02, 2016 at 11:59:25AM -0600, Robert LeBlanc wrote:
>> >> We would like to add read performance to our RAID10 volume by adding
>> >> another drive (we don't care about space), so I did the following test
>> >> with poor results.
>> >>
>> >> # mdadm --create /dev/md13 --level 10 --run --assume-clean -p n2
>> >> --raid-devices 2 /dev/loop{2..3}
>> >> mdadm: /dev/loop2 appears to be part of a raid array:
>> >>       level=raid10 devices=3 ctime=Wed Nov  2 11:25:22 2016
>> >> mdadm: /dev/loop3 appears to be part of a raid array:
>> >>       level=raid10 devices=3 ctime=Wed Nov  2 11:25:22 2016
>> >> mdadm: Defaulting to version 1.2 metadata
>> >> mdadm: array /dev/md13 started.
>> >>
>> >> # mdadm --detail /dev/md13
>> >> /dev/md13:
>> >>        Version : 1.2
>> >>  Creation Time : Wed Nov  2 11:47:48 2016
>> >>     Raid Level : raid10
>> >>     Array Size : 10477568 (9.99 GiB 10.73 GB)
>> >>  Used Dev Size : 10477568 (9.99 GiB 10.73 GB)
>> >>   Raid Devices : 2
>> >>  Total Devices : 2
>> >>    Persistence : Superblock is persistent
>> >>
>> >>    Update Time : Wed Nov  2 11:47:48 2016
>> >>          State : clean
>> >> Active Devices : 2
>> >> Working Devices : 2
>> >> Failed Devices : 0
>> >>  Spare Devices : 0
>> >>
>> >>         Layout : near=2
>> >>     Chunk Size : 512K
>> >>
>> >>           Name : rleblanc-pc:13  (local to host rleblanc-pc)
>> >>           UUID : 1eb66d7c:21308453:1e731c8b:1c43dd55
>> >>         Events : 0
>> >>
>> >>    Number   Major   Minor   RaidDevice State
>> >>       0       7        2        0      active sync set-A   /dev/loop2
>> >>       1       7        3        1      active sync set-B   /dev/loop3
>> >>
>> >> # mdadm /dev/md13 -a /dev/loop4
>> >> mdadm: added /dev/loop4
>> >>
>> >> # mdadm --detail /dev/md13
>> >> /dev/md13:
>> >>        Version : 1.2
>> >>  Creation Time : Wed Nov  2 11:47:48 2016
>> >>     Raid Level : raid10
>> >>     Array Size : 10477568 (9.99 GiB 10.73 GB)
>> >>  Used Dev Size : 10477568 (9.99 GiB 10.73 GB)
>> >>   Raid Devices : 2
>> >>  Total Devices : 3
>> >>    Persistence : Superblock is persistent
>> >>
>> >>    Update Time : Wed Nov  2 11:48:13 2016
>> >>          State : clean
>> >> Active Devices : 2
>> >> Working Devices : 3
>> >> Failed Devices : 0
>> >>  Spare Devices : 1
>> >>
>> >>         Layout : near=2
>> >>     Chunk Size : 512K
>> >>
>> >>           Name : rleblanc-pc:13  (local to host rleblanc-pc)
>> >>           UUID : 1eb66d7c:21308453:1e731c8b:1c43dd55
>> >>         Events : 1
>> >>
>> >>    Number   Major   Minor   RaidDevice State
>> >>       0       7        2        0      active sync set-A   /dev/loop2
>> >>       1       7        3        1      active sync set-B   /dev/loop3
>> >>
>> >>       2       7        4        -      spare   /dev/loop4
>> >>
>> >> # mdadm --grow /dev/md13 -p n3 --raid-devices 3
>> >> mdadm: Cannot change number of copies when reshaping RAID10
>> >>
>> >> I also tried to add the device, grow raid-devices, let it reshape,
>> >> then try to change the number of copies and it didn't like that
>> >> either. It would be nice to supply -p nX and --raid-devices X at the
>> >> same time to prevent the reshape and only copy the data over to the
>> >> new drive (or drop a drive out completely). I could see changing -p
>> >> separately or at a different rate of drives added/removed could be
>> >> difficult, but for lockstep changes, it seems that it would be rather
>> >> easy.
>> >>
>> >> Any ideas?
>> >>
>> >> Thanks,
>> >>
>> >> ----------------
>> >> Robert LeBlanc
>> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Issue with growing RAID10
  2016-11-02 19:56       ` Robert LeBlanc
@ 2016-11-02 20:16         ` keld
  2016-11-02 20:27           ` Robert LeBlanc
  2016-11-02 20:41         ` Robin Hill
  2016-11-02 21:00         ` Andreas Klauer
  2 siblings, 1 reply; 15+ messages in thread
From: keld @ 2016-11-02 20:16 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: linux-raid

I am not sure what the problem is, then. If the goal is to grow your raid10,n2
to a raid10,n3, which may not be doable with mdadm --grow, then you could try
creating a raid10,n3 array on your new disk, with only 1 disk present, copying
the data over, and then adding the 2 old drives.
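
To illustrate why a raid10,n3 array on 3 devices can run from a single disk, here is a minimal sketch of chunk placement in the 'near' layout. It assumes a simplified model (copies of a chunk are laid out consecutively across the devices, wrapping to the next row); the function name and the chunk count are made up for illustration and nothing here is taken from the md source.

```python
def near_layout(chunks, raid_devices, near_copies):
    """Map each logical chunk to its (device, row) slots in a simplified 'near' layout."""
    placement = {}
    for chunk in range(chunks):
        slots = []
        for copy in range(near_copies):
            # Copies are placed one after another across the devices,
            # wrapping around to the next row when the last device is reached.
            pos = chunk * near_copies + copy
            slots.append((pos % raid_devices, pos // raid_devices))
        placement[chunk] = slots
    return placement


n2_on_2 = near_layout(4, raid_devices=2, near_copies=2)
n2_on_3 = near_layout(4, raid_devices=3, near_copies=2)
n3_on_3 = near_layout(4, raid_devices=3, near_copies=3)

for name, layout in (("n2 on 2", n2_on_2), ("n2 on 3", n2_on_3), ("n3 on 3", n3_on_3)):
    print(name)
    for chunk, slots in layout.items():
        print(f"  chunk {chunk}: {slots}")
```

In this model, n3 on 3 devices puts every chunk on every device at the same row, so each member is a full copy and devices 0 and 1 look exactly as they do under n2 on 2 devices. Keeping n2 while going to 3 devices moves chunks to different devices and rows, which is why that change needs a reshape.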

I think it is an insight that raid1 mostly performs out of only one disk,
regardless of how many disks you have. I have used multi-disk raid1 to
have redundancy for booting, so some use can be found.

Best regards
Keld

On Wed, Nov 02, 2016 at 01:56:02PM -0600, Robert LeBlanc wrote:
> Yes, we can have any number of disks in a RAID1 (we currently have
> three), but reads only ever come from the first drive. We want to move
> to RAID10 so that all drives can service reads and provide performance
> as well. We just need the option to grow a RAID10 like we can with
> RAID1. We don't need the "extra" space by growing a RAID10 without
> changing '-p n'. Basically, we want to be super paranoid with several
> identical copies of the data and get extra read performance. We know
> that we will be limited in write performance which is kind of counter
> intuitive for RAID10, but our workload is OK with that.
> 
> I hope that makes sense. I could provide some test data on n-disk
> RAID1, but my experience says there is little value to it, it is very
> similar to 2 disk RAID1. If I have time, I'll supply something.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Issue with growing RAID10
  2016-11-02 20:16         ` keld
@ 2016-11-02 20:27           ` Robert LeBlanc
  0 siblings, 0 replies; 15+ messages in thread
From: Robert LeBlanc @ 2016-11-02 20:27 UTC (permalink / raw)
  To: keld; +Cc: linux-raid

Keld,

This is not a 'one-off' issue I'm trying to resolve. In the future we may
have to add disks to thousands of arrays with hundreds of TBs of data,
so the process also needs to be automatable. It is also possible that we
never have to add disks, but we can't be backed into that corner; we would
just stick with RAID1 at that point. We ran across RAID10 as what seemed
to be a solution to the problem we were having and are exploring the
options. It may be possible for us to 'adjust' the code to work for us, but
this is a bit out of our realm. Someone here might be able to say "oh,
that should be easy to add" if it isn't already there, whereas it would
take us weeks to understand the code, etc.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Nov 2, 2016 at 2:16 PM,  <keld@keldix.com> wrote:
> I am not sure what the problem is, then. If the goal is to grow your raid10,n2
> to a raid10,n3, which may not be doable with mdadm --grow, then you could try
> creating a raid10,n3 array on your new disk, with only 1 disk present, copying
> the data over, and then adding the 2 old drives.
>
> I think it is an insight that raid1 mostly performs out of only one disk,
> regardless of how many disks you have. I have used multi-disk raid1 to
> have redundancy for booting, so some use can be found.
>
> Best regards
> Keld

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Issue with growing RAID10
  2016-11-02 19:56       ` Robert LeBlanc
  2016-11-02 20:16         ` keld
@ 2016-11-02 20:41         ` Robin Hill
  2016-11-02 20:59           ` Robert LeBlanc
  2016-11-02 21:00         ` Andreas Klauer
  2 siblings, 1 reply; 15+ messages in thread
From: Robin Hill @ 2016-11-02 20:41 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: linux-raid

On Wed Nov 02, 2016 at 01:56:02pm -0600, Robert LeBlanc wrote:

> Yes, we can have any number of disks in a RAID1 (we currently have
> three), but reads only ever come from the first drive.
> 
How are you testing? I use RAID1 on a number of systems and reads
look to be pretty evenly spread across the drives.

Cheers,
    Robin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Issue with growing RAID10
  2016-11-02 20:41         ` Robin Hill
@ 2016-11-02 20:59           ` Robert LeBlanc
  2016-11-02 21:11             ` Robert LeBlanc
  0 siblings, 1 reply; 15+ messages in thread
From: Robert LeBlanc @ 2016-11-02 20:59 UTC (permalink / raw)
  To: Robert LeBlanc, linux-raid

root@rleblanc-pc:~# losetup -l
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE   DIO
/dev/loop1         0      0         0  0 /root/junk1   0
/dev/loop4         0      0         0  0 /root/junk4   0
/dev/loop2         0      0         0  0 /root/junk2   0
/dev/loop5         0      0         0  0 /root/junk5   0
/dev/loop3         0      0         0  0 /root/junk3   0
root@rleblanc-pc:~# mdadm --create /dev/md13 --level 1 --raid-devices 4 --run /dev/loop{1..4}
mdadm: Note: this array has metadata at the start and
   may not be suitable as a boot device.  If you plan to
   store '/boot' on this device please ensure that
   your boot-loader understands md/v1.x metadata, or use
   --metadata=0.90
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md13 started.
root@rleblanc-pc:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md13 : active raid1 loop4[3] loop3[2] loop2[1] loop1[0]
     10477568 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>
root@rleblanc-pc:~# mkfs.ext4 /dev/md13
mke2fs 1.43.3 (04-Sep-2016)
Discarding device blocks: done
Creating filesystem with 2619392 4k blocks and 655360 inodes
Filesystem UUID: 3bb68653-50af-492f-a3d4-8d0a5f2f4ca4
Superblock backups stored on blocks:
       32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

root@rleblanc-pc:~# mkdir junk
root@rleblanc-pc:~# mount /dev/md13 junk
root@rleblanc-pc:~# cd junk
root@rleblanc-pc:~/junk# fio -rw=read --size=5G --name=mdadm_test
mdadm_test: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.10
Starting 1 process
mdadm_test: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 1 (f=1): [R(1)] [100.0% done] [338.3MB/0KB/0KB /s] [86.6K/0/0
iops] [eta 00m:00s]
mdadm_test: (groupid=0, jobs=1): err= 0: pid=18198: Wed Nov  2 14:54:20 2016
 read : io=5120.0MB, bw=483750KB/s, iops=120937, runt= 10838msec
   clat (usec): min=0, max=21384, avg= 7.98, stdev=108.10
    lat (usec): min=0, max=21384, avg= 8.02, stdev=108.10
   clat percentiles (usec):
    |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
    | 30.00th=[    0], 40.00th=[    0], 50.00th=[    1], 60.00th=[    1],
    | 70.00th=[    1], 80.00th=[    1], 90.00th=[    1], 95.00th=[    1],
    | 99.00th=[  274], 99.50th=[  386], 99.90th=[  828], 99.95th=[ 2704],
    | 99.99th=[ 4640]
   bw (KB  /s): min=324608, max=748032, per=95.94%, avg=464090.29,
stdev=120877.09
   lat (usec) : 2=95.25%, 4=3.09%, 10=0.06%, 20=0.02%, 50=0.09%
   lat (usec) : 100=0.01%, 250=0.35%, 500=0.88%, 750=0.13%, 1000=0.02%
   lat (msec) : 2=0.01%, 4=0.06%, 10=0.01%, 20=0.01%, 50=0.01%
 cpu          : usr=5.02%, sys=12.25%, ctx=19708, majf=0, minf=10
 IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued    : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  READ: io=5120.0MB, aggrb=483749KB/s, minb=483749KB/s,
maxb=483749KB/s, mint=10838msec, maxt=10838msec

Disk stats (read/write):
   md13: ios=60029/3, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=15360/6, aggrmerge=0/0, aggrticks=13502/101,
aggrin_queue=13600, aggrutil=98.75%
 loop1: ios=61427/6, merge=0/0, ticks=54008/116, in_queue=54112, util=98.75%
 loop4: ios=0/6, merge=0/0, ticks=0/92, in_queue=92, util=0.84%
 loop2: ios=16/6, merge=0/0, ticks=0/104, in_queue=104, util=0.95%
 loop3: ios=0/6, merge=0/0, ticks=0/92, in_queue=92, util=0.84%

Device:    rrqm/s   wrqm/s        r/s      w/s      rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1      0.00  1206.50    3517.50  2018.50  446660.00  12878.00   166.02     1.60    0.29    0.42    0.06   0.17  93.00
loop1        0.00     0.00    5233.50     0.00  446536.25      0.00   170.65     5.01    0.96    0.96    0.00   0.19 100.00
loop2        0.00     0.00       1.00     0.00     120.00      0.00   240.00     0.00    0.00    0.00    0.00   0.00   0.00
loop3        0.00     0.00       0.00     0.00       0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop4        0.00     0.00       0.00     0.00       0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop5        0.00     0.00       0.00     0.00       0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md13         0.00     0.00    5235.00     0.00  446720.00      0.00   170.67     0.00    0.00    0.00    0.00   0.00   0.00

root@rleblanc-pc:~/junk# fio -rw=randread --size=5G --name=mdadm_test
mdadm_test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.10
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [444.5MB/0KB/0KB /s] [114K/0/0
iops] [eta 00m:00s]
mdadm_test: (groupid=0, jobs=1): err= 0: pid=18924: Wed Nov  2 14:55:16 2016
 read : io=5120.0MB, bw=463890KB/s, iops=115972, runt= 11302msec
   clat (usec): min=4, max=15649, avg= 8.03, stdev=37.76
    lat (usec): min=4, max=15649, avg= 8.07, stdev=37.76
   clat percentiles (usec):
    |  1.00th=[    5],  5.00th=[    5], 10.00th=[    6], 20.00th=[    6],
    | 30.00th=[    6], 40.00th=[    6], 50.00th=[    7], 60.00th=[    7],
    | 70.00th=[    7], 80.00th=[    8], 90.00th=[    9], 95.00th=[   10],
    | 99.00th=[   17], 99.50th=[   95], 99.90th=[  151], 99.95th=[  179],
    | 99.99th=[ 1528]
   bw (KB  /s): min=237416, max=543576, per=99.67%, avg=462350.91,
stdev=62842.83
   lat (usec) : 10=93.06%, 20=6.09%, 50=0.25%, 100=0.13%, 250=0.45%
   lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
   lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
 cpu          : usr=12.39%, sys=46.90%, ctx=1310616, majf=1, minf=9
 IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued    : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  READ: io=5120.0MB, aggrb=463889KB/s, minb=463889KB/s,
maxb=463889KB/s, mint=11302msec, maxt=11302msec

Disk stats (read/write):
   md13: ios=1303936/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=327680/0, aggrmerge=0/0, aggrticks=1635/0, aggrin_queue=1621,
aggrutil=56.53%
 loop1: ios=1310359/0, merge=0/0, ticks=6504/0, in_queue=6448, util=56.53%
 loop4: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
 loop2: ios=361/0, merge=0/0, ticks=36/0, in_queue=36, util=0.32%
 loop3: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Device:    rrqm/s   wrqm/s        r/s      w/s      rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1      0.00     8.50    1255.00     9.50    7552.00     64.00    12.05     0.23    0.18    0.17    1.68   0.12  15.60
loop1        0.00     0.00  115485.50     0.00  461942.00      0.00     8.00     0.63    0.01    0.01    0.00   0.01  62.80
loop2        0.00     0.00      31.50     0.00     126.00      0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
loop3        0.00     0.00       0.00     0.00       0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop4        0.00     0.00       0.00     0.00       0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop5        0.00     0.00       0.00     0.00       0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md13         0.00     0.00  115512.50     0.00  462050.00      0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00

This is indicative of what we see in production as well. As you can
see, the fio per-device stats closely match what iostat shows. I
don't know how you are seeing evenly spread reads. I've seen this
behavior on both CentOS and Debian.
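
To put a number on the imbalance, the sketch below computes each member's share of the read I/Os, using the per-device read counts copied from the two fio "Disk stats" sections above. The helper name is made up for illustration; only the counts come from the test runs.

```python
# Read I/O counts per RAID1 member, copied from the fio "Disk stats" output
sequential_reads = {"loop1": 61427, "loop2": 16, "loop3": 0, "loop4": 0}
random_reads = {"loop1": 1310359, "loop2": 361, "loop3": 0, "loop4": 0}


def read_share(ios):
    """Return each device's percentage of the total read I/Os."""
    total = sum(ios.values())
    return {dev: 100.0 * count / total for dev, count in ios.items()}


for label, ios in (("sequential", sequential_reads), ("random", random_reads)):
    shares = read_share(ios)
    summary = ", ".join(f"{dev}: {pct:.2f}%" for dev, pct in shares.items())
    print(f"{label}: {summary}")
```

In both runs loop1 services more than 99.9% of the reads, with a single fio job at iodepth=1.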
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Nov 2, 2016 at 2:41 PM, Robin Hill <robin@robinhill.me.uk> wrote:
> On Wed Nov 02, 2016 at 01:56:02pm -0600, Robert LeBlanc wrote:
>
>> Yes, we can have any number of disks in a RAID1 (we currently have
>> three), but reads only ever come from the first drive.
>>
> How are you testing? I use RAID1 on a number of systems and reads
> look to be pretty evenly spread across the drives.
>
> Cheers,
>     Robin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Issue with growing RAID10
  2016-11-02 19:56       ` Robert LeBlanc
  2016-11-02 20:16         ` keld
  2016-11-02 20:41         ` Robin Hill
@ 2016-11-02 21:00         ` Andreas Klauer
  2016-11-02 21:27           ` Robert LeBlanc
  2 siblings, 1 reply; 15+ messages in thread
From: Andreas Klauer @ 2016-11-02 21:00 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: linux-raid

On Wed, Nov 02, 2016 at 01:56:02PM -0600, Robert LeBlanc wrote:
> Yes, we can have any number of disks in a RAID1 (we currently have
> three), but reads only ever come from the first drive.

Only if there's only one reader. So it depends on what activity 
there is on the machine. 

> We just need the option to grow a RAID10 like we can with RAID1.

Patches welcome, I'm sure? ;-)

> Basically, we want to be super paranoid with several identical copies 
> of the data and get extra read performance.

You could put RAID on RAID and thus achieve other modes, but I'm not sure
if it's worth the overhead or even applies in any way to your use case,
and using non-standard setups always comes with its own pitfalls.

RAID 1, with RAID0 on top, three disks ABC, two partitions ab,
different disk order.

  A B C
a 1 2 3
b 3 1 2

Three RAID1 arrays md1, md2, md3 (and md0, a RAID0, on top).

You can grow it.

  A B C D
a 1 2 3 ?
b 3 1 2 ?

  A B C D
a 1 2 3 ?
b 3 1 2 3

md3 has 3 disks temporarily here.

  A B C D
a 1 2 3 4
b 4 1 2 3

md4 is new, to be added to md0.

Three copies? Same thing with three partitions.

Will it help any or make things worse? I dunno.
Have to be careful to make md0 assemble last.

Could also be RAID5 on top instead of RAID0.
That's even stranger though.
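
The rotated tables above can be generated for any number of disks. This is only a sketch of one reading of the scheme: each partition row is shifted one disk to the right relative to the row above, so every RAID1 array ends up with one member per disk. The function names are hypothetical, and treating "three partitions" as a third rotated row is an assumption.

```python
def rotated_layout(disks, partitions):
    """Row p, column d: number of the RAID1 array that uses partition p of disk d."""
    return [[(d - p) % disks + 1 for d in range(disks)] for p in range(partitions)]


def array_members(disks, partitions):
    """Return {array number: [(disk letter, partition letter), ...]}."""
    members = {}
    for p, row in enumerate(rotated_layout(disks, partitions)):
        for d, md in enumerate(row):
            members.setdefault(md, []).append((chr(ord("A") + d), chr(ord("a") + p)))
    return members


for disks in (3, 4):
    print("  " + " ".join(chr(ord("A") + d) for d in range(disks)))
    for p, row in enumerate(rotated_layout(disks, 2)):
        print(chr(ord("a") + p) + " " + " ".join(str(md) for md in row))
    print()

print(array_members(3, 3)[1])
```

With two partitions per disk, each array has two members on different disks; with three partitions it has three, which gives three copies.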

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Issue with growing RAID10
  2016-11-02 20:59           ` Robert LeBlanc
@ 2016-11-02 21:11             ` Robert LeBlanc
  0 siblings, 0 replies; 15+ messages in thread
From: Robert LeBlanc @ 2016-11-02 21:11 UTC (permalink / raw)
  To: Robert LeBlanc, linux-raid

As a comparison, here is a RAID10 n4 with 4 disks...

root@rleblanc-pc:~# mdadm --detail /dev/md14
/dev/md14:
       Version : 1.2
 Creation Time : Wed Nov  2 15:01:09 2016
    Raid Level : raid10
    Array Size : 10477568 (9.99 GiB 10.73 GB)
 Used Dev Size : 10477568 (9.99 GiB 10.73 GB)
  Raid Devices : 4
 Total Devices : 4
   Persistence : Superblock is persistent

   Update Time : Wed Nov  2 15:01:28 2016
         State : clean, resyncing
Active Devices : 4
Working Devices : 4
Failed Devices : 0
 Spare Devices : 0

        Layout : near=4
    Chunk Size : 512K

 Resync Status : 38% complete

          Name : rleblanc-pc:14  (local to host rleblanc-pc)
          UUID : 61114475:19a4404b:07b0a66d:a0e4447a
        Events : 6

   Number   Major   Minor   RaidDevice State
      0       7       11        0      active sync set-A   /dev/loop11
      1       7       12        1      active sync set-B   /dev/loop12
      2       7       13        2      active sync set-C   /dev/loop13
      3       7       14        3      active sync set-D   /dev/loop14

root@rleblanc-pc:~/junk# fio -rw=read --size=5G --name=mdadm_test
mdadm_test: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.10
Starting 1 process
mdadm_test: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 1 (f=1): [R(1)] [100.0% done] [238.3MB/0KB/0KB /s] [60.1K/0/0
iops] [eta 00m:00s]
mdadm_test: (groupid=0, jobs=1): err= 0: pid=19925: Wed Nov  2 15:08:15 2016
 read : io=5120.0MB, bw=343278KB/s, iops=85819, runt= 15273msec
   clat (usec): min=0, max=25847, avg=11.16, stdev=237.64
    lat (usec): min=0, max=25847, avg=11.23, stdev=237.64
   clat percentiles (usec):
    |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
    | 30.00th=[    1], 40.00th=[    1], 50.00th=[    1], 60.00th=[    1],
    | 70.00th=[    1], 80.00th=[    1], 90.00th=[    2], 95.00th=[    2],
    | 99.00th=[    4], 99.50th=[    8], 99.90th=[ 2992], 99.95th=[ 4080],
    | 99.99th=[11456]
   bw (KB  /s): min=240136, max=528384, per=100.00%, avg=345144.53,
stdev=83065.30
   lat (usec) : 2=82.29%, 4=16.62%, 10=0.63%, 20=0.05%, 50=0.03%
   lat (usec) : 100=0.01%, 250=0.01%, 500=0.04%, 750=0.02%, 1000=0.01%
   lat (msec) : 2=0.08%, 4=0.15%, 10=0.04%, 20=0.01%, 50=0.01%
 cpu          : usr=5.71%, sys=14.59%, ctx=4480, majf=0, minf=11
 IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued    : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  READ: io=5120.0MB, aggrb=343277KB/s, minb=343277KB/s,
maxb=343277KB/s, mint=15273msec, maxt=15273msec

Disk stats (read/write):
   md14: ios=46045/3, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=11520/7, aggrmerge=0/0, aggrticks=85659/98,
aggrin_queue=85756, aggrutil=80.49%
 loop13: ios=17421/7, merge=0/0, ticks=133600/132, in_queue=133732, util=74.76%
 loop11: ios=4006/7, merge=0/0, ticks=22572/80, in_queue=22648, util=45.68%
 loop14: ios=19532/7, merge=0/0, ticks=154152/112, in_queue=154268, util=80.49%
 loop12: ios=5124/7, merge=0/0, ticks=32312/68, in_queue=32376, util=49.54%

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1          46.50  1459.00 4351.00 2990.50 386402.00 17792.00   110.11     3.94    0.53    0.86    0.05   0.13  94.80
loop11            0.00     0.00  252.50    0.00 29785.50     0.00   235.92     1.77    6.82    6.82    0.00   1.89  47.60
loop12            0.00     0.00  260.50    0.00 30805.50     0.00   236.51     2.00    7.66    7.66    0.00   1.88  49.00
loop13            0.00     0.00  905.00    0.00 102173.00     0.00   225.80     8.08    8.95    8.95    0.00   0.80  72.80
loop14            0.00     0.00 1074.50    0.00 120820.25     0.00   224.89    10.61    9.90    9.90    0.00   0.78  83.60
loop15            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md14              0.00     0.00 2493.00    0.00 283648.00     0.00   227.56     0.00    0.00    0.00    0.00   0.00   0.00

root@rleblanc-pc:~/junk# fio -rw=randread --size=5G --name=mdadm_test
mdadm_test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.10
Starting 1 process
Jobs: 1 (f=1): [r(1)] [97.7% done] [195.4MB/0KB/0KB /s] [49.1K/0/0
iops] [eta 00m:02s]
mdadm_test: (groupid=0, jobs=1): err= 0: pid=19953: Wed Nov  2 15:10:18 2016
 read : io=5120.0MB, bw=62013KB/s, iops=15503, runt= 84545msec
   clat (usec): min=4, max=11510, avg=63.40, stdev=96.01
    lat (usec): min=4, max=11510, avg=63.47, stdev=96.03
   clat percentiles (usec):
    |  1.00th=[    6],  5.00th=[    7], 10.00th=[    8], 20.00th=[    8],
    | 30.00th=[    9], 40.00th=[   11], 50.00th=[   17], 60.00th=[   61],
    | 70.00th=[  102], 80.00th=[  122], 90.00th=[  155], 95.00th=[  185],
    | 99.00th=[  258], 99.50th=[  298], 99.90th=[  494], 99.95th=[ 1816],
    | 99.99th=[ 3056]
   bw (KB  /s): min=22992, max=227816, per=99.90%, avg=61952.96, stdev=53309.04
   lat (usec) : 10=33.36%, 20=18.05%, 50=7.94%, 100=9.46%, 250=29.99%
   lat (usec) : 500=1.09%, 750=0.02%, 1000=0.01%
   lat (msec) : 2=0.03%, 4=0.04%, 10=0.01%, 20=0.01%
 cpu          : usr=2.63%, sys=13.01%, ctx=1310641, majf=0, minf=9
 IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued    : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  READ: io=5120.0MB, aggrb=62012KB/s, minb=62012KB/s, maxb=62012KB/s,
mint=84545msec, maxt=84545msec

Disk stats (read/write):
   md14: ios=1304718/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=327680/0, aggrmerge=0/0, aggrticks=18719/0,
aggrin_queue=18689, aggrutil=88.37%
 loop13: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
 loop11: ios=1310108/0, merge=0/0, ticks=74856/0, in_queue=74736, util=88.37%
 loop14: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
 loop12: ios=612/0, merge=0/0, ticks=20/0, in_queue=20, util=0.02%

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1          11.00     0.00 7046.00    0.00 30048.00     0.00     8.53     0.60    0.09    0.09    0.00   0.08  59.20
loop11            0.00     0.00 7953.00    0.00 31812.00     0.00     8.00     0.88    0.11    0.11    0.00   0.11  88.40
loop12            0.00     0.00    3.50    0.00    14.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
loop13            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop14            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop15            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md14              0.00     0.00 7956.50    0.00 31826.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00

So sequential reads are being spread out, not completely evenly, but
somewhat. Random reads look almost like RAID1, with only one disk doing
all the work.
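
To put a number on "not completely evenly", here is a rough sketch that
turns the per-device lines of fio's "Disk stats" section into a
percentage of reads per member. The helper name read_share is made up
for this example, and it assumes the line format shown in the output
above (name, then ios=reads/writes).

```shell
#!/bin/sh
# Summarise how reads were spread over the array members, using the
# per-device lines of fio's "Disk stats" section read from stdin.
read_share() {
    awk '
        # Member lines look like "loop13: ios=17421/7, merge=0/0, ..."
        # (reads/writes); the md array line itself is skipped.
        $2 ~ /^ios=/ && $1 !~ /^md/ {
            name = $1
            sub(/:$/, "", name)
            split($2, f, "[=/]")        # f[2] = completed reads
            reads[name] = f[2] + 0
            total += reads[name]
            order[++n] = name
        }
        END {
            for (i = 1; i <= n; i++) {
                share = total ? 100 * reads[order[i]] / total : 0
                printf "%-8s %9d reads %5.1f%%\n", order[i], reads[order[i]], share
            }
        }'
}

# Sequential-read numbers from the md14 run above.
read_share <<'EOF'
   md14: ios=46045/3, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
 loop13: ios=17421/7, merge=0/0, ticks=133600/132, in_queue=133732, util=74.76%
 loop11: ios=4006/7, merge=0/0, ticks=22572/80, in_queue=22648, util=45.68%
 loop14: ios=19532/7, merge=0/0, ticks=154152/112, in_queue=154268, util=80.49%
 loop12: ios=5124/7, merge=0/0, ticks=32312/68, in_queue=32376, util=49.54%
EOF
```

For the sequential run this works out to roughly 38%, 9%, 42% and 11%
for loop13, loop11, loop14 and loop12. Feeding it the random-read
numbers instead puts essentially everything on loop11.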
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Nov 2, 2016 at 2:59 PM, Robert LeBlanc <robert@leblancnet.us> wrote:
> root@rleblanc-pc:~# losetup -l
> NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE   DIO
> /dev/loop1         0      0         0  0 /root/junk1   0
> /dev/loop4         0      0         0  0 /root/junk4   0
> /dev/loop2         0      0         0  0 /root/junk2   0
> /dev/loop5         0      0         0  0 /root/junk5   0
> /dev/loop3         0      0         0  0 /root/junk3   0
> root@rleblanc-pc:~# mdadm --create /dev/md13 --level 1 --raid-devices
> 4 --run /dev/loop{1..4}
> mdadm: Note: this array has metadata at the start and
>    may not be suitable as a boot device.  If you plan to
>    store '/boot' on this device please ensure that
>    your boot-loader understands md/v1.x metadata, or use
>    --metadata=0.90
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md13 started.
> root@rleblanc-pc:~# cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md13 : active raid1 loop4[3] loop3[2] loop2[1] loop1[0]
>      10477568 blocks super 1.2 [4/4] [UUUU]
>
> unused devices: <none>
> root@rleblanc-pc:~# mkfs.ext4 /dev/md13
> mke2fs 1.43.3 (04-Sep-2016)
> Discarding device blocks: done
> Creating filesystem with 2619392 4k blocks and 655360 inodes
> Filesystem UUID: 3bb68653-50af-492f-a3d4-8d0a5f2f4ca4
> Superblock backups stored on blocks:
>        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> root@rleblanc-pc:~# mkdir junk
> root@rleblanc-pc:~# mount /dev/md13 junk
> root@rleblanc-pc:~# cd junk
> root@rleblanc-pc:~/junk# fio -rw=read --size=5G --name=mdadm_test
> mdadm_test: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
> fio-2.10
> Starting 1 process
> mdadm_test: Laying out IO file(s) (1 file(s) / 5120MB)
> Jobs: 1 (f=1): [R(1)] [100.0% done] [338.3MB/0KB/0KB /s] [86.6K/0/0
> iops] [eta 00m:00s]
> mdadm_test: (groupid=0, jobs=1): err= 0: pid=18198: Wed Nov  2 14:54:20 2016
>  read : io=5120.0MB, bw=483750KB/s, iops=120937, runt= 10838msec
>    clat (usec): min=0, max=21384, avg= 7.98, stdev=108.10
>     lat (usec): min=0, max=21384, avg= 8.02, stdev=108.10
>    clat percentiles (usec):
>     |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
>     | 30.00th=[    0], 40.00th=[    0], 50.00th=[    1], 60.00th=[    1],
>     | 70.00th=[    1], 80.00th=[    1], 90.00th=[    1], 95.00th=[    1],
>     | 99.00th=[  274], 99.50th=[  386], 99.90th=[  828], 99.95th=[ 2704],
>     | 99.99th=[ 4640]
>    bw (KB  /s): min=324608, max=748032, per=95.94%, avg=464090.29,
> stdev=120877.09
>    lat (usec) : 2=95.25%, 4=3.09%, 10=0.06%, 20=0.02%, 50=0.09%
>    lat (usec) : 100=0.01%, 250=0.35%, 500=0.88%, 750=0.13%, 1000=0.02%
>    lat (msec) : 2=0.01%, 4=0.06%, 10=0.01%, 20=0.01%, 50=0.01%
>  cpu          : usr=5.02%, sys=12.25%, ctx=19708, majf=0, minf=10
>  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued    : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>     latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>   READ: io=5120.0MB, aggrb=483749KB/s, minb=483749KB/s,
> maxb=483749KB/s, mint=10838msec, maxt=10838msec
>
> Disk stats (read/write):
>    md13: ios=60029/3, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
> aggrios=15360/6, aggrmerge=0/0, aggrticks=13502/101,
> aggrin_queue=13600, aggrutil=98.75%
>  loop1: ios=61427/6, merge=0/0, ticks=54008/116, in_queue=54112, util=98.75%
>  loop4: ios=0/6, merge=0/0, ticks=0/92, in_queue=92, util=0.84%
>  loop2: ios=16/6, merge=0/0, ticks=0/104, in_queue=104, util=0.95%
>  loop3: ios=0/6, merge=0/0, ticks=0/92, in_queue=92, util=0.84%
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> nvme0n1           0.00  1206.50 3517.50 2018.50 446660.00 12878.00
> 166.02     1.60    0.29    0.42    0.06   0.17  93.00
> loop1             0.00     0.00 5233.50    0.00 446536.25     0.00
> 170.65     5.01    0.96    0.96    0.00   0.19 100.00
> loop2             0.00     0.00    1.00    0.00   120.00     0.00
> 240.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop3             0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop4             0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop5             0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> md13              0.00     0.00 5235.00    0.00 446720.00     0.00
> 170.67     0.00    0.00    0.00    0.00   0.00   0.00
>
> root@rleblanc-pc:~/junk# fio -rw=randread --size=5G --name=mdadm_test
> mdadm_test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
> fio-2.10
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [444.5MB/0KB/0KB /s] [114K/0/0
> iops] [eta 00m:00s]
> mdadm_test: (groupid=0, jobs=1): err= 0: pid=18924: Wed Nov  2 14:55:16 2016
>  read : io=5120.0MB, bw=463890KB/s, iops=115972, runt= 11302msec
>    clat (usec): min=4, max=15649, avg= 8.03, stdev=37.76
>     lat (usec): min=4, max=15649, avg= 8.07, stdev=37.76
>    clat percentiles (usec):
>     |  1.00th=[    5],  5.00th=[    5], 10.00th=[    6], 20.00th=[    6],
>     | 30.00th=[    6], 40.00th=[    6], 50.00th=[    7], 60.00th=[    7],
>     | 70.00th=[    7], 80.00th=[    8], 90.00th=[    9], 95.00th=[   10],
>     | 99.00th=[   17], 99.50th=[   95], 99.90th=[  151], 99.95th=[  179],
>     | 99.99th=[ 1528]
>    bw (KB  /s): min=237416, max=543576, per=99.67%, avg=462350.91,
> stdev=62842.83
>    lat (usec) : 10=93.06%, 20=6.09%, 50=0.25%, 100=0.13%, 250=0.45%
>    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
>    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
>  cpu          : usr=12.39%, sys=46.90%, ctx=1310616, majf=1, minf=9
>  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued    : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>     latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>   READ: io=5120.0MB, aggrb=463889KB/s, minb=463889KB/s,
> maxb=463889KB/s, mint=11302msec, maxt=11302msec
>
> Disk stats (read/write):
>    md13: ios=1303936/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
> aggrios=327680/0, aggrmerge=0/0, aggrticks=1635/0, aggrin_queue=1621,
> aggrutil=56.53%
>  loop1: ios=1310359/0, merge=0/0, ticks=6504/0, in_queue=6448, util=56.53%
>  loop4: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>  loop2: ios=361/0, merge=0/0, ticks=36/0, in_queue=36, util=0.32%
>  loop3: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> nvme0n1           0.00     8.50 1255.00    9.50  7552.00    64.00
> 12.05     0.23    0.18    0.17    1.68   0.12  15.60
> loop1             0.00     0.00 115485.50    0.00 461942.00     0.00
>   8.00     0.63    0.01    0.01    0.00   0.01  62.80
> loop2             0.00     0.00   31.50    0.00   126.00     0.00
> 8.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop3             0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop4             0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop5             0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> md13              0.00     0.00 115512.50    0.00 462050.00     0.00
>   8.00     0.00    0.00    0.00    0.00   0.00   0.00
>
> This is indicative of what we see in production as well. As you can
> see fio closely matches what iostat shows as far as device work. I
> don't know how you are seeing even reads. I've seen this on both
> CentOS and Debian.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Nov 2, 2016 at 2:41 PM, Robin Hill <robin@robinhill.me.uk> wrote:
>> On Wed Nov 02, 2016 at 01:56:02pm -0600, Robert LeBlanc wrote:
>>
>>> Yes, we can have any number of disks in a RAID1 (we currently have
>>> three), but reads only ever come from the first drive.
>>>
>> How are you testing? I use RAID1 on a number of systems and reads
>> look to be pretty evenly spread across the drives.
>>
>> Cheers,
>>     Robin


* Re: Issue with growing RAID10
  2016-11-02 21:00         ` Andreas Klauer
@ 2016-11-02 21:27           ` Robert LeBlanc
  2016-11-02 22:07             ` Robert LeBlanc
  0 siblings, 1 reply; 15+ messages in thread
From: Robert LeBlanc @ 2016-11-02 21:27 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

hmmm....

RAID1
root@rleblanc-pc:~/junk# fio -rw=read --size=1G --numjobs=4
--name=mdadm_test --group_reporting
mdadm_test: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.10
Starting 4 processes
mdadm_test: Laying out IO file(s) (1 file(s) / 1024MB)
mdadm_test: Laying out IO file(s) (1 file(s) / 1024MB)
mdadm_test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [R(1),_(3)] [88.9% done] [423.8MB/0KB/0KB /s] [108K/0/0
iops] [eta 00m:01s]
mdadm_test: (groupid=0, jobs=4): err= 0: pid=20564: Wed Nov  2 15:15:40 2016
 read : io=4096.0MB, bw=567642KB/s, iops=141910, runt=  7389msec
   clat (usec): min=0, max=22233, avg=23.02, stdev=288.38
    lat (usec): min=0, max=22233, avg=23.12, stdev=288.38
   clat percentiles (usec):
    |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    1],
    | 30.00th=[    1], 40.00th=[    1], 50.00th=[    1], 60.00th=[    2],
    | 70.00th=[    2], 80.00th=[    2], 90.00th=[    2], 95.00th=[    3],
    | 99.00th=[  644], 99.50th=[ 1144], 99.90th=[ 4128], 99.95th=[ 5600],
    | 99.99th=[11584]
   bw (KB  /s): min=94396, max=469418, per=28.62%, avg=162451.40, stdev=81106.83
   lat (usec) : 2=58.15%, 4=39.21%, 10=0.87%, 20=0.09%, 50=0.16%
   lat (usec) : 100=0.13%, 250=0.14%, 500=0.13%, 750=0.26%, 1000=0.29%
   lat (msec) : 2=0.29%, 4=0.20%, 10=0.09%, 20=0.01%, 50=0.01%
 cpu          : usr=4.14%, sys=10.87%, ctx=15564, majf=0, minf=41
 IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued    : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  READ: io=4096.0MB, aggrb=567641KB/s, minb=567641KB/s,
maxb=567641KB/s, mint=7389msec, maxt=7389msec

Disk stats (read/write):
   md13: ios=48375/3, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=12292/6, aggrmerge=0/0, aggrticks=31009/140,
aggrin_queue=31145, aggrutil=97.41%
 loop1: ios=14654/6, merge=0/0, ticks=39524/156, in_queue=39672, util=97.41%
 loop4: ios=5791/6, merge=0/0, ticks=13976/100, in_queue=14072, util=45.45%
 loop2: ios=16575/6, merge=0/0, ticks=37360/152, in_queue=37508, util=90.92%
 loop3: ios=12150/6, merge=0/0, ticks=33176/152, in_queue=33328, util=91.08%

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.50  1387.00 3234.00 2996.50 388746.00 17500.00   130.41     4.44    0.71    1.29    0.09   0.16  98.40
loop1             0.00     0.00 1510.00    2.50 128839.75     6.50   170.38     5.10    3.37    3.34   24.80   0.66 100.00
loop2             0.00     0.00 1570.00    2.50 133952.25     6.50   170.38     5.22    3.31    3.27   25.60   0.64 100.00
loop3             0.00     0.00 1521.50    2.50 129855.75     6.50   170.42     5.00    3.27    3.24   25.60   0.65  98.60
loop4             0.00     0.00    2.50    2.50   248.00     6.50   101.80     0.04    8.40    1.60   15.20   8.00   4.00
loop5             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md13              0.00     0.00 4603.50    1.50 392832.00     6.00   170.61     0.00    0.00    0.00    0.00   0.00   0.00

root@rleblanc-pc:~/junk# fio -rw=randread --size=1G --numjobs=4
--name=mdadm_test --group_reporting
mdadm_test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.10
Starting 4 processes
Jobs: 1 (f=1): [_(3),r(1)] [100.0% done] [35996KB/0KB/0KB /s]
[8999/0/0 iops] [eta 00m:00s]
mdadm_test: (groupid=0, jobs=4): err= 0: pid=21036: Wed Nov  2 15:17:47 2016
 read : io=4096.0MB, bw=133254KB/s, iops=33313, runt= 31476msec
   clat (usec): min=4, max=14896, avg=103.19, stdev=123.06
    lat (usec): min=4, max=14896, avg=103.27, stdev=123.06
   clat percentiles (usec):
    |  1.00th=[    7],  5.00th=[    9], 10.00th=[   11], 20.00th=[   90],
    | 30.00th=[   95], 40.00th=[   99], 50.00th=[  104], 60.00th=[  112],
    | 70.00th=[  118], 80.00th=[  125], 90.00th=[  141], 95.00th=[  167],
    | 99.00th=[  247], 99.50th=[  318], 99.90th=[ 2256], 99.95th=[ 2512],
    | 99.99th=[ 4256]
   bw (KB  /s): min=26472, max=57008, per=28.80%, avg=38380.41, stdev=7929.82
   lat (usec) : 10=6.96%, 20=10.26%, 50=1.27%, 100=22.67%, 250=57.86%
   lat (usec) : 500=0.68%, 750=0.04%, 1000=0.02%
   lat (msec) : 2=0.09%, 4=0.12%, 10=0.01%, 20=0.01%
 cpu          : usr=1.51%, sys=7.30%, ctx=1051111, majf=0, minf=38
 IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued    : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  READ: io=4096.0MB, aggrb=133254KB/s, minb=133254KB/s,
maxb=133254KB/s, mint=31476msec, maxt=31476msec

Disk stats (read/write):
   md13: ios=1047839/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=262144/0, aggrmerge=0/0, aggrticks=25507/0,
aggrin_queue=25490, aggrutil=92.98%
 loop1: ios=342845/0, merge=0/0, ticks=29440/0, in_queue=29424, util=92.98%
 loop4: ios=190900/0, merge=0/0, ticks=20568/0, in_queue=20552, util=65.09%
 loop2: ios=257401/0, merge=0/0, ticks=26512/0, in_queue=26492, util=83.65%
 loop3: ios=257430/0, merge=0/0, ticks=25508/0, in_queue=25492, util=80.67%

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00     0.00 34484.50    0.00 141398.00     0.00     8.20     3.02    0.09    0.09    0.00   0.03 100.00
loop11            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop12            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop13            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop14            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
loop15            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md14              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

RAID10
root@rleblanc-pc:~/junk# fio -rw=read --size=1G --numjobs=4
--name=mdadm_test --group_reporting
...
Disk stats (read/write):
   md14: ios=36295/19, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=9227/27, aggrmerge=0/0, aggrticks=274586/1967,
aggrin_queue=276552, aggrutil=98.05%
 loop13: ios=9006/27, merge=0/0, ticks=253296/1824, in_queue=255120, util=95.31%
 loop11: ios=9171/27, merge=0/0, ticks=260884/1876, in_queue=262760, util=96.57%
 loop14: ios=9593/27, merge=0/0, ticks=313672/2256, in_queue=315924, util=98.05%
 loop12: ios=9141/27, merge=0/0, ticks=270492/1912, in_queue=272404, util=97.20%

root@rleblanc-pc:~/junk# fio -rw=randread --size=1G --numjobs=4
--name=mdadm_test --group_reporting
...
Disk stats (read/write):
   md14: ios=1047470/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=262144/0, aggrmerge=0/0, aggrticks=33242/0,
aggrin_queue=33209, aggrutil=92.62%
 loop13: ios=258512/0, merge=0/0, ticks=33188/0, in_queue=33160, util=90.21%
 loop11: ios=275798/0, merge=0/0, ticks=34120/0, in_queue=34088, util=92.62%
 loop14: ios=252031/0, merge=0/0, ticks=31976/0, in_queue=31936, util=87.15%
 loop12: ios=262235/0, merge=0/0, ticks=33684/0, in_queue=33652, util=91.52%

Much better distribution, especially on RAID10. I wonder whether,
because we are running a single VM on the array, libvirt is effectively
single-threaded for I/O and that is causing what we are seeing. I think
libvirt can use multiple threads for I/O; we'll have to look into that.
md can evidently split reads from a single thread, so I wonder what is
preventing it from doing so more efficiently.

This warrants more probing.
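
A sketch of how the single-reader and multi-reader cases could be
compared in one go, using the same fio options as the runs above. The
helper name sweep_jobs and the DRY_RUN switch are made up for this
example. By default it only prints the commands.

```shell
#!/bin/sh
# Print (or, with DRY_RUN=0, run) the same fio random-read test at
# several job counts, to compare single-reader and multi-reader spread.
DRY_RUN=${DRY_RUN:-1}

sweep_jobs() {
    for jobs in "$@"; do
        cmd="fio --rw=randread --size=1G --numjobs=$jobs --name=mdadm_test --group_reporting"
        if [ "$DRY_RUN" = 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

sweep_jobs 1 2 4
```

Note that --size is per job, so the total amount read grows with the
job count. Compare the per-device "Disk stats" lines rather than the
run times.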
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Nov 2, 2016 at 3:00 PM, Andreas Klauer
<Andreas.Klauer@metamorpher.de> wrote:
> On Wed, Nov 02, 2016 at 01:56:02PM -0600, Robert LeBlanc wrote:
>> Yes, we can have any number of disks in a RAID1 (we currently have
>> three), but reads only ever come from the first drive.
>
> Only if there's only one reader. So it depends on what activity
> there is on the machine.
>
>> We just need the option to grow a RAID10 like we can with RAID1.
>
> Patches welcome, I'm sure? ;-)
>
>> Basically, we want to be super paranoid with several identical copies
>> of the data and get extra read performance.
>
> You could put RAID on RAID and thus achieve other modes but not sure
> if it's worth the overhead or even applies in any way to your use case
> and using non standard setups always comes with its own pitfalls.
>
> RAID 1, with RAID0 on top, three disks ABC, two partitions ab,
> different disk order.
>
>   A B C
> a 1 2 3
> b 3 1 2
>
> Three RAID 1 md1, md2, md3, (and md0 a RAID-0 on top).
>
> You can grow it.
>
>   A B C D
> a 1 2 3 ?
> b 3 1 2 ?
>
>   A B C D
> a 1 2 3 ?
> b 3 1 2 3
>
> md3 has 3 disks temporarily here.
>
>   A B C D
> a 1 2 3 4
> b 4 1 2 3
>
> md4 is new, to be added to md0.
>
> Three copies? Same thing with three partitions.
>
> Will it help any or make things worse? I dunno.
> Have to be careful to make md0 assemble last.
>
> Could also be RAID5 on top instead of RAID0.
> That's even stranger though.
>
> Regards
> Andreas Klauer


* Re: Issue with growing RAID10
  2016-11-02 21:27           ` Robert LeBlanc
@ 2016-11-02 22:07             ` Robert LeBlanc
  0 siblings, 0 replies; 15+ messages in thread
From: Robert LeBlanc @ 2016-11-02 22:07 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

Oh, and since '-p f4' works so well, it really seems like there is a
bug in the 'near' code. We are going to see if we can find anything in
the code. I could see mechanical drives getting an advantage from
'far', but on SSDs the layout should make little difference.

RAID10 f4
# fio -rw=read --size=5G --name=mdadm_test
...
Disk stats (read/write):
   md15: ios=45212/5, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=14064/13, aggrmerge=0/0, aggrticks=290590/893,
aggrin_queue=291481, aggrutil=98.95%
 loop23: ios=15328/13, merge=0/0, ticks=337884/928, in_queue=338816, util=98.95%
 loop21: ios=15329/13, merge=0/0, ticks=314396/984, in_queue=315372, util=98.75%
 loop24: ios=12800/13, merge=0/0, ticks=270368/904, in_queue=271268, util=98.59%
 loop22: ios=12800/13, merge=0/0, ticks=239712/756, in_queue=240468, util=98.51%

# fio -rw=randread --size=5G --name=mdadm_test
...
Disk stats (read/write):
   md15: ios=1305867/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=327680/0, aggrmerge=0/0, aggrticks=21163/0,
aggrin_queue=21146, aggrutil=23.32%
 loop23: ios=327680/0, merge=0/0, ticks=21512/0, in_queue=21496, util=23.32%
 loop21: ios=327680/0, merge=0/0, ticks=20716/0, in_queue=20692, util=22.44%
 loop24: ios=327680/0, merge=0/0, ticks=21500/0, in_queue=21488, util=23.31%
 loop22: ios=327680/0, merge=0/0, ticks=20924/0, in_queue=20908, util=22.68%
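
For anyone wanting to repeat the near vs far comparison, the two arrays
could be created along these lines. This is a dry-run sketch that only
prints the commands. The helper name raid10_cmd is made up, and the
member lists are taken from the device names in the stats above rather
than from the exact commands that were run.

```shell
#!/bin/sh
# Dry run only: prints the mdadm command for a RAID10 with a given layout.
# $1 = md device, $2 = layout (e.g. n4 or f4), remaining args = members.
raid10_cmd() {
    md=$1
    layout=$2
    shift 2
    echo "mdadm --create $md --level=10 --layout=$layout --raid-devices=$# $*"
}

raid10_cmd /dev/md14 n4 /dev/loop11 /dev/loop12 /dev/loop13 /dev/loop14
raid10_cmd /dev/md15 f4 /dev/loop21 /dev/loop22 /dev/loop23 /dev/loop24
```

--layout is the long form of -p, so the only difference between the two
arrays is n4 versus f4.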
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Nov 2, 2016 at 3:27 PM, Robert LeBlanc <robert@leblancnet.us> wrote:
> hmmm....
>
> RAID1
> root@rleblanc-pc:~/junk# fio -rw=read --size=1G --numjobs=4
> --name=mdadm_test --group_reporting
> mdadm_test: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
> ...
> fio-2.10
> Starting 4 processes
> mdadm_test: Laying out IO file(s) (1 file(s) / 1024MB)
> mdadm_test: Laying out IO file(s) (1 file(s) / 1024MB)
> mdadm_test: Laying out IO file(s) (1 file(s) / 1024MB)
> Jobs: 1 (f=1): [R(1),_(3)] [88.9% done] [423.8MB/0KB/0KB /s] [108K/0/0
> iops] [eta 00m:01s]
> mdadm_test: (groupid=0, jobs=4): err= 0: pid=20564: Wed Nov  2 15:15:40 2016
>  read : io=4096.0MB, bw=567642KB/s, iops=141910, runt=  7389msec
>    clat (usec): min=0, max=22233, avg=23.02, stdev=288.38
>     lat (usec): min=0, max=22233, avg=23.12, stdev=288.38
>    clat percentiles (usec):
>     |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    1],
>     | 30.00th=[    1], 40.00th=[    1], 50.00th=[    1], 60.00th=[    2],
>     | 70.00th=[    2], 80.00th=[    2], 90.00th=[    2], 95.00th=[    3],
>     | 99.00th=[  644], 99.50th=[ 1144], 99.90th=[ 4128], 99.95th=[ 5600],
>     | 99.99th=[11584]
>    bw (KB  /s): min=94396, max=469418, per=28.62%, avg=162451.40, stdev=81106.83
>    lat (usec) : 2=58.15%, 4=39.21%, 10=0.87%, 20=0.09%, 50=0.16%
>    lat (usec) : 100=0.13%, 250=0.14%, 500=0.13%, 750=0.26%, 1000=0.29%
>    lat (msec) : 2=0.29%, 4=0.20%, 10=0.09%, 20=0.01%, 50=0.01%
>  cpu          : usr=4.14%, sys=10.87%, ctx=15564, majf=0, minf=41
>  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued    : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>     latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>   READ: io=4096.0MB, aggrb=567641KB/s, minb=567641KB/s,
> maxb=567641KB/s, mint=7389msec, maxt=7389msec
>
> Disk stats (read/write):
>    md13: ios=48375/3, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
> aggrios=12292/6, aggrmerge=0/0, aggrticks=31009/140,
> aggrin_queue=31145, aggrutil=97.41%
>  loop1: ios=14654/6, merge=0/0, ticks=39524/156, in_queue=39672, util=97.41%
>  loop4: ios=5791/6, merge=0/0, ticks=13976/100, in_queue=14072, util=45.45%
>  loop2: ios=16575/6, merge=0/0, ticks=37360/152, in_queue=37508, util=90.92%
>  loop3: ios=12150/6, merge=0/0, ticks=33176/152, in_queue=33328, util=91.08%
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> nvme0n1           0.50  1387.00 3234.00 2996.50 388746.00 17500.00
> 130.41     4.44    0.71    1.29    0.09   0.16  98.40
> loop1             0.00     0.00 1510.00    2.50 128839.75     6.50
> 170.38     5.10    3.37    3.34   24.80   0.66 100.00
> loop2             0.00     0.00 1570.00    2.50 133952.25     6.50
> 170.38     5.22    3.31    3.27   25.60   0.64 100.00
> loop3             0.00     0.00 1521.50    2.50 129855.75     6.50
> 170.42     5.00    3.27    3.24   25.60   0.65  98.60
> loop4             0.00     0.00    2.50    2.50   248.00     6.50
> 101.80     0.04    8.40    1.60   15.20   8.00   4.00
> loop5             0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> md13              0.00     0.00 4603.50    1.50 392832.00     6.00
> 170.61     0.00    0.00    0.00    0.00   0.00   0.00
>
> root@rleblanc-pc:~/junk# fio -rw=randread --size=1G --numjobs=4
> --name=mdadm_test --group_reporting
> mdadm_test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
> ...
> fio-2.10
> Starting 4 processes
> Jobs: 1 (f=1): [_(3),r(1)] [100.0% done] [35996KB/0KB/0KB /s]
> [8999/0/0 iops] [eta 00m:00s]
> mdadm_test: (groupid=0, jobs=4): err= 0: pid=21036: Wed Nov  2 15:17:47 2016
>  read : io=4096.0MB, bw=133254KB/s, iops=33313, runt= 31476msec
>    clat (usec): min=4, max=14896, avg=103.19, stdev=123.06
>     lat (usec): min=4, max=14896, avg=103.27, stdev=123.06
>    clat percentiles (usec):
>     |  1.00th=[    7],  5.00th=[    9], 10.00th=[   11], 20.00th=[   90],
>     | 30.00th=[   95], 40.00th=[   99], 50.00th=[  104], 60.00th=[  112],
>     | 70.00th=[  118], 80.00th=[  125], 90.00th=[  141], 95.00th=[  167],
>     | 99.00th=[  247], 99.50th=[  318], 99.90th=[ 2256], 99.95th=[ 2512],
>     | 99.99th=[ 4256]
>    bw (KB  /s): min=26472, max=57008, per=28.80%, avg=38380.41, stdev=7929.82
>    lat (usec) : 10=6.96%, 20=10.26%, 50=1.27%, 100=22.67%, 250=57.86%
>    lat (usec) : 500=0.68%, 750=0.04%, 1000=0.02%
>    lat (msec) : 2=0.09%, 4=0.12%, 10=0.01%, 20=0.01%
>  cpu          : usr=1.51%, sys=7.30%, ctx=1051111, majf=0, minf=38
>  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued    : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>     latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>   READ: io=4096.0MB, aggrb=133254KB/s, minb=133254KB/s,
> maxb=133254KB/s, mint=31476msec, maxt=31476msec
>
> Disk stats (read/write):
>    md13: ios=1047839/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
> aggrios=262144/0, aggrmerge=0/0, aggrticks=25507/0,
> aggrin_queue=25490, aggrutil=92.98%
>  loop1: ios=342845/0, merge=0/0, ticks=29440/0, in_queue=29424, util=92.98%
>  loop4: ios=190900/0, merge=0/0, ticks=20568/0, in_queue=20552, util=65.09%
>  loop2: ios=257401/0, merge=0/0, ticks=26512/0, in_queue=26492, util=83.65%
>  loop3: ios=257430/0, merge=0/0, ticks=25508/0, in_queue=25492, util=80.67%
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> nvme0n1           0.00     0.00 34484.50    0.00 141398.00     0.00
>  8.20     3.02    0.09    0.09    0.00   0.03 100.00
> loop11            0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop12            0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop13            0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop14            0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> loop15            0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
> md14              0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00    0.00    0.00   0.00   0.00
>
> RAID10
> root@rleblanc-pc:~/junk# fio -rw=read --size=1G --numjobs=4
> --name=mdadm_test --group_reporting
> ...
> Disk stats (read/write):
>    md14: ios=36295/19, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
> aggrios=9227/27, aggrmerge=0/0, aggrticks=274586/1967,
> aggrin_queue=276552, aggrutil=98.05%
>  loop13: ios=9006/27, merge=0/0, ticks=253296/1824, in_queue=255120, util=95.31%
>  loop11: ios=9171/27, merge=0/0, ticks=260884/1876, in_queue=262760, util=96.57%
>  loop14: ios=9593/27, merge=0/0, ticks=313672/2256, in_queue=315924, util=98.05%
>  loop12: ios=9141/27, merge=0/0, ticks=270492/1912, in_queue=272404, util=97.20%
>
> root@rleblanc-pc:~/junk# fio -rw=randread --size=1G --numjobs=4
> --name=mdadm_test --group_reporting
> ...
> Disk stats (read/write):
>    md14: ios=1047470/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
> aggrios=262144/0, aggrmerge=0/0, aggrticks=33242/0,
> aggrin_queue=33209, aggrutil=92.62%
>  loop13: ios=258512/0, merge=0/0, ticks=33188/0, in_queue=33160, util=90.21%
>  loop11: ios=275798/0, merge=0/0, ticks=34120/0, in_queue=34088, util=92.62%
>  loop14: ios=252031/0, merge=0/0, ticks=31976/0, in_queue=31936, util=87.15%
>  loop12: ios=262235/0, merge=0/0, ticks=33684/0, in_queue=33652, util=91.52%
>
> The distribution is much better, especially on RAID10. Because we are
> running a single VM on the array, I wonder whether libvirt is
> effectively single-threaded and that is causing what we are seeing. I
> think libvirt can use multiple threads for I/O; we'll have to look into
> that. It is obvious that md can split reads from a single thread, so I
> wonder what is preventing it from doing so more efficiently.
>
> This warrants more probing.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
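To put a number on "better distribution", here is a minimal sketch that turns the per-device lines of a fio "Disk stats" block into read shares. The input is copied from the md13 randread run quoted above; the function name read_shares and the regular expression are illustrative assumptions, not part of fio or mdadm.

```python
import re

# Per-device lines copied from the fio "Disk stats" block of the
# md13 randread run quoted in this message.
DISK_STATS = """\
loop1: ios=342845/0, merge=0/0, ticks=29440/0, in_queue=29424, util=92.98%
loop4: ios=190900/0, merge=0/0, ticks=20568/0, in_queue=20552, util=65.09%
loop2: ios=257401/0, merge=0/0, ticks=26512/0, in_queue=26492, util=83.65%
loop3: ios=257430/0, merge=0/0, ticks=25508/0, in_queue=25492, util=80.67%
"""


def read_shares(stats_text):
    """Return {device: fraction of all read I/Os} for fio per-device lines.

    Hypothetical helper: it only looks at the read half of "ios=R/W".
    """
    reads = {}
    for match in re.finditer(r"(\w+): ios=(\d+)/\d+", stats_text):
        reads[match.group(1)] = int(match.group(2))
    total = sum(reads.values())
    return {dev: count / total for dev, count in reads.items()}


if __name__ == "__main__":
    for dev, share in sorted(read_shares(DISK_STATS).items()):
        print(f"{dev}: {share:.1%}")
```

A perfectly even split over four devices would be 25% each, so the gap between the busiest and the idlest device is a rough measure of how well md spread the reads.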
> On Wed, Nov 2, 2016 at 3:00 PM, Andreas Klauer
> <Andreas.Klauer@metamorpher.de> wrote:
>> On Wed, Nov 02, 2016 at 01:56:02PM -0600, Robert LeBlanc wrote:
>>> Yes, we can have any number of disks in a RAID1 (we currently have
>>> three), but reads only ever come from the first drive.
>>
>> Only if there's only one reader. So it depends on what activity
>> there is on the machine.
>>
>>> We just need the option to grow a RAID10 like we can with RAID1.
>>
>> Patches welcome, I'm sure? ;-)
>>
>>> Basically, we want to be super paranoid with several identical copies
>>> of the data and get extra read performance.
>>
>> You could put RAID on RAID and thus achieve other modes, but I'm not
>> sure it's worth the overhead or even applies to your use case, and
>> non-standard setups always come with their own pitfalls.
>>
>> RAID 1, with RAID0 on top, three disks ABC, two partitions ab,
>> different disk order.
>>
>>   A B C
>> a 1 2 3
>> b 3 1 2
>>
>> Three RAID 1 md1, md2, md3, (and md0 a RAID-0 on top).
>>
>> You can grow it.
>>
>>   A B C D
>> a 1 2 3 ?
>> b 3 1 2 ?
>>
>>   A B C D
>> a 1 2 3 ?
>> b 3 1 2 3
>>
>> md3 has 3 disks temporarily here.
>>
>>   A B C D
>> a 1 2 3 4
>> b 4 1 2 3
>>
>> md4 is new, to be added to md0.
>>
>> Three copies? Same thing with three partitions.
>>
>> Will it help any or make things worse? I dunno.
>> Have to be careful to make md0 assemble last.
>>
>> Could also be RAID5 on top instead of RAID1.
>> That's even stranger though.
>>
>> Regards
>> Andreas Klauer
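
The rotated layout in the tables above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function names rotated_layout and members are hypothetical, and the rule "shift each partition row by one disk" is inferred from the two-partition tables (the three-partition case is extrapolated, not taken from the message).

```python
def rotated_layout(num_disks, copies=2):
    """Return one row per partition, listing the RAID1 array on each disk.

    Assumed rule: partition row r on disk d belongs to array
    ((d - r) mod num_disks) + 1, which reproduces the quoted tables.
    """
    return [
        [(d - r) % num_disks + 1 for d in range(num_disks)]
        for r in range(copies)
    ]


def members(layout):
    """Map each RAID1 array number to its slots, e.g. 'Ab' = disk A, partition b."""
    disks = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    parts = "abcdefghijklmnopqrstuvwxyz"
    result = {}
    for r, row in enumerate(layout):
        for d, array in enumerate(row):
            result.setdefault(array, []).append(disks[d] + parts[r])
    return result


if __name__ == "__main__":
    for n in (3, 4):
        print(f"{n} disks:")
        for row in rotated_layout(n):
            print("  " + " ".join(str(x) for x in row))
        print("  members:", members(rotated_layout(n)))
```

Because each row is shifted by one disk, every RAID1 array keeps its copies on different disks as long as the number of copies does not exceed the number of disks.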


end of thread (last message 2016-11-02 22:07 UTC)

Thread overview: 15+ messages
2016-11-02 17:59 Issue with growing RAID10 Robert LeBlanc
2016-11-02 18:09 ` Wols Lists
2016-11-02 18:13   ` Robert LeBlanc
2016-11-02 18:19 ` keld
2016-11-02 19:02   ` Robert LeBlanc
2016-11-02 19:48     ` keld
2016-11-02 19:56       ` Robert LeBlanc
2016-11-02 20:16         ` keld
2016-11-02 20:27           ` Robert LeBlanc
2016-11-02 20:41         ` Robin Hill
2016-11-02 20:59           ` Robert LeBlanc
2016-11-02 21:11             ` Robert LeBlanc
2016-11-02 21:00         ` Andreas Klauer
2016-11-02 21:27           ` Robert LeBlanc
2016-11-02 22:07             ` Robert LeBlanc
