Linux RAID subsystem development
* Raid5 to raid6 reshape or recreate?
From: Ram Ramesh @ 2014-08-13  1:05 UTC
  To: linux-raid

I have read several threads about reshaping being very slow. I am seeing 
it myself while converting a 3-disk raid5 (2TB each) to a 4-disk raid6 
(again, 2TB each). I am only getting about 8MB/s. I tried all sorts of 
settings in /proc and /sys, and nothing helped. Each time I set 
something, I get a temporary boost to 10MB/s, and after a little while 
it settles back down to 8MB/s.

I currently have very little in the way of useful files on the raid 
(about 100G). I also took a fresh backup before the reshape and 
unmounted md0. Would it be faster if I trash the raid5 and create a raid6 
from scratch instead of reshaping? Any guess on how much faster? I 
typically got 100+MB/s on checkarray cron jobs. They would fire around 
1-2AM and finish by 6-7AM on the first Sunday. The reshape has been 
running for more than a day and is only about 45% done.

BTW, this is a really old machine with SATA 3.0Gb/s and an Athlon 64 X2 
(max freq 2GHz). I also checked that my cpu is not running out of gas. In 
fact, the on-demand governor has reduced the freq to 1GHz at about 50% cpu 
load. However, uptime shows a load average of 2.7. Here are the 
actual outputs. Please let me know if I missed any optimization.

    lata:/home/rramesh# blockdev --getra /dev/sdb   # all other sdX devices have the same value
    131072
    lata:/home/rramesh# cat /proc/sys/dev/raid/speed_limit_max
    200000
    lata:/home/rramesh# cat /proc/sys/dev/raid/speed_limit_min
    100000
    lata:/home/rramesh# cat /sys/block/md0/md/stripe_cache_active
    0
    lata:/home/rramesh# cat /sys/block/md0/md/stripe_cache_size
    16384
    lata:/home/rramesh# cat /sys/block/md0/md/sync_speed
    9265
    lata:/home/rramesh# cat /sys/block/md0/md/sync_speed_max
    200000 (system)
    lata:/home/rramesh# cat /sys/block/md0/md/sync_speed_min
    200000 (local)
    lata:/home/rramesh# cat /sys/block/md0/md/chunk_size
    524288
    lata:/home/rramesh# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid6 sde1[4] sdb1[0] sdd1[3] sdc1[1]
          3906763776 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
          [=========>...........]  reshape = 46.1% (902017024/1953381888) finish=1907.6min speed=9184K/sec

    unused devices: <none>

    lata:/home/rramesh# iostat -xd 10 3
    Linux 3.2.0-4-amd64 (lata)      08/12/2014 _x86_64_        (2 CPU)

    Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
    sda          0.01     0.07     0.09     0.10     2.63     4.15    70.84     0.00     8.34     1.67    14.77     0.54     0.01
    sdb       2240.97  2241.60    39.15    23.44 18327.30  9040.86   874.57     1.38    22.11    18.74    27.74    10.41    65.16
    sdc       2240.41  2241.53    39.70    23.50 18327.36  9040.80   866.07     1.10    17.49    12.23    26.38     9.23    58.36
    sdd       2240.28  2241.47    39.84    23.56 18327.44  9040.83   863.38     1.03    16.21    10.96    25.09     8.68    55.03
    sde          0.02  2240.73     0.01    24.08     0.03  9039.95   750.62     0.37    15.49     0.91    15.49    10.18    24.53
    sdf          0.00     3.82     0.10    43.64     0.40 18869.21   862.92     2.29    52.34     1.37    52.45     5.01    21.93
    md0          0.00     0.00     9.32     0.35   758.03     1.39   157.08     0.00     0.00     0.00     0.00     0.00     0.00

    Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
    sda          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
    sdb       2155.90  2158.60    37.30    23.10 17561.60  8809.15   873.20     1.51    25.11    22.94    28.62    11.33    68.44
    sdc       2155.90  2158.60    37.40    23.00 17579.20  8757.95   872.09     1.13    18.74    11.99    29.70    10.11    61.04
    sdd       2155.90  2158.50    37.50    23.10 17630.40  8757.95   870.90     0.97    16.09    10.65    24.94     9.27    56.16
    sde          0.00  2158.10     0.00    23.50     0.00  8757.95   745.36     0.40    16.95     0.00    16.95    11.78    27.68
    sdf          0.00     2.20     0.00    41.70     0.00 18057.20   866.05     1.59    38.25     0.00    38.25     4.78    19.92
    md0          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

    Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
    sda          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
    sdb       1980.20  1980.80    31.80    20.70 15718.40  7989.55   903.16     1.52    28.92    24.42    35.85    13.78    72.36
    sdc       1980.20  1980.70    31.70    20.80 15700.80  7989.55   902.49     0.99    18.92    14.35    25.88    10.30    54.08
    sdd       1980.00  1980.90    31.70    20.60 15598.40  7989.55   902.02     0.89    16.99    11.81    24.95     9.23    48.28
    sde          0.00  1980.10     0.00    21.40     0.00  7989.55   746.69     0.35    16.13     0.00    16.13    11.07    23.68
    sdf          0.00     1.80     0.00    34.30     0.00 14774.80   861.50     1.53    44.79     0.00    44.79     5.36    18.40
    md0          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
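
The reshape line in the /proc/mdstat output above can be cross-checked with a little arithmetic. The following is only a minimal sketch: the three numbers are copied from that output, and it assumes all of them are in 1 KiB units, as mdstat normally reports. The variable names are made up for illustration.

```python
# Figures copied from the reshape line of the /proc/mdstat output above.
# Assumption: progress and speed are both expressed in 1 KiB units.
done_kib = 902017024
total_kib = 1953381888
speed_kib_per_s = 9184

# Fraction of the reshape already completed.
fraction_done = done_kib / total_kib

# Time left if the current speed holds.
remaining_s = (total_kib - done_kib) / speed_kib_per_s
remaining_min = remaining_s / 60
remaining_hours = remaining_min / 60

print(f"done:      {fraction_done:.1%}")
print(f"remaining: {remaining_min:.0f} min (about {remaining_hours:.1f} hours)")
```

The result lands very close to the finish=1907.6min that the kernel reports, so the estimate is simply remaining work divided by the current speed. It will only improve if the speed itself does.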

Thanks in advance for your help. Even an assurance that I have done 
everything I can, short of waiting for it to finish, would make me feel 
better :-)

I wonder if my reshape is so slow because of constant seeking rather than 
raw read speed. All my disks are 5400 rpm (two WD Green and two Samsung 
EcoGreen 5400).
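
One rough way to probe the seek hypothesis is to recompute a few iostat columns from the others. The sketch below uses the sdb row of the first iostat sample above. The interpretation is an assumption, not a measurement: a service time near 10 ms per request is in the range one might expect from a seek plus rotational delay on a 5400 rpm drive, which fits the seek theory without proving it. The svctm column itself is documented by sysstat as unreliable, so treat this as indicative only.

```python
# sdb row from the first iostat sample above.
r_per_s, w_per_s = 39.15, 23.44            # requests per second after merging
rkb_per_s, wkb_per_s = 18327.30, 9040.86   # throughput in kB/s
svctm_ms = 10.41                           # reported service time per request

iops = r_per_s + w_per_s

# Average request size; iostat prints avgrq-sz in 512-byte sectors.
avg_request_kib = (rkb_per_s + wkb_per_s) / iops
avg_request_sectors = avg_request_kib * 2

# Utilisation is requests per second times the time each request takes.
util_pct = iops * svctm_ms / 1000 * 100

# Average rotational latency of a 5400 rpm disk is half a revolution.
rotational_latency_ms = 60_000 / 5400 / 2

print(f"requests/s:         {iops:.2f}")
print(f"avg request size:   {avg_request_kib:.0f} KiB ({avg_request_sectors:.1f} sectors)")
print(f"utilisation:        {util_pct:.2f} %")
print(f"rotational latency: {rotational_latency_ms:.2f} ms")
```

The recomputed request size and utilisation agree with the avgrq-sz and %util columns, so the requests are already large (hundreds of KiB each) and the disk still manages only about 60 of them per second.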

I forgot to mention: I did check the SMART data and did not find anything bad.

Regards
Ramesh


* Re: Raid5 to raid6 reshape or recreate?
From: Mikael Abrahamsson @ 2014-08-13  9:02 UTC
  To: Ram Ramesh; +Cc: linux-raid

On Tue, 12 Aug 2014, Ram Ramesh wrote:

> 100G). I also took a fresh backup before the reshape and unmounted md0. 
> Would it be faster if I trash the raid5 and create a raid6 from scratch 
> instead of reshaping? Any guess on how much faster? I typically got 
> 100+MB/s on checkarray cron jobs. They would fire around 1-2AM and finish 
> by 6-7AM on the first Sunday. The reshape has been running for more than 
> a day and is only about 45% done.

Creating a new array would be almost as fast as checkarray, much faster 
than reshape. A reshape moves data around, whereas when you create an 
array it just writes parity blocks, so it's a lot faster.
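
To put a rough number on "a lot faster", here is a back-of-the-envelope sketch using figures from the first message: the per-device size and reshape speed from /proc/mdstat, and the "100+M" checkarray rate. It assumes that the initial sync of a newly created array runs at about the checkarray rate and that all figures are in KiB. The real rate on this hardware may differ.

```python
# Per-device size and reshape speed from the /proc/mdstat output in the
# first message (assumed to be in KiB and KiB/s).
device_kib = 1953381888
reshape_kib_per_s = 9184

# Assumption: a freshly created array syncs at roughly the rate reported
# for checkarray ("100+M"), taken here as 100000 KiB/s.
create_kib_per_s = 100_000

reshape_hours = device_kib / reshape_kib_per_s / 3600
create_hours = device_kib / create_kib_per_s / 3600
speedup = reshape_hours / create_hours

print(f"full reshape at current speed: {reshape_hours:.1f} h")
print(f"create + initial sync:         {create_hours:.1f} h")
print(f"speedup:                       {speedup:.1f}x")
```

Under these assumptions the create path takes a bit over five hours, which matches the reported checkarray window of roughly 1-2AM to 6-7AM, while a full reshape at the observed speed takes around two and a half days.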

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: Raid5 to raid6 reshape or recreate?
From: Ram Ramesh @ 2014-08-13 17:25 UTC
  To: linux-raid

>>> 100G). I also took a fresh backup before the reshape and unmounted md0.
>>> Would it be faster if I trash the raid5 and create a raid6 from scratch
>>> instead of reshaping? Any guess on how much faster? I typically got
>>> 100+MB/s on checkarray cron jobs. They would fire around 1-2AM and finish
>>> by 6-7AM on the first Sunday. The reshape has been running for more than
>>> a day and is only about 45% done.
>>
>> Creating a new array would be almost as fast as checkarray, much faster
>> than reshape. A reshape moves data around, whereas when you create an
>> array it just writes parity blocks, so it's a lot faster.

So, it is the seeking that kills the reshape. I know Neil said somewhere
to use the --layout switch to put the Q parity on the new disk (I do not
recall the exact switch value). If we built the raid6 parity on the new
drive first and then did the redistribution, would it not be faster,
since we would simply be swapping two (or more) blocks on the same
stripe? There would be less seeking, and we would also be protected at
raid6 level during the swap.

On a slightly different topic, will it be faster after a disk
fail/replacement as opposed to raid reshaping? I am asking because
when a disk fails, the other drives are likely to be aged as well. If we
work them this hard for a couple of days (or more), are we not risking
more failures? In other words, by choosing this method, are we not
increasing our chances of failure?

Also, on a different note, is there a reason to compute parity for
unused stripes? Would it not be possible to keep a list of written
stripes, much like an SSD does? Is that what the bitmap is for?

Ramesh

* Re: Raid5 to raid6 reshape or recreate?
From: Brad Campbell @ 2014-08-14  3:29 UTC
  To: Ram Ramesh, linux-raid

On 14/08/14 01:25, Ram Ramesh wrote:

> On a slightly different topic, will it be faster after a disk
> fail/replacement as opposed to raid reshaping?

Much.
I just re-striped a RAID6 changing the chunk size from 128k to 64k. This 
took 12 days all up. It's the seeking that kills it. To replace a disk 
or do a resync on the same array takes less than 10 hours.




* Re: Raid5 to raid6 reshape or recreate?
From: Ram Ramesh @ 2014-08-14  4:10 UTC
  To: Brad Campbell; +Cc: linux-raid

On 08/13/2014 10:29 PM, Brad Campbell wrote:
> On 14/08/14 01:25, Ram Ramesh wrote:
>
>> On a slightly different topic, will it be faster after a disk
>> fail/replacement as opposed to raid reshaping?
>
> Much.
> I just re-striped a RAID6 changing the chunk size from 128k to 64k. 
> This took 12 days all up. It's the seeking that kills it. To replace a 
> disk or do a resync on the same array takes less than 10 hours.
>
>
>
I am curious. Why do you have to change chunk size? What is the 
benefit/advantage?

This seems to confirm what Neil Brown said about the --layout switch. I 
need to find that switch value. I have one more 3-disk raid5 to 4-disk 
raid6 reshape to do on another machine. I am going to use the --layout 
switch and do it in two steps. I suspect it will take about 10 hours 
with minimal seeking.

Ramesh

* Re: Raid5 to raid6 reshape or recreate?
From: Brad Campbell @ 2014-08-14  6:06 UTC
  To: Ram Ramesh; +Cc: linux-raid

On 14/08/14 12:10, Ram Ramesh wrote:
> On 08/13/2014 10:29 PM, Brad Campbell wrote:
>> On 14/08/14 01:25, Ram Ramesh wrote:
>>
>>> On a slightly different topic, will it be faster after a disk
>>> fail/replacement as opposed to raid reshaping?
>>
>> Much.
>> I just re-striped a RAID6 changing the chunk size from 128k to 64k.
>> This took 12 days all up. It's the seeking that kills it. To replace a
>> disk or do a resync on the same array takes less than 10 hours.
>>
>>
>>
> I am curious. Why do you have to change chunk size? What is the
> benefit/advantage?

When I initially selected 128k it was for a stripe of 8 chunks (10 
disks) and for a workload that contained lots of fairly big streaming 
writes and reads.  The array has since grown to 12 chunks (14 disks) and 
the workload turned out to be a lot more random than I initially had 
profiled, so I re-striped to attempt to reduce the amount of RMW 
happening on the disks.

It may well be academic, but it's a lot easier to find 768k to write in 
one action than 1.5M.

... and probably because I could.
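
The 768k and 1.5M figures above follow from multiplying the number of data chunks per stripe by the chunk size. A minimal sketch, assuming RAID6 with two parity chunks per stripe (consistent with "8 chunks (10 disks)" and "12 chunks (14 disks)"); the function name is made up for illustration:

```python
PARITY_CHUNKS = 2  # RAID6 stores two parity chunks (P and Q) per stripe


def full_stripe_kib(disks: int, chunk_kib: int) -> int:
    """Data that must be written to fill one whole stripe, in KiB."""
    return (disks - PARITY_CHUNKS) * chunk_kib


# (disks, chunk size in KiB) for the three configurations mentioned above.
for disks, chunk_kib in [(10, 128), (14, 128), (14, 64)]:
    size = full_stripe_kib(disks, chunk_kib)
    print(f"{disks} disks, {chunk_kib}k chunk: full stripe = {size}k")
```

A write that covers a whole stripe lets parity be computed from the new data alone, which is why a smaller full-stripe size can reduce read-modify-write cycles for a workload made of smaller writes.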

