linux-raid.vger.kernel.org archive mirror
* Need to remove failed disk from RAID5 array
@ 2012-07-16 23:17 Alex
  2012-07-18 20:26 ` Bill Davidsen
  0 siblings, 1 reply; 14+ messages in thread
From: Alex @ 2012-07-16 23:17 UTC (permalink / raw)
  To: linux-raid

Hi,

I have a degraded RAID5 array on an fc15 box due to sda failing:

Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sda3[5](F) sdd2[4] sdc2[2] sdb2[1]
      2890747392 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      bitmap: 8/8 pages [32KB], 65536KB chunk

md0 : active raid5 sda2[5] sdd1[4] sdc1[2] sdb1[1]
      30715392 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

There's a ton of messages like these:

end_request: I/O error, dev sda, sector 1668467332
md/raid:md1: read error NOT corrected!! (sector 1646961280 on sda3).
md/raid:md1: Disk failure on sda3, disabling device.
md/raid:md1: Operation continuing on 3 devices.
md/raid:md1: read error not correctable (sector 1646961288 on sda3).

What is the proper procedure to remove the disk from the array,
shutdown the server, and reboot with a new sda?

# mdadm --version
mdadm - v3.2.5 - 18th May 2012

# mdadm -Es
ARRAY /dev/md/0 metadata=1.1 UUID=4b5a3704:c681f663:99e744e4:254ebe3e
name=pixie.example.com:0
ARRAY /dev/md/1 metadata=1.1 UUID=d5032866:15381f0b:e725e8ae:26f9a971
name=pixie.example.com:1

# mdadm --detail /dev/md1
/dev/md1:
        Version : 1.1
  Creation Time : Sun Aug  7 12:52:18 2011
     Raid Level : raid5
     Array Size : 2890747392 (2756.83 GiB 2960.13 GB)
  Used Dev Size : 963582464 (918.94 GiB 986.71 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Jul 16 19:14:11 2012
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : pixie.example.com:1  (local to host pixie.example.com)
           UUID : d5032866:15381f0b:e725e8ae:26f9a971
         Events : 162567

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       4       8       50        3      active sync   /dev/sdd2

       5       8        3        -      faulty spare   /dev/sda3

I'd appreciate a pointer to any existing documentation, or some
general guidance on the proper procedure.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-16 23:17 Need to remove failed disk from RAID5 array Alex
@ 2012-07-18 20:26 ` Bill Davidsen
  2012-07-19  2:44   ` Alex
  0 siblings, 1 reply; 14+ messages in thread
From: Bill Davidsen @ 2012-07-18 20:26 UTC (permalink / raw)
  To: Alex, Linux RAID, Neil Brown

Alex wrote:
> Hi,
>
> I have a degraded RAID5 array on an fc15 box due to sda failing:
>
> Personalities : [raid6] [raid5] [raid4]
> md1 : active raid5 sda3[5](F) sdd2[4] sdc2[2] sdb2[1]
>        2890747392 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
>        bitmap: 8/8 pages [32KB], 65536KB chunk
>
> md0 : active raid5 sda2[5] sdd1[4] sdc1[2] sdb1[1]
>        30715392 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
>        bitmap: 0/1 pages [0KB], 65536KB chunk
>
> There's a ton of messages like these:
>
> end_request: I/O error, dev sda, sector 1668467332
> md/raid:md1: read error NOT corrected!! (sector 1646961280 on sda3).
> md/raid:md1: Disk failure on sda3, disabling device.
> md/raid:md1: Operation continuing on 3 devices.
> md/raid:md1: read error not correctable (sector 1646961288 on sda3).
>
> What is the proper procedure to remove the disk from the array,
> shutdown the server, and reboot with a new sda?
>
> # mdadm --version
> mdadm - v3.2.5 - 18th May 2012
>
> # mdadm -Es
> ARRAY /dev/md/0 metadata=1.1 UUID=4b5a3704:c681f663:99e744e4:254ebe3e
> name=pixie.example.com:0
> ARRAY /dev/md/1 metadata=1.1 UUID=d5032866:15381f0b:e725e8ae:26f9a971
> name=pixie.example.com:1
>
> # mdadm --detail /dev/md1
> /dev/md1:
>          Version : 1.1
>    Creation Time : Sun Aug  7 12:52:18 2011
>       Raid Level : raid5
>       Array Size : 2890747392 (2756.83 GiB 2960.13 GB)
>    Used Dev Size : 963582464 (918.94 GiB 986.71 GB)
>     Raid Devices : 4
>    Total Devices : 4
>      Persistence : Superblock is persistent
>
>    Intent Bitmap : Internal
>
>      Update Time : Mon Jul 16 19:14:11 2012
>            State : active, degraded
>   Active Devices : 3
> Working Devices : 3
>   Failed Devices : 1
>    Spare Devices : 0
>
>           Layout : left-symmetric
>       Chunk Size : 512K
>
>             Name : pixie.example.com:1  (local to host pixie.example.com)
>             UUID : d5032866:15381f0b:e725e8ae:26f9a971
>           Events : 162567
>
>      Number   Major   Minor   RaidDevice State
>         0       0        0        0      removed
>         1       8       18        1      active sync   /dev/sdb2
>         2       8       34        2      active sync   /dev/sdc2
>         4       8       50        3      active sync   /dev/sdd2
>
>         5       8        3        -      faulty spare   /dev/sda3
>
> I'd appreciate a pointer to any existing documentation, or some
> general guidance on the proper procedure.
>

Once the drive has failed, about all you can do is add another drive as a spare, 
wait until the rebuild completes, then remove the old drive from the array. If 
you have a newer kernel, 3.3 or later, you might have been able to use the 
undocumented but amazing "want_replacement" action to speed up your rebuild, but 
once a drive is so bad it gets kicked I think it's too late.
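
Roughly, the add-then-remove sequence looks like this (an untested
sketch; /dev/sde3 is just a stand-in for whatever the replacement
partition ends up being called on your box):

  mdadm /dev/md1 --add /dev/sde3      # recovery onto it starts automatically
  # ... wait until /proc/mdstat shows the rebuild has finished ...
  mdadm /dev/md1 --remove /dev/sda3   # then drop the failed member

The want_replacement variant on a 3.3+ kernel is a sysfs write against a
member that is failing but still part of the array, something like:

  echo want_replacement > /sys/block/md1/md/dev-sda3/state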

Neil might have a thought on this, the option makes the rebuild vastly faster 
and safer.


-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-18 20:26 ` Bill Davidsen
@ 2012-07-19  2:44   ` Alex
  2012-07-19  3:16     ` Roman Mamedov
  0 siblings, 1 reply; 14+ messages in thread
From: Alex @ 2012-07-19  2:44 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID, Neil Brown

Hi,

>> What is the proper procedure to remove the disk from the array,
>> shutdown the server, and reboot with a new sda?
...
>> I'd appreciate a pointer to any existing documentation, or some
>> general guidance on the proper procedure.
>>
>
> Once the drive is failed about all you can do is add another drive as a
> spare, wait until the rebuild completes, then remove the old drive from the
> array. If you have a new kernel, 3.3 or newer you might have been able to
> use the undocumented but amazing "want_replacement" action to speed your
> rebuild, but when it is so bad it gets kicked I think it's too late.
>
> Neil might have a thought on this, the option makes the rebuild vastly
> faster and safer.

I've just successfully replaced the failed disk. I marked it as
failed, removed it from the array, powered the server off, swapped in
the replacement disk, rebooted, and added the new disk, and it's now
starting the rebuild process.
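
For the archives, the commands were roughly the following (a sketch from
memory, using this box's device names; adjust to suit):

  mdadm /dev/md1 --fail /dev/sda3
  mdadm /dev/md1 --remove /dev/sda3
  mdadm /dev/md0 --fail /dev/sda2
  mdadm /dev/md0 --remove /dev/sda2
  # ... power off, swap in the new drive, partition it to match, reboot ...
  mdadm /dev/md0 --add /dev/sda2
  mdadm /dev/md1 --add /dev/sda3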

However, it's extremely slow. This isn't a super-fast machine, but it
should at least be able to do 40M/sec as I've seen it do before. Why
would it be going at only 11M/sec?

[root@pixie ~]# echo 100000 > /proc/sys/dev/raid/speed_limit_max
[root@pixie ~]# echo 100000 > /proc/sys/dev/raid/speed_limit_min
[root@pixie ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sda3[5] sdb2[1] sdd2[4] sdc2[2]
      2890747392 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  4.0% (38872364/963582464)
finish=1347.4min speed=11437K/sec
      bitmap: 8/8 pages [32KB], 65536KB chunk

md0 : active raid5 sda2[5] sdb1[1] sdd1[4] sdc1[2]
      30715392 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
        resync=DELAYED
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

I'm not sure what stats I could provide to troubleshoot this further.
At this rate, the 2.7T array will take a full day to resync. Is that
to be expected?

Thanks,
Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-19  2:44   ` Alex
@ 2012-07-19  3:16     ` Roman Mamedov
  2012-07-19 14:25       ` Bill Davidsen
  2012-07-19 15:37       ` Alex
  0 siblings, 2 replies; 14+ messages in thread
From: Roman Mamedov @ 2012-07-19  3:16 UTC (permalink / raw)
  To: Alex; +Cc: Bill Davidsen, Linux RAID, Neil Brown

[-- Attachment #1: Type: text/plain, Size: 546 bytes --]

On Wed, 18 Jul 2012 22:44:06 -0400
Alex <mysqlstudent@gmail.com> wrote:

> I'm not sure what stats I could provide to troubleshoot this further.
> At this rate, the 2.7T array will take a full day to resync. Is that
> to be expected?

1) did you try increasing stripe_cache_size?

2) maybe it's an "Advanced Format" drive, the RAID partition is not properly
aligned?
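
For example (just a sketch; device names and the cache value are
illustrative):

  cat /sys/block/md1/md/stripe_cache_size    # default is 256
  echo 4096 > /sys/block/md1/md/stripe_cache_size

And for the alignment question, check the drive's physical sector size
and the partition start sectors; on a 4096-byte-sector drive the starts
should be divisible by 8:

  cat /sys/block/sda/queue/physical_block_size
  parted /dev/sda unit s print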

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-19  3:16     ` Roman Mamedov
@ 2012-07-19 14:25       ` Bill Davidsen
  2012-07-19 14:35         ` Roman Mamedov
  2012-07-19 21:08         ` NeilBrown
  2012-07-19 15:37       ` Alex
  1 sibling, 2 replies; 14+ messages in thread
From: Bill Davidsen @ 2012-07-19 14:25 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Alex, Linux RAID, Neil Brown

Roman Mamedov wrote:
> On Wed, 18 Jul 2012 22:44:06 -0400
> Alex <mysqlstudent@gmail.com> wrote:
>
>> I'm not sure what stats I could provide to troubleshoot this further.
>> At this rate, the 2.7T array will take a full day to resync. Is that
>> to be expected?
> 1) did you try increasing stripe_cache_size?
>
> 2) maybe it's an "Advanced Format" drive, the RAID partition is not properly
> aligned?
>
That's a good argument for not using "whole disk" array members; a partition can 
be started at a good offset and may perform better. As for the speed, since it 
is reconstructing the array data (hope the other drives are okay), every block 
written requires three blocks read and a reconstruct in CPU and memory. You can 
use "blockdev" to increase readahead, and set the devices to use the deadline 
scheduler; that _may_ improve things somewhat, but you have to read three blocks 
to write one, so it's not going to be fast.
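
In concrete terms that would be something like (illustrative values;
repeat for each member disk):

  blockdev --setra 8192 /dev/sda                  # readahead, in 512-byte sectors
  echo deadline > /sys/block/sda/queue/scheduler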

-- 
Bill Davidsen <davidsen@tmr.com>
   We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination.  -me, 2010



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-19 14:25       ` Bill Davidsen
@ 2012-07-19 14:35         ` Roman Mamedov
  2012-07-19 14:51           ` Bill Davidsen
  2012-07-19 21:08         ` NeilBrown
  1 sibling, 1 reply; 14+ messages in thread
From: Roman Mamedov @ 2012-07-19 14:35 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Alex, Linux RAID, Neil Brown

[-- Attachment #1: Type: text/plain, Size: 586 bytes --]

On Thu, 19 Jul 2012 10:25:25 -0400
Bill Davidsen <davidsen@tmr.com> wrote:

> That's a good argument for not using "whole disk" array members, a partition can 
> be started at a good offset and may perform better.

Not really; at least with modern metadata versions this is not a problem,
as whole-disk array members will start the actual data at an offset that is
suitably aligned for AF drives.
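
You can check where the data actually starts with, e.g. (a sketch):

  mdadm --examine /dev/sdb | grep -i offset

With 1.x metadata the Data Offset is reported in 512-byte sectors; as
long as it is a multiple of 8, the data is 4 KiB-aligned on an AF drive
even without a partition table.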

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-19 14:35         ` Roman Mamedov
@ 2012-07-19 14:51           ` Bill Davidsen
  0 siblings, 0 replies; 14+ messages in thread
From: Bill Davidsen @ 2012-07-19 14:51 UTC (permalink / raw)
  To: Roman Mamedov, Linux RAID

Roman Mamedov wrote:
> On Thu, 19 Jul 2012 10:25:25 -0400
> Bill Davidsen <davidsen@tmr.com> wrote:
>
>> That's a good argument for not using "whole disk" array members, a partition can
>> be started at a good offset and may perform better.
>
> Not really, at least with the modern metadata versions this is not a problem,
> as the whole-disk array members will start the actual data at an offset that is
> suitably aligned for the AF drives.
>
When did that start? And how does it handle adding a new drive to an array 
created back under 2.6 which didn't do that? Good to be aware of, though.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-19  3:16     ` Roman Mamedov
  2012-07-19 14:25       ` Bill Davidsen
@ 2012-07-19 15:37       ` Alex
  1 sibling, 0 replies; 14+ messages in thread
From: Alex @ 2012-07-19 15:37 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Bill Davidsen, Linux RAID, Neil Brown

Hi,

>> I'm not sure what stats I could provide to troubleshoot this further.
>> At this rate, the 2.7T array will take a full day to resync. Is that
>> to be expected?
>
> 1) did you try increasing stripe_cache_size?
>
> 2) maybe it's an "Advanced Format" drive, the RAID partition is not properly
> aligned?

I'm not familiar with either of those settings. The sync speed is now
down to about 6M/sec, which puts the rebuild at about two days.

I tried increasing stripe_cache_size to various values, including 1024,
4096, and 8192, up from the default 256. 1024 looks like the best, but
with only a marginal increase in speed over 256. It's now about 8M/sec.

What is an AF drive?

According to dstat, after watching for a few minutes, there are some
large reads (15-17M), but all writes are small, and there are times
when both reads and writes are very low or near zero.
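
For per-disk numbers, something like "iostat -x 5" (from sysstat) would
show whether one member is saturated (high %util and await) while the
others sit idle:

  iostat -x 5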

There are four disks in the system. One is a WD and the others are
Seagates of the same model. Does this tell you anything?

# hdparm -i /dev/sda

/dev/sda:

 Model=WDC WD1002FAEX-00Z3A0, FwRev=05.01D05, SerialNo=WD-WMATR0921907
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

# hdparm -i /dev/sdb

/dev/sdb:

 Model=ST31000340NS, FwRev=SN06, SerialNo=9QJ70X3C
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

Any ideas greatly appreciated. The system is also now horribly slow.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-19 14:25       ` Bill Davidsen
  2012-07-19 14:35         ` Roman Mamedov
@ 2012-07-19 21:08         ` NeilBrown
  2012-07-20  1:04           ` Alex
  1 sibling, 1 reply; 14+ messages in thread
From: NeilBrown @ 2012-07-19 21:08 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Roman Mamedov, Alex, Linux RAID

[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]

On Thu, 19 Jul 2012 10:25:25 -0400 Bill Davidsen <davidsen@tmr.com> wrote:

> Roman Mamedov wrote:
> > On Wed, 18 Jul 2012 22:44:06 -0400
> > Alex <mysqlstudent@gmail.com> wrote:
> >
> >> I'm not sure what stats I could provide to troubleshoot this further.
> >> At this rate, the 2.7T array will take a full day to resync. Is that
> >> to be expected?
> > 1) did you try increasing stripe_cache_size?
> >
> > 2) maybe it's an "Advanced Format" drive, the RAID partition is not properly
> > aligned?
> >
> That's a good argument for not using "whole disk" array members, a partition can 
> be started at a good offset and may perform better. As for the speed, since it 
> is reconstructing the array data (hope the other drives are okay), every block 
> written requires three blocks read and a reconstruct in cpu and memory. You can 
> use "blockdev" to increase readahead, and set the devices to use the deadline 
> scheduler, that _may_ improve things somewhat, but you have to read three block 
> to write one, so it's not going to be fast.
> 

Read-ahead has absolutely no effect in this context.

Read-ahead is a function of the page cache.  When filling the page cache,
read-ahead suggests how much more to be read than has been asked for.

resync/recovery does not use the page cache, consequently the readahead
setting is irrelevant.

IO scheduler choice may make a difference.

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-19 21:08         ` NeilBrown
@ 2012-07-20  1:04           ` Alex
  2012-07-20  1:22             ` Bill Davidsen
                               ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Alex @ 2012-07-20  1:04 UTC (permalink / raw)
  To: NeilBrown; +Cc: Bill Davidsen, Roman Mamedov, Linux RAID

Hi,

>> That's a good argument for not using "whole disk" array members, a partition can
>> be started at a good offset and may perform better. As for the speed, since it
>> is reconstructing the array data (hope the other drives are okay), every block
>> written requires three blocks read and a reconstruct in cpu and memory. You can
>> use "blockdev" to increase readahead, and set the devices to use the deadline
>> scheduler, that _may_ improve things somewhat, but you have to read three block
>> to write one, so it's not going to be fast.
>>
>
> Read-ahead has absolutely no effect in this context.
>
> Read-ahead is a function of the page cache.  When filling the page cache,
> read-ahead suggests how much more to be read than has been asked for.
>
> resync/recovery does not use the page cache, consequently the readahead
> setting is irrelevant.
>
> IO scheduler choice may make a difference.

It's already set to cfq. I assume that would be preferred over deadline?

I set it on the actual disk devices. Should I set it on the md0/1
devices as well? It is currently 'none'.

/sys/devices/virtual/block/md0/queue/scheduler

Thanks,
Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-20  1:04           ` Alex
@ 2012-07-20  1:22             ` Bill Davidsen
  2012-07-20  1:37             ` NeilBrown
  2012-07-23  4:14             ` Bill Davidsen
  2 siblings, 0 replies; 14+ messages in thread
From: Bill Davidsen @ 2012-07-20  1:22 UTC (permalink / raw)
  To: Alex; +Cc: NeilBrown, Roman Mamedov, Linux RAID

Alex wrote:
> Hi,
>
>>> That's a good argument for not using "whole disk" array members, a partition can
>>> be started at a good offset and may perform better. As for the speed, since it
>>> is reconstructing the array data (hope the other drives are okay), every block
>>> written requires three blocks read and a reconstruct in cpu and memory. You can
>>> use "blockdev" to increase readahead, and set the devices to use the deadline
>>> scheduler, that _may_ improve things somewhat, but you have to read three block
>>> to write one, so it's not going to be fast.
>>>
>> Read-ahead has absolutely no effect in this context.
>>
>> Read-ahead is a function of the page cache.  When filling the page cache,
>> read-ahead suggests how much more to be read than has been asked for.
>>
>> resync/recovery does not use the page cache, consequently the readahead
>> setting is irrelevant.
>>
>> IO scheduler choice may make a difference.
> It's already set for cfq. I assume that would be the preferred over deadline?
>
> I set it on the actual disk devices. Should I also set it on md0/1
> devices as well? It is currently 'none'.
>
> /sys/devices/virtual/block/md0/queue/scheduler
>
>
Never tried doing the array itself, but do the underlying devices.
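
i.e. set it per member disk, something along these lines (untested
sketch; substitute whichever scheduler you are testing):

  for d in sda sdb sdc sdd; do echo deadline > /sys/block/$d/queue/scheduler; done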


-- 
Bill Davidsen <davidsen@tmr.com>
   We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination.  -me, 2010



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-20  1:04           ` Alex
  2012-07-20  1:22             ` Bill Davidsen
@ 2012-07-20  1:37             ` NeilBrown
  2012-07-23  4:14             ` Bill Davidsen
  2 siblings, 0 replies; 14+ messages in thread
From: NeilBrown @ 2012-07-20  1:37 UTC (permalink / raw)
  To: Alex; +Cc: Bill Davidsen, Roman Mamedov, Linux RAID

[-- Attachment #1: Type: text/plain, Size: 1618 bytes --]

On Thu, 19 Jul 2012 21:04:17 -0400 Alex <mysqlstudent@gmail.com> wrote:

> Hi,
> 
> >> That's a good argument for not using "whole disk" array members, a partition can
> >> be started at a good offset and may perform better. As for the speed, since it
> >> is reconstructing the array data (hope the other drives are okay), every block
> >> written requires three blocks read and a reconstruct in cpu and memory. You can
> >> use "blockdev" to increase readahead, and set the devices to use the deadline
> >> scheduler, that _may_ improve things somewhat, but you have to read three block
> >> to write one, so it's not going to be fast.
> >>
> >
> > Read-ahead has absolutely no effect in this context.
> >
> > Read-ahead is a function of the page cache.  When filling the page cache,
> > read-ahead suggests how much more to be read than has been asked for.
> >
> > resync/recovery does not use the page cache, consequently the readahead
> > setting is irrelevant.
> >
> > IO scheduler choice may make a difference.
> 
> It's already set for cfq. I assume that would be the preferred over deadline?

The one that is preferred is the one that works best.  I suggest trying them
all and seeing what effect they have on your workload.

> 
> I set it on the actual disk devices. Should I also set it on md0/1
> devices as well? It is currently 'none'.
> 
> /sys/devices/virtual/block/md0/queue/scheduler

The queue/scheduler is meaningless for md devices.  They do not use the
block-layer queuing code.
The queue/scheduler setting on the actual devices is meaningful.

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-20  1:04           ` Alex
  2012-07-20  1:22             ` Bill Davidsen
  2012-07-20  1:37             ` NeilBrown
@ 2012-07-23  4:14             ` Bill Davidsen
  2012-07-24 14:02               ` Alex
  2 siblings, 1 reply; 14+ messages in thread
From: Bill Davidsen @ 2012-07-23  4:14 UTC (permalink / raw)
  To: Alex, Linux RAID

Alex wrote:
> Hi,
>
>>> That's a good argument for not using "whole disk" array members, a partition can
>>> be started at a good offset and may perform better. As for the speed, since it
>>> is reconstructing the array data (hope the other drives are okay), every block
>>> written requires three blocks read and a reconstruct in cpu and memory. You can
>>> use "blockdev" to increase readahead, and set the devices to use the deadline
>>> scheduler, that _may_ improve things somewhat, but you have to read three block
>>> to write one, so it's not going to be fast.
>>>
>>
>> Read-ahead has absolutely no effect in this context.
>>
>> Read-ahead is a function of the page cache.  When filling the page cache,
>> read-ahead suggests how much more to be read than has been asked for.
>>
>> resync/recovery does not use the page cache, consequently the readahead
>> setting is irrelevant.
>>
>> IO scheduler choice may make a difference.
>
> It's already set for cfq. I assume that would be the preferred over deadline?
>
> I set it on the actual disk devices. Should I also set it on md0/1
> devices as well? It is currently 'none'.
>
> /sys/devices/virtual/block/md0/queue/scheduler

For what it's worth, my experience has been that deadline works better for 
writes to arrays. In arrays with only a few drives, sometimes markedly better.


-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Need to remove failed disk from RAID5 array
  2012-07-23  4:14             ` Bill Davidsen
@ 2012-07-24 14:02               ` Alex
  0 siblings, 0 replies; 14+ messages in thread
From: Alex @ 2012-07-24 14:02 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID

Hi,

>>>> That's a good argument for not using "whole disk" array members, a
>>>> partition can
>>>> be started at a good offset and may perform better. As for the speed,
>>>> since it
>>>> is reconstructing the array data (hope the other drives are okay), every
>>>> block
>>>> written requires three blocks read and a reconstruct in cpu and memory.
>>>> You can
>>>> use "blockdev" to increase readahead, and set the devices to use the
>>>> deadline
>>>> scheduler, that _may_ improve things somewhat, but you have to read
>>>> three block
>>>> to write one, so it's not going to be fast.
>>>>
>>>
>>> Read-ahead has absolutely no effect in this context.
>>>
>>> Read-ahead is a function of the page cache.  When filling the page cache,
>>> read-ahead suggests how much more to be read than has been asked for.
>>>
>>> resync/recovery does not use the page cache, consequently the readahead
>>> setting is irrelevant.
>>>
>>> IO scheduler choice may make a difference.
>>
>>
>> It's already set for cfq. I assume that would be the preferred over
>> deadline?
>>
>> I set it on the actual disk devices. Should I also set it on md0/1
>> devices as well? It is currently 'none'.
>>
>> /sys/devices/virtual/block/md0/queue/scheduler
>
> For what it's worth, my experience has beem that deadline works better for
> writes to arrays. In arrays with only a few drives, sometimes markedly
> better.

Guys, I thought it would be worth following up to let you know that
the array eventually did rebuild successfully and is now fully
functional. It took about 4 full days to rebuild the 3.0T 4-disk RAID5
at about 4M/sec, sometimes much slower.

Thanks again,
Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-07-24 14:02 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-16 23:17 Need to remove failed disk from RAID5 array Alex
2012-07-18 20:26 ` Bill Davidsen
2012-07-19  2:44   ` Alex
2012-07-19  3:16     ` Roman Mamedov
2012-07-19 14:25       ` Bill Davidsen
2012-07-19 14:35         ` Roman Mamedov
2012-07-19 14:51           ` Bill Davidsen
2012-07-19 21:08         ` NeilBrown
2012-07-20  1:04           ` Alex
2012-07-20  1:22             ` Bill Davidsen
2012-07-20  1:37             ` NeilBrown
2012-07-23  4:14             ` Bill Davidsen
2012-07-24 14:02               ` Alex
2012-07-19 15:37       ` Alex
