Help raid10 recovery from 2 disks removed

Linux RAID subsystem development

* Help raid10 recovery from 2 disks removed
@ 2013-10-24  5:10 yuji_touya
  2013-10-24  8:54 ` Mikael Abrahamsson
  0 siblings, 1 reply; 14+ messages in thread
From: yuji_touya @ 2013-10-24  5:10 UTC (permalink / raw)
  To: linux-raid

Hi all,

Unfortunately my raid10 array (2TB x 4 disks, total 4TB) stopped working, and
mdadm now reports member disks in the states "removed" and "faulty removed".
Could someone advise me on how to recover the raid10 array?

I read this page and then came here.
https://raid.wiki.kernel.org/index.php/RAID_Recovery

Your help is appreciated,
Yuji

raid10 created as:
#  mdadm --create --verbose --assume-clean /dev/md0 --level=10 --raid-devices=4 --chunk=1024 --layout=n2 /dev/sd[bcde]

/dev/md0 is formatted with XFS.

*** when working well: ***
# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sdb[0] sde[3] sdd[2] sdc[1]
      3907028992 blocks 1024K chunks 2 near-copies [4/4] [UUUU]

unused devices: <none>

**** now broken: ****
# cat /proc/mdstat
Personalities : [raid10]
md0 : inactive sdd[2] sde[3]
      3907028992 blocks

unused devices: <none>


# mdadm --examine /dev/sd[bcde]
/dev/sdb:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5911f80e:dfc48edd:c6cdfb15:1768b409
  Creation Time : Mon May  7 17:00:11 2012
     Raid Level : raid10
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 3907028992 (3726.03 GiB 4000.80 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Wed Oct 23 10:56:11 2013
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : d6c88478 - correct
         Events : 34220

         Layout : near=2
     Chunk Size : 1024K

      Number   Major   Minor   RaidDevice State
this     0       8       16        0      active sync   /dev/sdb

   0     0       8       16        0      active sync   /dev/sdb
   1     1       0        0        1      faulty removed
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       64        3      active sync   /dev/sde
/dev/sdc:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5911f80e:dfc48edd:c6cdfb15:1768b409
  Creation Time : Mon May  7 17:00:11 2012
     Raid Level : raid10
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 3907028992 (3726.03 GiB 4000.80 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Jun  3 09:32:45 2013
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d60c30db - correct
         Events : 8

         Layout : near=2
     Chunk Size : 1024K

      Number   Major   Minor   RaidDevice State
this     1       8       32        1      active sync   /dev/sdc

   0     0       8       16        0      active sync   /dev/sdb
   1     1       8       32        1      active sync   /dev/sdc
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       64        3      active sync   /dev/sde
/dev/sdd:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5911f80e:dfc48edd:c6cdfb15:1768b409
  Creation Time : Mon May  7 17:00:11 2012
     Raid Level : raid10
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 3907028992 (3726.03 GiB 4000.80 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Wed Oct 23 18:25:25 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : d6c8edf7 - correct
         Events : 34224

         Layout : near=2
     Chunk Size : 1024K

      Number   Major   Minor   RaidDevice State
this     2       8       48        2      active sync   /dev/sdd

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       64        3      active sync   /dev/sde
/dev/sde:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5911f80e:dfc48edd:c6cdfb15:1768b409
  Creation Time : Mon May  7 17:00:11 2012
     Raid Level : raid10
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 3907028992 (3726.03 GiB 4000.80 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Wed Oct 23 18:25:25 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : d6c8ee09 - correct
         Events : 34224

         Layout : near=2
     Chunk Size : 1024K

      Number   Major   Minor   RaidDevice State
this     3       8       64        3      active sync   /dev/sde

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       64        3      active sync   /dev/sde



# mdadm --examine /dev/sd[b-z] | egrep 'Event|/dev/sd'
/dev/sdb:
         Events : 34220
this     0       8       16        0      active sync   /dev/sdb
   0     0       8       16        0      active sync   /dev/sdb
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       64        3      active sync   /dev/sde
/dev/sdc:
         Events : 8
this     1       8       32        1      active sync   /dev/sdc
   0     0       8       16        0      active sync   /dev/sdb
   1     1       8       32        1      active sync   /dev/sdc
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       64        3      active sync   /dev/sde
/dev/sdd:
         Events : 34224
this     2       8       48        2      active sync   /dev/sdd
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       64        3      active sync   /dev/sde
/dev/sde:
         Events : 34224
this     3       8       64        3      active sync   /dev/sde
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       64        3      active sync   /dev/sde

-----
CONFIDENTIAL: This e-mail may contain information that is confidential or otherwise protected from disclosure and intended only for the party to whom it is addressed. If you are not the intended recipient, please notify the sender by return and delete this e-mail. You are hereby formally advised that any unauthorized use, disclosure or copying of this email is strictly prohibited and may be unlawful.


* Re: Help raid10 recovery from 2 disks removed
  2013-10-24  5:10 Help raid10 recovery from 2 disks removed yuji_touya
@ 2013-10-24  8:54 ` Mikael Abrahamsson
  2013-10-24 10:14   ` yuji_touya
  0 siblings, 1 reply; 14+ messages in thread
From: Mikael Abrahamsson @ 2013-10-24  8:54 UTC (permalink / raw)
  To: yuji_touya; +Cc: linux-raid

On Thu, 24 Oct 2013, yuji_touya@yokogawa-digital.com wrote:

> Hi all,
>
> Unfortunately my raid10 array (2TB x 4 disks, total 4TB) stopped working, and
> mdadm now reports member disks in the states "removed" and "faulty removed".
> Could someone advise me on how to recover the raid10 array?
>
> I read this page and then came here.
> https://raid.wiki.kernel.org/index.php/RAID_Recovery
>
> Your help is appreciated,
>
> # mdadm --examine /dev/sd[b-z] | egrep 'Event|/dev/sd'
> /dev/sdb:
>         Events : 34220
> this     0       8       16        0      active sync   /dev/sdb
>   0     0       8       16        0      active sync   /dev/sdb
>   2     2       8       48        2      active sync   /dev/sdd
>   3     3       8       64        3      active sync   /dev/sde
> /dev/sdc:
>         Events : 8
> this     1       8       32        1      active sync   /dev/sdc
>   0     0       8       16        0      active sync   /dev/sdb
>   1     1       8       32        1      active sync   /dev/sdc
>   2     2       8       48        2      active sync   /dev/sdd
>   3     3       8       64        3      active sync   /dev/sde
> /dev/sdd:
>         Events : 34224
> this     2       8       48        2      active sync   /dev/sdd
>   2     2       8       48        2      active sync   /dev/sdd
>   3     3       8       64        3      active sync   /dev/sde
> /dev/sde:
>         Events : 34224
> this     3       8       64        3      active sync   /dev/sde
>   2     2       8       48        2      active sync   /dev/sdd
>   3     3       8       64        3      active sync   /dev/sde

Try doing --stop and then do --assemble --force with sdb, sdd and sde. 
Whatever you do, don't include sdc because it looks like sdc has been out 
of the array for a long time (the event count is 8 when all the others are 
around 34220).
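
The selection rule above (keep the members whose event counts agree, leave out the one that is far behind) can be sketched as a small filter over "mdadm --examine" output. This is only an illustration: the function name fresh_members and the default lag of 10 events are assumptions, not something mdadm provides, and the resulting list should be checked by eye before it is handed to --assemble --force.

```shell
# fresh_members [MAX_LAG]
# Reads "mdadm --examine" output on stdin and prints the devices whose
# event count is within MAX_LAG of the highest count seen.
# fresh_members and MAX_LAG are hypothetical names used for illustration.
fresh_members() {
    local max_lag="${1:-10}"
    awk -v lag="$max_lag" '
        # Per-device header such as "/dev/sdb:"
        /^\/dev\/[^ ]+:$/ { dev = substr($0, 1, length($0) - 1) }
        # Line of the form "Events : 34220"
        $1 == "Events" && $2 == ":" {
            ev[dev] = $3
            order[++n] = dev
            if ($3 + 0 > max) max = $3 + 0
        }
        END {
            for (i = 1; i <= n; i++)
                if (max - ev[order[i]] <= lag) print order[i]
        }'
}

# Example (not run here):
# mdadm --examine /dev/sd[bcde] | fresh_members
```

With the event counts posted in this thread (34220, 8, 34224, 34224) the filter keeps sdb, sdd and sde and drops sdc.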

You need to figure out what happened to get sdb kicked out of the array, 
check logs and "dmesg". Also use smartctl to check sdb and see if it's 
failing.

From what I can see, sdc was kicked out a long time ago and you haven't 
had redundancy for most of the lifetime of the array.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* RE: Help raid10 recovery from 2 disks removed
  2013-10-24  8:54 ` Mikael Abrahamsson
@ 2013-10-24 10:14   ` yuji_touya
  2013-10-24 12:16     ` Phil Turmel
  2013-10-24 12:44     ` Mikael Abrahamsson
  0 siblings, 2 replies; 14+ messages in thread
From: yuji_touya @ 2013-10-24 10:14 UTC (permalink / raw)
  To: swmike; +Cc: linux-raid

Mikael,

Thank you for your great help!
I was able to assemble /dev/md0 using only /dev/sd[bde], and mounted it read-only on /home.

# mdadm --misc --stop /dev/md0
mdadm: stopped /dev/md0

# mdadm --assemble --force /dev/md0 /dev/sd[bde]
mdadm: forcing event count in /dev/sdb(0) from 34220 upto 34224
mdadm: /dev/md0 has been started with 3 drives (out of 4).

# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sdb[0] sde[3] sdd[2]
      3907028992 blocks 1024K chunks 2 near-copies [4/3] [U_UU]

unused devices: <none>

> You need to figure out what happened to get sdb kicked out of the array,
> check logs and "dmesg". Also use smartctl to check sdb and see if it's
> failing.

Here are the syslog entries about raid10 and the smartctl output.
sdb seems to have too many bad blocks. Is that why sdb was kicked out?
I'm going to copy the files from /dev/md0 to somewhere else as soon as possible.
Should I repair the filesystem before copying (e.g. xfs_repair /dev/md0)?


Oct 23 18:11:29 localhost kernel: sd 0:0:1:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Oct 23 18:11:29 localhost kernel: Descriptor sense data with sense descriptors (in hex):
Oct 23 18:11:29 localhost kernel:        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 23 18:11:29 localhost kernel:        11 af 30 10
Oct 23 18:11:29 localhost kernel: sd 0:0:1:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 23 18:11:29 localhost kernel: end_request: I/O error, dev sdb, sector 296693776
Oct 23 18:11:29 localhost kernel: ata1: EH complete
Oct 23 18:11:29 localhost kernel: raid10: Disk failure on sdb, disabling device.
Oct 23 18:11:29 localhost kernel: raid10: Operation continuing on 2 devices.
Oct 23 18:11:29 localhost kernel: raid10: sdb: unrecoverable I/O read error for block 593387520
Oct 23 18:11:29 localhost kernel: raid10: sdb: unrecoverable I/O read error for block 593387768
Oct 23 18:11:29 localhost kernel: raid10: sdb: unrecoverable I/O read error for block 593388016
Oct 23 18:11:29 localhost kernel: raid10: sdb: unrecoverable I/O read error for block 593396704
Oct 23 18:11:29 localhost kernel: raid10: sdb: unrecoverable I/O read error for block 593396952
Oct 23 18:11:29 localhost kernel: raid10: sdb: unrecoverable I/O read error for block 593397200

# badblocks /dev/sdb > sdb.bad
# wc -l sdb.bad
69 sdb.bad

# smartctl -a /dev/sdb
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST2000DM001-9YN164
Serial Number:    W240D0XN
Firmware Version: CC4C
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Oct 24 18:48:02 2013 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 592) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   097   006    Pre-fail  Always       -       88125160
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   068   060   030    Pre-fail  Always       -       6495909
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12734
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       14
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       655
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   053   038   045    Old_age   Always   In_the_past 47 (13 90 58 39)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   096   096   000    Old_age   Always       -       9838
194 Temperature_Celsius     0x0022   047   062   000    Old_age   Always       -       47 (0 21 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       112
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       112
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       106154411688763
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       956764483658
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       154136770090

SMART Error Log Version: 1
ATA Error Count: 522 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 522 occurred at disk power-on lifetime: 12729 hours (530 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  23d+01:40:53.125  READ DMA EXT
  27 00 00 00 00 00 e0 00  23d+01:40:53.124  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+01:40:53.121  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+01:40:53.119  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  23d+01:40:53.090  READ NATIVE MAX ADDRESS EXT

Error 521 occurred at disk power-on lifetime: 12729 hours (530 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  23d+01:40:50.249  READ DMA EXT
  27 00 00 00 00 00 e0 00  23d+01:40:50.249  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+01:40:50.246  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+01:40:50.243  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  23d+01:40:50.215  READ NATIVE MAX ADDRESS EXT

Error 520 occurred at disk power-on lifetime: 12729 hours (530 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  23d+01:40:47.373  READ DMA EXT
  27 00 00 00 00 00 e0 00  23d+01:40:47.372  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+01:40:47.369  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+01:40:47.367  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  23d+01:40:47.340  READ NATIVE MAX ADDRESS EXT

Error 519 occurred at disk power-on lifetime: 12729 hours (530 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  23d+01:40:44.506  READ DMA EXT
  27 00 00 00 00 00 e0 00  23d+01:40:44.505  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+01:40:44.502  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+01:40:44.499  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  23d+01:40:44.472  READ NATIVE MAX ADDRESS EXT

Error 518 occurred at disk power-on lifetime: 12729 hours (530 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  23d+01:40:41.622  READ DMA EXT
  27 00 00 00 00 00 e0 00  23d+01:40:41.622  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+01:40:41.618  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+01:40:41.616  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  23d+01:40:41.589  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


* Re: Help raid10 recovery from 2 disks removed
  2013-10-24 10:14   ` yuji_touya
@ 2013-10-24 12:16     ` Phil Turmel
  2013-10-25 10:47       ` yuji_touya
  2013-10-24 12:44     ` Mikael Abrahamsson
  1 sibling, 1 reply; 14+ messages in thread
From: Phil Turmel @ 2013-10-24 12:16 UTC (permalink / raw)
  To: yuji_touya, swmike; +Cc: linux-raid

Good morning,

On 10/24/2013 06:14 AM, yuji_touya@yokogawa-digital.com wrote:
> Mikael,

[trim /]

>> You need to figure out what happened to get sdb kicked out of the array,
>> check logs and "dmesg". Also use smartctl to check sdb and see if it's
>> failing.

[trim /]

> Device Model:     ST2000DM001-9YN164

If I recall correctly, this model doesn't support error recovery
control.  If you haven't fixed your driver timeouts, it explains your
situation.

> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   115   097   006    Pre-fail  Always       -       88125160
>   3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       14
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

No reallocations...

> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       112
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       112

But many sectors waiting for rewrite (which will either fix them or
reallocate them).  Rewrites can't succeed in normal MD operation with
mismatched timeouts.

If you search the archives for various combinations of "scterc",
"timeout mismatch", "URE" and "error recovery", you'll find numerous
discussions of this problem and ways to mitigate it.  (More like horror
stories, to be honest.)  Most importantly, plan to buy RAID-capable
drives in the future.

HTH,

Phil


* RE: Help raid10 recovery from 2 disks removed
  2013-10-24 12:16     ` Phil Turmel
@ 2013-10-25 10:47       ` yuji_touya
  2013-10-25 12:07         ` Mikael Abrahamsson
  2013-10-25 12:09         ` Phil Turmel
  0 siblings, 2 replies; 14+ messages in thread
From: yuji_touya @ 2013-10-25 10:47 UTC (permalink / raw)
  To: philip, swmike; +Cc: linux-raid

Phil,

>> Device Model:     ST2000DM001-9YN164
>
> If I recall correctly, this model doesn't support error recovery
> control.  If you haven't fixed your driver timeouts, it explains your
> situation.

I had believed that nowadays all hard drives can reallocate bad sectors.
Is that wrong?

I had not fixed the driver timeouts. The term is new to me.
I wonder whether these settings are also required for the remaining Seagate drives (sdd and sde).

I'll use RAID-capable drives when replacing the kicked-out disks.
Thank you for your advice.

I'm learning a lot from this problem, and I'm really thankful to the linux-raid community,
the people who helped me, and of course the software RAID system. ;-)

Thanks
Yuji



* RE: Help raid10 recovery from 2 disks removed
  2013-10-25 10:47       ` yuji_touya
@ 2013-10-25 12:07         ` Mikael Abrahamsson
  2013-10-25 12:09         ` Phil Turmel
  1 sibling, 0 replies; 14+ messages in thread
From: Mikael Abrahamsson @ 2013-10-25 12:07 UTC (permalink / raw)
  To: yuji_touya; +Cc: linux-raid

On Fri, 25 Oct 2013, yuji_touya@yokogawa-digital.com wrote:

> I had believed that nowadays all hard drives can reallocate bad 
> sectors. Is that wrong?

No, the functionality referred to is how long the drive tries to read an 
errored sector before returning a read error. A RAID-class device will 
typically give up after 7 seconds, whereas a consumer-grade device will keep 
trying for 120 (or so) seconds. By default the Linux kernel times out the 
drive if it is unresponsive for 30 seconds, well before a consumer drive 
reports the error.

That's why you need to adjust the Linux kernel timeouts. Personally, I set 
180 seconds before kicking the drive from the array.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Help raid10 recovery from 2 disks removed
  2013-10-25 10:47       ` yuji_touya
  2013-10-25 12:07         ` Mikael Abrahamsson
@ 2013-10-25 12:09         ` Phil Turmel
  1 sibling, 0 replies; 14+ messages in thread
From: Phil Turmel @ 2013-10-25 12:09 UTC (permalink / raw)
  To: yuji_touya, swmike; +Cc: linux-raid

On 10/25/2013 06:47 AM, yuji_touya@yokogawa-digital.com wrote:
> Phil,
> 
>>> Device Model:     ST2000DM001-9YN164
>>
>> If I recall correctly, this model doesn't support error recovery
>> control.  If you haven't fixed your driver timeouts, it explains your
>> situation.
> 
> I had believed that nowadays all hard drives can reallocate bad sectors.
> Is that wrong?

Yes and no.  Yes, all modern hard drives can reallocate bad sectors.
No, they can't do so until that sector is *written*.  Since you can't
read the data, and you need to write it to fix it, you need to get that
data from a redundant location.

The kernel MD module will do this on a running array when it encounters
a read error.  Your problem is that your drives do not report read
errors before the low level kernel driver (not MD) times out (30
seconds).  Non-raid drives typically take over two minutes to report a
read error.  When the kernel driver times out, it attempts to reset the
connection to the drive.  But the drive is still trying to read, and
ignores the reset.  So when MD tries to write the sector to repair it,
the drive appears to be disconnected.  *BOOM*

When you look at it a little later, it appears to be fine (and it
probably is).
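
The mismatch described above comes down to one comparison: does the kernel's command timeout outlast the time the drive may spend retrying a bad sector? A minimal sketch, using the figures quoted in this thread (about 7 seconds for a RAID-class drive, two minutes or so for a desktop drive, 30 seconds for the kernel default). timeout_verdict is a hypothetical helper, and the drive's worst-case retry time has to be supplied by hand because the drive does not report it.

```shell
# timeout_verdict DRIVE_SECONDS KERNEL_SECONDS
# DRIVE_SECONDS:  worst-case time the drive spends on a bad sector before
#                 it reports a read error (an estimate supplied by the caller).
# KERNEL_SECONDS: the value found in /sys/block/sdX/device/timeout.
timeout_verdict() {
    local drive_s="$1" kernel_s="$2"
    if [ "$kernel_s" -gt "$drive_s" ]; then
        echo "ok: the drive reports the error before the kernel gives up"
    else
        echo "mismatch: the kernel resets the link while the drive is still retrying"
    fi
}

# Example (not run here): a desktop drive against the current kernel setting.
# timeout_verdict 120 "$(cat /sys/block/sdb/device/timeout)"
```

A desktop drive (120) against the default timeout (30) is reported as a mismatch; the same drive against a 180 second timeout, or a 7 second drive against the default, is reported as ok.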

> I had not fixed the driver timeouts. The term is new to me.
> I wonder whether these settings are also required for the remaining Seagate drives (sdd and sde).

Please search the list archives.  I've explained in detail many times in
the past few years.

And yes, you need to deal with the problem for all of your drives.

> I'll use RAID-capable drives when replacing the kicked-out disks.
> Thank you for your advice.
> 
> I'm learning a lot from this problem, and I'm really thankful to the linux-raid community,
> the people who helped me, and of course the software RAID system. ;-)

You're welcome.

Phil


* RE: Help raid10 recovery from 2 disks removed
  2013-10-24 10:14   ` yuji_touya
  2013-10-24 12:16     ` Phil Turmel
@ 2013-10-24 12:44     ` Mikael Abrahamsson
  2013-10-25  7:27       ` Dag Nygren
  1 sibling, 1 reply; 14+ messages in thread
From: Mikael Abrahamsson @ 2013-10-24 12:44 UTC (permalink / raw)
  To: yuji_touya; +Cc: linux-raid

On Thu, 24 Oct 2013, yuji_touya@yokogawa-digital.com wrote:

> Here are the syslog entries about raid10 and the smartctl output.
> sdb seems to have too many bad blocks. Is that why sdb was kicked out?

Most likely.

> I'm going to copy the files from /dev/md0 to somewhere else as soon as possible.
> Should I repair the filesystem before copying (e.g. xfs_repair /dev/md0)?

What you need to do now is to use dd_rescue or equivalent to copy the data 
off of sdb to a good drive. Stop the array first. This means you'll lose 
data on the bad blocks. After this is done, and you have assembled the 
array with the good drive with (most of) the data from sdb, start the 
array, then hot-add in sdc and let things sync up. You should now have 
redundancy.
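
The order of operations above can be written out as a dry-run plan. This sketch only prints the commands and runs nothing. It assumes GNU ddrescue as the "equivalent" copy tool, /dev/sdf as the new drive and rescue.map as the map file name; none of these come from this thread, and rescue_plan is a hypothetical helper.

```shell
# rescue_plan ARRAY BAD_DISK NEW_DISK STALE_DISK GOOD_DISK...
# Prints (does not execute) the recovery steps in order.
rescue_plan() {
    local array="$1" bad="$2" fresh="$3" stale="$4"
    shift 4
    # 1. Stop the array so nothing writes to the failing disk.
    printf 'mdadm --stop %s\n' "$array"
    # 2. Copy the failing disk to the new one; unreadable sectors are skipped.
    printf 'ddrescue -f -n %s %s rescue.map\n' "$bad" "$fresh"
    # 3. Assemble from the copy plus the remaining good members.
    printf 'mdadm --assemble --force %s %s %s\n' "$array" "$fresh" "$*"
    # 4. Add the long-missing disk back so it resyncs.
    printf 'mdadm %s --add %s\n' "$array" "$stale"
}

# Example:
# rescue_plan /dev/md0 /dev/sdb /dev/sdf /dev/sdc /dev/sdd /dev/sde
```

Printing the plan first makes it easy to review the device names before anything destructive is typed; the ddrescue step overwrites the target disk.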

Also check why you didn't get notification that sdc wasn't part of the 
array; usually mdadm --monitor or equivalent will send email about these events.
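
As a stopgap until monitoring mail is confirmed to work, a degraded or inactive array can also be spotted directly in /proc/mdstat. A minimal sketch, assuming the /proc/mdstat layout shown earlier in this thread; array_health is a hypothetical helper, not a replacement for mdadm's own monitor mode.

```shell
# array_health
# Reads /proc/mdstat-style text on stdin and prints one line per array
# that is inactive or has a missing member.
array_health() {
    awk '
        /^md[0-9]+ : / {
            name = $1
            if ($3 == "inactive") print name ": inactive"
        }
        # The status line ends with a map such as [U_UU]; "_" marks a missing member.
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name ": degraded " $NF }
    '
}

# Example (not run here):
# array_health < /proc/mdstat
```

A healthy array produces no output, so the function can be called from a cron job that mails only when something is printed.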

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Help raid10 recovery from 2 disks removed
  2013-10-24 12:44     ` Mikael Abrahamsson
@ 2013-10-25  7:27       ` Dag Nygren
  2013-10-25  8:24         ` Mikael Abrahamsson
                           ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Dag Nygren @ 2013-10-25  7:27 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: yuji_touya, linux-raid

On Thursday 24 October 2013 14:44:14 Mikael Abrahamsson wrote:
> On Thu, 24 Oct 2013, yuji_touya@yokogawa-digital.com wrote:
> 
> > Here are the syslog entries about raid10 and the smartctl output.
> > sdb seems to have too many bad blocks. Is that why sdb was kicked out?
> 
> Most likely.
> 
> > I'm going to copy the files from /dev/md0 to somewhere else as soon as possible.
> > Should I repair the filesystem before copying (e.g. xfs_repair /dev/md0)?
> 
> What you need to do now is to use dd_rescue or equivalent to copy the data 
> off of sdb to a good drive. Stop the array first. This means you'll lose 
> data on the bad blocks. After this is done, and you have assembled the 
> array with the good drive with (most of) the data from sdb, start the 
> array, then hot-add in sdc and let things sync up. You should now have 
> redundancy.

Hi all!

Just had a fight with this myself, also using Seagate drives.
And I don't think he needs to lose any data, nor use ddrescue here.

Just enabling scterc (which is disabled by default and will be disabled
again after a power-down of the drive), setting the timeout,
and then running a repair on the array
fixed it for me, as md was smart enough to try to rewrite the
sector(s) that had failed, and with scterc the drive would then reallocate
the failed sector.
I thought I had this done, but a syntax error in the script had
prevented it from working. :-(

The working script I ran for this was:
=============================
# Set up RAID drive timeouts
for x in b c d e
do
        smartctl -l scterc,70,70 /dev/sd$x
        echo 180 >/sys/block/sd$x/device/timeout
done
==============================

After that run "echo "repair" >/sys/block/md0/md/sync_action"
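
A minimal sketch of starting the repair and waiting for it to finish is shown
here. It assumes the array is md0 and that md has redundancy to repair from.
The MD_SYSFS and POLL_SECONDS variables are illustrative names introduced only
so the functions can be pointed at a scratch directory.

```shell
#!/bin/sh
# Sketch: start an md "repair" pass and wait for it to finish.
# MD_SYSFS is overridable only so the functions can be tried on a scratch dir.
MD_SYSFS=${MD_SYSFS:-/sys/block/md0/md}

start_repair() {
    echo repair > "$MD_SYSFS/sync_action"
}

wait_until_idle() {
    # sync_action reads back "idle" once no check/repair/resync is running.
    while [ "$(cat "$MD_SYSFS/sync_action")" != idle ]; do
        sleep "${POLL_SECONDS:-60}"
    done
    # mismatch_cnt is md's count of sectors found out of sync during the pass.
    echo "mismatch_cnt: $(cat "$MD_SYSFS/mismatch_cnt")"
}
```

Run as root, "start_repair && wait_until_idle" blocks until the pass is over;
progress can also be followed in /proc/mdstat.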

This should move the 112 count for your "Pending" sectors to "Reallocated_Sector_Ct"
in the smartctl output and fix your array.
After that, you should re-add the drive that has been missing almost since
the initialization of the array, and keep a close eye on the error counts there.

You should also keep an eye on the Reallocated_Sector_Ct for sdb, though.
Your 112 is still below the health limit for Seagate drives (200), but it is
fairly high and indicates a "not so good" drive.
If the count goes over 200, Seagate will replace the drive.
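
To watch those two counters without reading the whole table each time, a small
filter along these lines may help. It is only a sketch and assumes the default
"smartctl -A" attribute table, where the raw value is the tenth column; the
function name is made up for this example.

```shell
#!/bin/sh
# Sketch: pull the raw values of two SMART attributes out of "smartctl -A" text.
# Reads the attribute table on stdin, so it works on live or saved output.
smart_sector_counts() {
    awk '$2 == "Current_Pending_Sector" || $2 == "Reallocated_Sector_Ct" {
        print $2 "=" $10
    }'
}

# Usage (needs root and smartmontools):
#   smartctl -A /dev/sdb | smart_sector_counts
```

Logging this output before and after a repair shows whether the pending
sectors were actually reallocated.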

If someone with more insight has objections to the procedure above, please
tell me. But this worked for me.

> Also check why you didn't get notification that sdc wasn't part of the 
> array; usually mdadm --monitor or equivalent will send email about these events.

Good advice! Set up the smartctl email address!

Best
Dag

* Re: Help raid10 recovery from 2 disks removed
  2013-10-25  7:27       ` Dag Nygren
@ 2013-10-25  8:24         ` Mikael Abrahamsson
  2013-10-25  8:34           ` Dag Nygren
  2013-10-25 10:08         ` yuji_touya
  2013-10-25 12:21         ` Phil Turmel
  2 siblings, 1 reply; 14+ messages in thread
From: Mikael Abrahamsson @ 2013-10-25  8:24 UTC (permalink / raw)
  To: Dag Nygren; +Cc: yuji_touya, linux-raid

On Fri, 25 Oct 2013, Dag Nygren wrote:

> And I don't think he needs to lose any data, nor use ddrescue here.

He has no redundancy drive for his data, so unfortunately I believe he 
does.

Your advice makes a lot of sense and all people should look into this (if 
you feel like it, document it on the wiki), but in his case his redundancy 
drive was kicked out long ago.

> After that run "echo "repair" >/sys/block/md0/md/sync_action"

Works great if there is proper redundancy.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: Help raid10 recovery from 2 disks removed
  2013-10-25  8:24         ` Mikael Abrahamsson
@ 2013-10-25  8:34           ` Dag Nygren
  0 siblings, 0 replies; 14+ messages in thread
From: Dag Nygren @ 2013-10-25  8:34 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: yuji_touya, linux-raid

On Friday 25 October 2013 10:24:45 Mikael Abrahamsson wrote:
> On Fri, 25 Oct 2013, Dag Nygren wrote:
> 
> > And I don't think he needs to lose any data, nor use ddrescue here.
> 
> He has no redundancy drive for his data, so unfortunately I believe he 
> does.

Ouch. Yes, of course, I somehow misread and thought he had a RAID-6.
Sorry for the misunderstanding!

Best
Dag

* RE: Help raid10 recovery from 2 disks removed
  2013-10-25  7:27       ` Dag Nygren
  2013-10-25  8:24         ` Mikael Abrahamsson
@ 2013-10-25 10:08         ` yuji_touya
  2013-10-25 12:21         ` Phil Turmel
  2 siblings, 0 replies; 14+ messages in thread
From: yuji_touya @ 2013-10-25 10:08 UTC (permalink / raw)
  To: dag, swmike; +Cc: linux-raid

Hi Dag,

> Just had a fight with this myself, also using Seagate drives.
> And I don't think he needs to lose any data, nor use ddrescue here.

What a coincidence! Thank you for your advice.
But sorry, dd_rescue is now running on my PC. (Thank you, Mikael.)
Looking at the progress, it will take about 12 more hours to finish copying.

Is a redundancy drive required to move the 112 "Pending" sectors to
"Reallocated_Sector_Ct" on sdb? I don't know whether it would work in my case.

>> Also check why you didn't get notification that sdc wasn't part of the
>> array; usually mdadm --monitor or equivalent will send email about these events.
>
> Good advice! Set up the smartctl email address!

Not mdadm.conf but smartctl?
I had added a "MAILADDR" entry (a mailing list in my dept.) to /etc/mdadm.conf, and
the mdmonitor service was running, but no notification e-mail was delivered to the list.
There is a mail server issue at my company. :(
Anyway, I have now set MAILADDR to my personal e-mail address and found that the notifier works.
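
One way to check this kind of setup end to end is sketched below. It looks for
a MAILADDR line and then asks mdadm for a test alert. The function names and
the CONF variable are illustrative, and the check assumes the keyword is
spelled out in full rather than abbreviated.

```shell
#!/bin/sh
# Sketch: confirm mdadm.conf names a recipient, then ask mdadm for a test alert.
CONF=${CONF:-/etc/mdadm.conf}

has_mailaddr() {
    # True if the file has a non-comment MAILADDR line followed by an address.
    grep -Eiq '^[[:space:]]*MAILADDR[[:space:]]+[^[:space:]]+' "$1"
}

send_test_alert() {
    if has_mailaddr "$CONF"; then
        # --test makes the monitor emit a TestMessage alert for each array found.
        mdadm --monitor --scan --oneshot --test
    else
        echo "no MAILADDR in $CONF" >&2
        return 1
    fi
}
```

If the test message never arrives, the problem is more likely in mail delivery
than in mdadm itself.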

Thanks
Yuji

* Re: Help raid10 recovery from 2 disks removed
  2013-10-25  7:27       ` Dag Nygren
  2013-10-25  8:24         ` Mikael Abrahamsson
  2013-10-25 10:08         ` yuji_touya
@ 2013-10-25 12:21         ` Phil Turmel
  2013-10-25 16:05           ` Dag Nygren
  2 siblings, 1 reply; 14+ messages in thread
From: Phil Turmel @ 2013-10-25 12:21 UTC (permalink / raw)
  To: dag, Mikael Abrahamsson; +Cc: yuji_touya, linux-raid

Good morning Dag,

On 10/25/2013 03:27 AM, Dag Nygren wrote:

> Just enabling scterc (which is disabled by default and will be
> after a power down of the drive), setting the timeout 
> and then running a repair on the array
> fixed it for me as md was smart enough to try to rewrite the
> sector(s) that had failed and with scterc the drive would then reallocate
> the failed sector. 
> I thought I had this done, but a syntax error in the script had
> prevented it from working.. :-( )
> 
> The working script I ran for this was:
> =============================
> # Set up RAID drive timeouts
> for x in b c d e
> do
>         smartctl -l scterc,70,70 /dev/sd$x
>         echo 180 >/sys/block/sd$x/device/timeout
> done
> ==============================

You shouldn't do both.  You only need the long driver timeout if the
hard disk doesn't support scterc.  Long timeouts are bad for application
software, as you can get very long system pauses while waiting for a
sector recovery.  But the long timeout is the only option if you have
non-scterc drives.

Some time ago I posted a similar script that checked the result code
from smartctl.  It only resets the driver timeout if smartctl couldn't
set scterc.
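
The idea can be sketched roughly as follows. This is not the script referred
to above, just a minimal illustration that relies on smartctl returning a
non-zero exit status when the scterc command fails. SMARTCTL and SYSFS_BLOCK
are illustrative override variables added so the logic can be tried without
real disks.

```shell
#!/bin/sh
# Sketch: prefer SCT ERC; lengthen the kernel timeout only as a fallback.
# SMARTCTL and SYSFS_BLOCK are overridable so the logic can be tried without
# real disks; the defaults are the real tool and the real sysfs path.
SMARTCTL=${SMARTCTL:-smartctl}
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

set_raid_timeouts() {
    for disk in "$@"; do
        if "$SMARTCTL" -l scterc,70,70 "/dev/$disk" >/dev/null; then
            # 70 is in units of 100 ms, i.e. 7.0 seconds for reads and writes.
            echo "$disk: scterc set to 7.0 s"
        else
            # Drive refused or lacks SCT ERC: give it time for its own retries.
            echo 180 > "$SYSFS_BLOCK/$disk/device/timeout"
            echo "$disk: driver timeout raised to 180 s"
        fi
    done
}

# Usage (as root): set_raid_timeouts sdb sdc sdd sde
```

Because the scterc setting is lost when the drive is powered down, something
like this would normally be run at every boot.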

HTH,

Phil

* Re: Help raid10 recovery from 2 disks removed
  2013-10-25 12:21         ` Phil Turmel
@ 2013-10-25 16:05           ` Dag Nygren
  0 siblings, 0 replies; 14+ messages in thread
From: Dag Nygren @ 2013-10-25 16:05 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Mikael Abrahamsson, yuji_touya, linux-raid

On Friday 25 October 2013 08:21:09 Phil Turmel wrote:
> Good morning Dag,
> 
> On 10/25/2013 03:27 AM, Dag Nygren wrote:
> 
> > Just enabling scterc (which is disabled by default and will be
> > after a power down of the drive), setting the timeout 
> > and then running a repair on the array
> > fixed it for me as md was smart enough to try to rewrite the
> > sector(s) that had failed and with scterc the drive would then reallocate
> > the failed sector. 
> > I thought I had this done, but a syntax error in the script had
> > prevented it from working.. :-( )
> > 
> > The working script I ran for this was:
> > =============================
> > # Set up RAID drive timeouts
> > for x in b c d e
> > do
> >         smartctl -l scterc,70,70 /dev/sd$x
> >         echo 180 >/sys/block/sd$x/device/timeout
> > done
> > ==============================
> 
> You shouldn't do both.  You only need the long driver timeout if the
> hard disk doesn't support scterc.  Long timeouts are bad for application
> software, as you can get very long system pauses while waiting for a
> sector recovery.  But the long timeout is the only option if you have
> non-scterc drives.

OK, I just wanted to be on the safe side. And this array is just serving
my MythTV recordings, so it's not very critical.

> Some time ago I posted a similar script that checked the result code
> from smartctl.  It only resets the driver timeout if smartctl couldn't
> set scterc.

This is of course even better.

Thanks!
Dag

end of thread (last message: 2013-10-25 16:05 UTC)

Thread overview: 14+ messages
2013-10-24  5:10 Help raid10 recovery from 2 disks removed yuji_touya
2013-10-24  8:54 ` Mikael Abrahamsson
2013-10-24 10:14   ` yuji_touya
2013-10-24 12:16     ` Phil Turmel
2013-10-25 10:47       ` yuji_touya
2013-10-25 12:07         ` Mikael Abrahamsson
2013-10-25 12:09         ` Phil Turmel
2013-10-24 12:44     ` Mikael Abrahamsson
2013-10-25  7:27       ` Dag Nygren
2013-10-25  8:24         ` Mikael Abrahamsson
2013-10-25  8:34           ` Dag Nygren
2013-10-25 10:08         ` yuji_touya
2013-10-25 12:21         ` Phil Turmel
2013-10-25 16:05           ` Dag Nygren
