linux-raid.vger.kernel.org archive mirror
* Help with recovering a RAID5 array
@ 2013-05-02 12:24 Stefan Borggraefe
  2013-05-02 12:30 ` Mathias Burén
  2013-05-03  8:38 ` Ole Tange
  0 siblings, 2 replies; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-02 12:24 UTC (permalink / raw)
  To: linux-raid

Hi,

I am using a RAID5 software RAID on Ubuntu 12.04 (kernel
3.2.0-37-generic x86_64).

It consists of 6 Hitachi 4 TB drives and contains an ext4 file system.
There are no spare devices.

Yesterday evening I exchanged a drive that showed SMART errors and the
array started rebuilding its redundancy normally.

When I returned to this server this morning, the array was in the following
state:

md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
      19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4] 
[U_U_UU]

sdc is the newly added hard disk, but now sdd has failed as well. :( It would
be great if there were a way to get this RAID5 working again. Perhaps sdc1 can
then be fully added to the array, and after that drive sdd can also be
exchanged.

I have not started experimenting or changing this array in any way, but wanted 
to ask here for assistance first. Thank you for your help!

mdadm --examine /dev/sd[cdegfh]1 | egrep 'Event|/dev/sd'

shows

/dev/sdc1:
         Events : 494
/dev/sdd1:
         Events : 478
/dev/sde1:
         Events : 494
/dev/sdf1:
         Events : 494
/dev/sdg1:
         Events : 494
/dev/sdh1:
         Events : 494



mdadm --examine /dev/sd[cdegfh]1

shows

/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 7433213e:0dd2e5ed:073dd59d:bf1f83d8

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : 9e83f72 - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : c2e5423f:6d91a061:c3f55aa7:6d1cec87

    Update Time : Mon Apr 29 17:24:26 2013
       Checksum : 37b97776 - correct
         Events : 478

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 68207885:02c05297:8ef62633:65b83839

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : f0b36c7f - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 7d328a98:6c02f550:ab1837c0:cb773ac1

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : d2799f34 - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 76b683b1:58e053ff:57ac0cfc:be114f75

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : 89bc2e05 - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 3c88705f:9f3add0e:d58d46a7:b40d02d7

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : 541f3913 - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : A.A.AA ('A' == active, '.' == missing)

This is the dmesg output from when the failure happened:

[6669459.855352] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855362] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.855368] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 2a 00 00 08 
00
[6669459.855387] end_request: I/O error, dev sdd, sector 590910506
[6669459.855456] raid5_end_read_request: 14 callbacks suppressed
[6669459.855463] md/raid:md126: read error not correctable (sector 590910472 
on sdd1).
[6669459.855490] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855496] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.855501] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 32 00 00 08 
00
[6669459.855515] end_request: I/O error, dev sdd, sector 590910514
[6669459.855594] md/raid:md126: read error not correctable (sector 590910480 
on sdd1).
[6669459.855608] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855611] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.855620] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 3a 00 00 08 
00
[6669459.855648] end_request: I/O error, dev sdd, sector 590910522
[6669459.855710] md/raid:md126: read error not correctable (sector 590910488 
on sdd1).
[6669459.855720] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855723] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.855727] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 42 00 00 08 
00
[6669459.855737] end_request: I/O error, dev sdd, sector 590910530
[6669459.855796] md/raid:md126: read error not correctable (sector 590910496 
on sdd1).
[6669459.855814] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855817] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.855821] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 4a 00 00 08 
00
[6669459.855831] end_request: I/O error, dev sdd, sector 590910538
[6669459.855889] md/raid:md126: read error not correctable (sector 590910504 
on sdd1).
[6669459.855907] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855910] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.855914] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 52 00 00 08 
00
[6669459.855924] end_request: I/O error, dev sdd, sector 590910546
[6669459.855982] md/raid:md126: read error not correctable (sector 590910512 
on sdd1).
[6669459.855990] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855992] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.855996] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 5a 00 00 08 
00
[6669459.856004] end_request: I/O error, dev sdd, sector 590910554
[6669459.856062] md/raid:md126: read error not correctable (sector 590910520 
on sdd1).
[6669459.856072] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856075] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856079] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 62 00 00 08 
00
[6669459.856088] end_request: I/O error, dev sdd, sector 590910562
[6669459.856153] md/raid:md126: read error not correctable (sector 590910528 
on sdd1).
[6669459.856171] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856174] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 6a 00 00 08 
00
[6669459.856188] end_request: I/O error, dev sdd, sector 590910570
[6669459.856256] md/raid:md126: read error not correctable (sector 590910536 
on sdd1).
[6669459.856265] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856268] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856272] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 72 00 00 08 
00
[6669459.856281] end_request: I/O error, dev sdd, sector 590910578
[6669459.856346] md/raid:md126: read error not correctable (sector 590910544 
on sdd1).
[6669459.856364] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856368] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856374] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 7a 00 00 08 
00
[6669459.856385] end_request: I/O error, dev sdd, sector 590910586
[6669459.856445] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856449] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856456] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 82 00 00 08 
00
[6669459.856466] end_request: I/O error, dev sdd, sector 590910594
[6669459.856526] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856530] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856537] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 8a 00 00 08 
00
[6669459.856547] end_request: I/O error, dev sdd, sector 590910602
[6669459.856607] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856611] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856617] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 92 00 00 08 
00
[6669459.856628] end_request: I/O error, dev sdd, sector 590910610
[6669459.856687] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856691] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856697] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 9a 00 00 08 
00
[6669459.856707] end_request: I/O error, dev sdd, sector 590910618
[6669459.856767] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856772] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856778] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 a2 00 00 08 
00
[6669459.856788] end_request: I/O error, dev sdd, sector 590910626
[6669459.856847] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856851] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856859] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 aa 00 00 08 
00
[6669459.856869] end_request: I/O error, dev sdd, sector 590910634
[6669459.856928] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856932] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.856938] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 b2 00 00 08 
00
[6669459.856949] end_request: I/O error, dev sdd, sector 590910642
[6669459.857008] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857011] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.857018] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ba 00 00 08 
00
[6669459.857028] end_request: I/O error, dev sdd, sector 590910650
[6669459.857088] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857092] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.857098] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 c2 00 00 08 
00
[6669459.857109] end_request: I/O error, dev sdd, sector 590910658
[6669459.857168] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857171] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.857178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ca 00 00 08 
00
[6669459.857188] end_request: I/O error, dev sdd, sector 590910666
[6669459.857248] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857251] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.857258] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 d2 00 00 08 
00
[6669459.857269] end_request: I/O error, dev sdd, sector 590910674
[6669459.857328] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857333] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.857339] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 da 00 00 08 
00
[6669459.857349] end_request: I/O error, dev sdd, sector 590910682
[6669459.857408] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857412] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.857418] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 e2 00 00 08 
00
[6669459.857429] end_request: I/O error, dev sdd, sector 590910690
[6669459.857488] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857492] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.857499] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 93 4a 00 00 08 
00
[6669459.857509] end_request: I/O error, dev sdd, sector 590910282
[6669459.857569] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857573] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_OK
[6669459.857579] sd 6:1:10:0: [sdd] CDB: 
[6669459.857585] aacraid: Host adapter abort request (6,1,10,0)
[6669459.857639] Read(10): 28 00 23 38 93 42 00 00 08 00
[6669459.857648] end_request: I/O error, dev sdd, sector 590910274
[6669459.857844] aacraid: Host adapter reset request. SCSI hang ?
[6669470.028090] RAID conf printout:
[6669470.028097]  --- level:5 rd:6 wd:4
[6669470.028101]  disk 0, o:1, dev:sde1
[6669470.028105]  disk 1, o:1, dev:sdc1
[6669470.028109]  disk 2, o:1, dev:sdf1
[6669470.028112]  disk 3, o:0, dev:sdd1
[6669470.028115]  disk 4, o:1, dev:sdh1
[6669470.028118]  disk 5, o:1, dev:sdg1
[6669470.034462] RAID conf printout:
[6669470.034464]  --- level:5 rd:6 wd:4
[6669470.034465]  disk 0, o:1, dev:sde1
[6669470.034466]  disk 2, o:1, dev:sdf1
[6669470.034467]  disk 3, o:0, dev:sdd1
[6669470.034468]  disk 4, o:1, dev:sdh1
[6669470.034469]  disk 5, o:1, dev:sdg1
[6669470.034484] RAID conf printout:
[6669470.034486]  --- level:5 rd:6 wd:4
[6669470.034489]  disk 0, o:1, dev:sde1
[6669470.034491]  disk 2, o:1, dev:sdf1
[6669470.034494]  disk 3, o:0, dev:sdd1
[6669470.034496]  disk 4, o:1, dev:sdh1
[6669470.034499]  disk 5, o:1, dev:sdg1
[6669470.034571] RAID conf printout:
[6669470.034577]  --- level:5 rd:6 wd:4
[6669470.034581]  disk 0, o:1, dev:sde1
[6669470.034584]  disk 2, o:1, dev:sdf1
[6669470.034587]  disk 4, o:1, dev:sdh1
[6669470.034589]  disk 5, o:1, dev:sdg1

Please let me know if you need any more information.
-- 
Best regards,
Stefan Borggraefe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-02 12:24 Help with recovering a RAID5 array Stefan Borggraefe
@ 2013-05-02 12:30 ` Mathias Burén
  2013-05-02 13:14   ` Stefan Borggraefe
  2013-05-03  8:38 ` Ole Tange
  1 sibling, 1 reply; 13+ messages in thread
From: Mathias Burén @ 2013-05-02 12:30 UTC (permalink / raw)
  To: Stefan Borggraefe; +Cc: Linux-RAID

On 2 May 2013 13:24, Stefan Borggraefe <stefan@spybot.info> wrote:
> Hi,
>
> I am using a RAID5 software RAID on Ubuntu 12.04 (kernel
> 3.2.0-37-generic x86_64).
>
> It consists of 6 Hitachi 4 TB drives and contains an ext4 file system.
> There are no spare devices.
>
> Yesterday evening I exchanged a drive that showed SMART errors and the
> array started rebuilding its redundancy normally.
>
> When I returned to this server this morning, the array was in the following
> state:
>
> md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
>       19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> [U_U_UU]
>
> sdc is the newly added hard disk, but now sdd has failed as well. :( It would
> be great if there were a way to get this RAID5 working again. Perhaps sdc1 can
> then be fully added to the array, and after that drive sdd can also be exchanged.
>
> I have not started experimenting or changing this array in any way, but wanted
> to ask here for assistance first. Thank you for your help!
>
> mdadm --examine /dev/sd[cdegfh]1 | egrep 'Event|/dev/sd'
>
> shows
>
> /dev/sdc1:
>          Events : 494
> /dev/sdd1:
>          Events : 478
> /dev/sde1:
>          Events : 494
> /dev/sdf1:
>          Events : 494
> /dev/sdg1:
>          Events : 494
> /dev/sdh1:
>          Events : 494
>
>
>
> mdadm --examine /dev/sd[cdegfh]1
>
> shows
>
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 7433213e:0dd2e5ed:073dd59d:bf1f83d8
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : 9e83f72 - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : spare
>    Array State : A.A.AA ('A' == active, '.' == missing)
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : c2e5423f:6d91a061:c3f55aa7:6d1cec87
>
>     Update Time : Mon Apr 29 17:24:26 2013
>        Checksum : 37b97776 - correct
>          Events : 478
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 3
>    Array State : AAAAAA ('A' == active, '.' == missing)
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 68207885:02c05297:8ef62633:65b83839
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : f0b36c7f - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 0
>    Array State : A.A.AA ('A' == active, '.' == missing)
> /dev/sdf1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 7d328a98:6c02f550:ab1837c0:cb773ac1
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : d2799f34 - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 2
>    Array State : A.A.AA ('A' == active, '.' == missing)
> /dev/sdg1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 76b683b1:58e053ff:57ac0cfc:be114f75
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : 89bc2e05 - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 5
>    Array State : A.A.AA ('A' == active, '.' == missing)
> /dev/sdh1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 3c88705f:9f3add0e:d58d46a7:b40d02d7
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : 541f3913 - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 4
>    Array State : A.A.AA ('A' == active, '.' == missing)
>
> This is the dmesg output from when the failure happened:
>
> [6669459.855352] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855362] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855368] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 2a 00 00 08
> 00
> [6669459.855387] end_request: I/O error, dev sdd, sector 590910506
> [6669459.855456] raid5_end_read_request: 14 callbacks suppressed
> [6669459.855463] md/raid:md126: read error not correctable (sector 590910472
> on sdd1).
> [6669459.855490] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855496] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855501] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 32 00 00 08
> 00
> [6669459.855515] end_request: I/O error, dev sdd, sector 590910514
> [6669459.855594] md/raid:md126: read error not correctable (sector 590910480
> on sdd1).
> [6669459.855608] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855611] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855620] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 3a 00 00 08
> 00
> [6669459.855648] end_request: I/O error, dev sdd, sector 590910522
> [6669459.855710] md/raid:md126: read error not correctable (sector 590910488
> on sdd1).
> [6669459.855720] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855723] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855727] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 42 00 00 08
> 00
> [6669459.855737] end_request: I/O error, dev sdd, sector 590910530
> [6669459.855796] md/raid:md126: read error not correctable (sector 590910496
> on sdd1).
> [6669459.855814] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855817] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855821] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 4a 00 00 08
> 00
> [6669459.855831] end_request: I/O error, dev sdd, sector 590910538
> [6669459.855889] md/raid:md126: read error not correctable (sector 590910504
> on sdd1).
> [6669459.855907] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855910] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855914] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 52 00 00 08
> 00
> [6669459.855924] end_request: I/O error, dev sdd, sector 590910546
> [6669459.855982] md/raid:md126: read error not correctable (sector 590910512
> on sdd1).
> [6669459.855990] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855992] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855996] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 5a 00 00 08
> 00
> [6669459.856004] end_request: I/O error, dev sdd, sector 590910554
> [6669459.856062] md/raid:md126: read error not correctable (sector 590910520
> on sdd1).
> [6669459.856072] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856075] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856079] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 62 00 00 08
> 00
> [6669459.856088] end_request: I/O error, dev sdd, sector 590910562
> [6669459.856153] md/raid:md126: read error not correctable (sector 590910528
> on sdd1).
> [6669459.856171] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856174] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 6a 00 00 08
> 00
> [6669459.856188] end_request: I/O error, dev sdd, sector 590910570
> [6669459.856256] md/raid:md126: read error not correctable (sector 590910536
> on sdd1).
> [6669459.856265] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856268] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856272] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 72 00 00 08
> 00
> [6669459.856281] end_request: I/O error, dev sdd, sector 590910578
> [6669459.856346] md/raid:md126: read error not correctable (sector 590910544
> on sdd1).
> [6669459.856364] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856368] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856374] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 7a 00 00 08
> 00
> [6669459.856385] end_request: I/O error, dev sdd, sector 590910586
> [6669459.856445] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856449] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856456] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 82 00 00 08
> 00
> [6669459.856466] end_request: I/O error, dev sdd, sector 590910594
> [6669459.856526] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856530] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856537] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 8a 00 00 08
> 00
> [6669459.856547] end_request: I/O error, dev sdd, sector 590910602
> [6669459.856607] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856611] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856617] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 92 00 00 08
> 00
> [6669459.856628] end_request: I/O error, dev sdd, sector 590910610
> [6669459.856687] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856691] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856697] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 9a 00 00 08
> 00
> [6669459.856707] end_request: I/O error, dev sdd, sector 590910618
> [6669459.856767] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856772] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856778] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 a2 00 00 08
> 00
> [6669459.856788] end_request: I/O error, dev sdd, sector 590910626
> [6669459.856847] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856851] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856859] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 aa 00 00 08
> 00
> [6669459.856869] end_request: I/O error, dev sdd, sector 590910634
> [6669459.856928] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856932] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856938] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 b2 00 00 08
> 00
> [6669459.856949] end_request: I/O error, dev sdd, sector 590910642
> [6669459.857008] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857011] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857018] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ba 00 00 08
> 00
> [6669459.857028] end_request: I/O error, dev sdd, sector 590910650
> [6669459.857088] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857092] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857098] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 c2 00 00 08
> 00
> [6669459.857109] end_request: I/O error, dev sdd, sector 590910658
> [6669459.857168] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857171] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ca 00 00 08
> 00
> [6669459.857188] end_request: I/O error, dev sdd, sector 590910666
> [6669459.857248] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857251] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857258] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 d2 00 00 08
> 00
> [6669459.857269] end_request: I/O error, dev sdd, sector 590910674
> [6669459.857328] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857333] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857339] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 da 00 00 08
> 00
> [6669459.857349] end_request: I/O error, dev sdd, sector 590910682
> [6669459.857408] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857412] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857418] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 e2 00 00 08
> 00
> [6669459.857429] end_request: I/O error, dev sdd, sector 590910690
> [6669459.857488] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857492] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857499] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 93 4a 00 00 08
> 00
> [6669459.857509] end_request: I/O error, dev sdd, sector 590910282
> [6669459.857569] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857573] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857579] sd 6:1:10:0: [sdd] CDB:
> [6669459.857585] aacraid: Host adapter abort request (6,1,10,0)
> [6669459.857639] Read(10): 28 00 23 38 93 42 00 00 08 00
> [6669459.857648] end_request: I/O error, dev sdd, sector 590910274
> [6669459.857844] aacraid: Host adapter reset request. SCSI hang ?
> [6669470.028090] RAID conf printout:
> [6669470.028097]  --- level:5 rd:6 wd:4
> [6669470.028101]  disk 0, o:1, dev:sde1
> [6669470.028105]  disk 1, o:1, dev:sdc1
> [6669470.028109]  disk 2, o:1, dev:sdf1
> [6669470.028112]  disk 3, o:0, dev:sdd1
> [6669470.028115]  disk 4, o:1, dev:sdh1
> [6669470.028118]  disk 5, o:1, dev:sdg1
> [6669470.034462] RAID conf printout:
> [6669470.034464]  --- level:5 rd:6 wd:4
> [6669470.034465]  disk 0, o:1, dev:sde1
> [6669470.034466]  disk 2, o:1, dev:sdf1
> [6669470.034467]  disk 3, o:0, dev:sdd1
> [6669470.034468]  disk 4, o:1, dev:sdh1
> [6669470.034469]  disk 5, o:1, dev:sdg1
> [6669470.034484] RAID conf printout:
> [6669470.034486]  --- level:5 rd:6 wd:4
> [6669470.034489]  disk 0, o:1, dev:sde1
> [6669470.034491]  disk 2, o:1, dev:sdf1
> [6669470.034494]  disk 3, o:0, dev:sdd1
> [6669470.034496]  disk 4, o:1, dev:sdh1
> [6669470.034499]  disk 5, o:1, dev:sdg1
> [6669470.034571] RAID conf printout:
> [6669470.034577]  --- level:5 rd:6 wd:4
> [6669470.034581]  disk 0, o:1, dev:sde1
> [6669470.034584]  disk 2, o:1, dev:sdf1
> [6669470.034587]  disk 4, o:1, dev:sdh1
> [6669470.034589]  disk 5, o:1, dev:sdg1
>
> Please let me know if you need any more information.
> --
> Best regards,
> Stefan Borggraefe


I won't scold you for using RAID5 instead of RAID6 with this number of
drives, and especially with drives of this size.

Could you please post the output of smartctl -a for each device? (from
smartmontools)

That way we can verify which HDDs are broken, before proceeding.
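
Something like this should collect them all in one go (untested, and
assuming the members are still /dev/sdc through /dev/sdh; adjust the
list to your setup):

  for d in /dev/sd[c-h]; do
      echo "=== $d ==="   # label each drive's report
      smartctl -a "$d"    # full SMART report from smartmontools
  done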

Mathias

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-02 12:30 ` Mathias Burén
@ 2013-05-02 13:14   ` Stefan Borggraefe
  2013-05-02 13:17     ` Mathias Burén
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-02 13:14 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Linux-RAID

On Thursday, 2 May 2013 at 13:30:22, Mathias Burén wrote:
> I won't scold you for using RAID5 instead of RAID6 with this number of
> drives, and especially with drives of this size.
> 
> Could you please post the output of smartctl -a for each device? (from
> smartmontools)
> 
> That way we can verify which HDDs are broken, before proceeding.
> 
> Mathias

Hello Mathias,

RAID6 would clearly have been the safer option, but we needed the
extra space and only had this number of drives available.

Here is the requested output:

smartctl -a /dev/sdc
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               Hitachi 
Product:              HUS724040ALE640 
Revision:             MJAO
User Capacity:        4.000.787.030.016 bytes [4,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca22bd08a85
Serial number:              PK2331PAH5D0YT
Device type:          disk
Local Time is:        Thu May  2 15:09:16 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     <not available>

Error Counter logging not supported
Device does not support Self Test logging

smartctl -a /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               Hitachi 
Product:              HUS724040ALE640 
Revision:             MJAO
User Capacity:        4.000.787.030.016 bytes [4,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca22bc3effb
Serial number:              PK2331PAG8NHZT
Device type:          disk
Local Time is:        Thu May  2 15:09:19 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     <not available>

Error Counter logging not supported
Device does not support Self Test logging

smartctl -a /dev/sde
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               Hitachi 
Product:              HUS724040ALE640 
Revision:             MJAO
User Capacity:        4.000.787.030.016 bytes [4,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca22bc3ff79
Serial number:              PK2331PAG8TMXT
Device type:          disk
Local Time is:        Thu May  2 15:09:23 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     <not available>

Error Counter logging not supported
Device does not support Self Test logging

smartctl -a /dev/sdf
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               Hitachi 
Product:              HUS724040ALE640 
Revision:             MJAO
User Capacity:        4.000.787.030.016 bytes [4,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca22bc419ef
Serial number:              PK2331PAG90PET
Device type:          disk
Local Time is:        Thu May  2 15:09:25 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     <not available>

Error Counter logging not supported
Device does not support Self Test logging

smartctl -a /dev/sdg
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               Hitachi 
Product:              HUS724040ALE640 
Revision:             MJAO
User Capacity:        4.000.787.030.016 bytes [4,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca22bc2fe51
Serial number:              PK2331PAG6L49T
Device type:          disk
Local Time is:        Thu May  2 15:09:27 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     <not available>

Error Counter logging not supported
Device does not support Self Test logging

smartctl -a /dev/sdh
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               Hitachi 
Product:              HUS724040ALE640 
Revision:             MJAO
User Capacity:        4.000.787.030.016 bytes [4,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca22bc2fe2d
Serial number:              PK2331PAG6L34T
Device type:          disk
Local Time is:        Thu May  2 15:09:30 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     <not available>

Error Counter logging not supported
Device does not support Self Test logging
-- 
Best regards,
Stefan Borggraefe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-02 13:14   ` Stefan Borggraefe
@ 2013-05-02 13:17     ` Mathias Burén
  2013-05-02 13:29       ` Stefan Borggraefe
  0 siblings, 1 reply; 13+ messages in thread
From: Mathias Burén @ 2013-05-02 13:17 UTC (permalink / raw)
  To: Stefan Borggraefe; +Cc: Linux-RAID

On 2 May 2013 14:14, Stefan Borggraefe <stefan@spybot.info> wrote:
> On Thursday, 2 May 2013 at 13:30:22, Mathias Burén wrote:
>> I won't scold you for using RAID5 instead of RAID6 with this number of
>> drives, and especially with drives of this size.
>>
>> Could you please post the output of smartctl -a for each device? (from
>> smartmontools)
>>
>> That way we can verify which HDDs are broken, before proceeding.
>>
>> Mathias
>
> Hello Mathias,
>
> RAID6 would clearly have been the safer option, but we needed the
> extra space and only had this number of drives available.
>
> Here is the requested output:
>
> smartctl -a /dev/sdc
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Vendor:               Hitachi
> Product:              HUS724040ALE640
> Revision:             MJAO
> User Capacity:        4.000.787.030.016 bytes [4,00 TB]
> Logical block size:   512 bytes
> Logical Unit id:      0x5000cca22bd08a85
> Serial number:              PK2331PAH5D0YT
> Device type:          disk
> Local Time is:        Thu May  2 15:09:16 2013 CEST
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Current Drive Temperature:     <not available>
>
> Error Counter logging not supported
> Device does not support Self Test logging
>
> smartctl -a /dev/sdd
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Vendor:               Hitachi
> Product:              HUS724040ALE640
> Revision:             MJAO
> User Capacity:        4.000.787.030.016 bytes [4,00 TB]
> Logical block size:   512 bytes
> Logical Unit id:      0x5000cca22bc3effb
> Serial number:              PK2331PAG8NHZT
> Device type:          disk
> Local Time is:        Thu May  2 15:09:19 2013 CEST
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Current Drive Temperature:     <not available>
>
> Error Counter logging not supported
> Device does not support Self Test logging
>
> smartctl -a /dev/sde
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Vendor:               Hitachi
> Product:              HUS724040ALE640
> Revision:             MJAO
> User Capacity:        4.000.787.030.016 bytes [4,00 TB]
> Logical block size:   512 bytes
> Logical Unit id:      0x5000cca22bc3ff79
> Serial number:              PK2331PAG8TMXT
> Device type:          disk
> Local Time is:        Thu May  2 15:09:23 2013 CEST
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Current Drive Temperature:     <not available>
>
> Error Counter logging not supported
> Device does not support Self Test logging
>
> smartctl -a /dev/sdf
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Vendor:               Hitachi
> Product:              HUS724040ALE640
> Revision:             MJAO
> User Capacity:        4.000.787.030.016 bytes [4,00 TB]
> Logical block size:   512 bytes
> Logical Unit id:      0x5000cca22bc419ef
> Serial number:              PK2331PAG90PET
> Device type:          disk
> Local Time is:        Thu May  2 15:09:25 2013 CEST
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Current Drive Temperature:     <not available>
>
> Error Counter logging not supported
> Device does not support Self Test logging
>
> smartctl -a /dev/sdg
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Vendor:               Hitachi
> Product:              HUS724040ALE640
> Revision:             MJAO
> User Capacity:        4.000.787.030.016 bytes [4,00 TB]
> Logical block size:   512 bytes
> Logical Unit id:      0x5000cca22bc2fe51
> Serial number:              PK2331PAG6L49T
> Device type:          disk
> Local Time is:        Thu May  2 15:09:27 2013 CEST
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Current Drive Temperature:     <not available>
>
> Error Counter logging not supported
> Device does not support Self Test logging
>
> smartctl -a /dev/sdh
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Vendor:               Hitachi
> Product:              HUS724040ALE640
> Revision:             MJAO
> User Capacity:        4.000.787.030.016 bytes [4,00 TB]
> Logical block size:   512 bytes
> Logical Unit id:      0x5000cca22bc2fe2d
> Serial number:              PK2331PAG6L34T
> Device type:          disk
> Local Time is:        Thu May  2 15:09:30 2013 CEST
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Current Drive Temperature:     <not available>
>
> Error Counter logging not supported
> Device does not support Self Test logging
> --
> Best regards,
> Stefan Borggraefe


Hm are these behind some controller of sorts? What about smartctl -x ?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-02 13:17     ` Mathias Burén
@ 2013-05-02 13:29       ` Stefan Borggraefe
  2013-05-02 13:49         ` Mathias Burén
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-02 13:29 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Linux-RAID

On Thursday, 2 May 2013 at 14:17:04, Mathias Burén wrote:
> On 2 May 2013 14:14, Stefan Borggraefe <stefan@spybot.info> wrote:
> > On Thursday, 2 May 2013 at 13:30:22, Mathias Burén wrote:
> >> I won't scold you for using RAID5 instead of RAID6 with this number of
> >> drives, and especially with drives of this size.
> >> 
> >> Could you please post the output of smartctl -a for each device? (from
> >> smartmontools)
> >> 
> >> That way we can verify which HDDs are broken, before proceeding.
> >> 
> >> Mathias
> > 
> > Hello Mathias,
> > 
> > RAID6 would clearly have been the safer option, but we needed the
> > extra space and only had this number of drives available.
> > 
> > Here is the requested output:
> > 
> > smartctl -a /dev/sdc
> > smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local
> > build) Copyright (C) 2002-11 by Bruce Allen,
> > http://smartmontools.sourceforge.net
> > 
> > Vendor:               Hitachi
> > Product:              HUS724040ALE640
> > Revision:             MJAO
> > User Capacity:        4.000.787.030.016 bytes [4,00 TB]
> > Logical block size:   512 bytes
> > Logical Unit id:      0x5000cca22bd08a85
> > Serial number:              PK2331PAH5D0YT
> > Device type:          disk
> > Local Time is:        Thu May  2 15:09:16 2013 CEST
> > Device supports SMART and is Enabled
> > Temperature Warning Disabled or Not Supported
> > SMART Health Status: OK
> > 
> > [...]
> 
> Hm are these behind some controller of sorts? What about smartctl -x ?

We use an Adaptec 71605 controller. smartctl -x does not provide any more
useful information, I suppose. I only post the output of one drive as an
example this time. They all give a similar result.

smartctl -x /dev/sdc
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               Hitachi 
Product:              HUS724040ALE640 
Revision:             MJAO
User Capacity:        4.000.787.030.016 bytes [4,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca22bd08a85
Serial number:              PK2331PAH5D0YT
Device type:          disk
Local Time is:        Thu May  2 15:20:55 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     <not available>

Error Counter logging not supported
Device does not support Self Test logging
Device does not support Background scan results logging
scsiPrintSasPhy Log Sense Failed [unsupported field in scsi command]

What about the state of the software RAID5? It would be great if I
could bring it back to a state where the filesystem on it is fully
working again without having to copy the 20 TB of data to it again
(copying this amount of data takes some time :( ).
-- 
Best regards,
Stefan Borggraefe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-02 13:29       ` Stefan Borggraefe
@ 2013-05-02 13:49         ` Mathias Burén
  2013-05-02 14:17           ` Stefan Borggraefe
  0 siblings, 1 reply; 13+ messages in thread
From: Mathias Burén @ 2013-05-02 13:49 UTC (permalink / raw)
  To: Stefan Borggraefe; +Cc: Linux-RAID

On 2 May 2013 14:29, Stefan Borggraefe <stefan@spybot.info> wrote:
> On Thursday, 2 May 2013 at 14:17:04, Mathias Burén wrote:
>> On 2 May 2013 14:14, Stefan Borggraefe <stefan@spybot.info> wrote:
>> > On Thursday, 2 May 2013 at 13:30:22, Mathias Burén wrote:
>> >> I won't scold you for using RAID5 instead of RAID6 with this number of
>> >> drives, and especially with drives of this size.
>> >>
>> >> Could you please post the output of smartctl -a for each device? (from
>> >> smartmontools)
>> >>
>> >> That way we can verify which HDDs are broken, before proceeding.
>> >>
>> >> Mathias
>> >
>> > Hello Mathias,
>> >
>> > RAID6 would clearly have been the safer option, but we needed the
>> > extra space and only had this number of drives available.
>> >
>> > Here is the requested output:
>> >
>> > smartctl -a /dev/sdc
>> > smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local
>> > build) Copyright (C) 2002-11 by Bruce Allen,
>> > http://smartmontools.sourceforge.net
>> >
>> > Vendor:               Hitachi
>> > Product:              HUS724040ALE640
>> > Revision:             MJAO
>> > User Capacity:        4.000.787.030.016 bytes [4,00 TB]
>> > Logical block size:   512 bytes
>> > Logical Unit id:      0x5000cca22bd08a85
>> > Serial number:              PK2331PAH5D0YT
>> > Device type:          disk
>> > Local Time is:        Thu May  2 15:09:16 2013 CEST
>> > Device supports SMART and is Enabled
>> > Temperature Warning Disabled or Not Supported
>> > SMART Health Status: OK
>> >
>> > [...]
>>
>> Hm are these behind some controller of sorts? What about smartctl -x ?
>
> We use an Adaptec 71605 controller. smartctl -x does not provide any more
> useful information, I suppose. I only post the output of one drive as an
> example this time. They all give a similar result.
>
> smartctl -x /dev/sdc
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Vendor:               Hitachi
> Product:              HUS724040ALE640
> Revision:             MJAO
> User Capacity:        4.000.787.030.016 bytes [4,00 TB]
> Logical block size:   512 bytes
> Logical Unit id:      0x5000cca22bd08a85
> Serial number:              PK2331PAH5D0YT
> Device type:          disk
> Local Time is:        Thu May  2 15:20:55 2013 CEST
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Current Drive Temperature:     <not available>
>
> Error Counter logging not supported
> Device does not support Self Test logging
> Device does not support Background scan results logging
> scsiPrintSasPhy Log Sense Failed [unsupported field in scsi command]
>
> What about the state of the software RAID5? It would be great if I
> could bring it back to a state where the filesystem on it is fully
> working again without having to copy the 20 TB of data to it again
> (copying this amount of data takes some time :( ).
> --
> Best regards,
> Stefan Borggraefe


Ugh, Adaptec, not my favourite controller. Do you have arcconf
installed? You could run arcconf getconfig 1 (or whatever your
controller number is) to grab some information about your
controller and the HDDs connected to it.

Do you have /dev/sg? devices? If so, smartctl -a might work on them.
Re the software RAID, I would double check the health of your HDDs
before attempting anything.
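
If you go the /dev/sg? route, something like this might be worth a try
(untested here; the sg number is only a placeholder, map it with
lsscsi -g first, and -d sat only helps if the Adaptec passes ATA
commands through):

  lsscsi -g                     # shows which /dev/sg? belongs to which disk
  smartctl -a -d sat /dev/sg2   # ask for ATA SMART data via SAT passthrough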

Mathias

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-02 13:49         ` Mathias Burén
@ 2013-05-02 14:17           ` Stefan Borggraefe
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-02 14:17 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Linux-RAID

On Thursday, 2 May 2013 at 14:49:39, Mathias Burén wrote:
> Ugh, Adaptec, not my favourite controller. Do you have arcconf
> installed? You could run arcconf getconfig 1 (or whatever your
> controller number is) to grab some information about your
> controller and the HDDs connected to it.

arcconf getconfig 1
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Inaccessible

----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
   No logical devices configured

----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------


Command completed successfully.

Could this be because we do not use the hardware RAID features of
the controller?
 
> Do you have /dev/sg? devices? If so, smartctl -a might work on them.

Yes, but using smartctl -a/-x on these devices gives the same output as
using it on the /dev/sd? devices. :(

> Re the software RAID, I would double check the health of your HDDs
> before attempting anything.

Ok, this makes sense.
-- 
Best regards,
Stefan Borggraefe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-02 12:24 Help with recovering a RAID5 array Stefan Borggraefe
  2013-05-02 12:30 ` Mathias Burén
@ 2013-05-03  8:38 ` Ole Tange
  2013-05-04 11:13   ` Stefan Borggraefe
  1 sibling, 1 reply; 13+ messages in thread
From: Ole Tange @ 2013-05-03  8:38 UTC (permalink / raw)
  To: Stefan Borggraefe; +Cc: linux-raid

On Thu, May 2, 2013 at 2:24 PM, Stefan Borggraefe <stefan@spybot.info> wrote:

> I am using a RAID5 software RAID on Ubuntu 12.04
:
> It consists of 6 Hitachi 4 TB drives and contains an ext4 file system.
>
> When I returned to this server this morning, the array was in the following
> state:
>
> md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
>       19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> [U_U_UU]
>
> sdc is the newly added hard disk, but now sdd has failed as well. :( It would
> be great if there were a way to get this RAID5 working again. Perhaps sdc1 can
> then be fully added to the array, and after that drive sdd can also be exchanged.

I have had a few RAID6 arrays fail in a similar fashion: the 3rd drive
failing during rebuild (also 4 TB Hitachi, by the way).

I tested if the drives were fine:

  parallel dd if={} of=/dev/null bs=1000k ::: /dev/sd?

And they were all fine. If the failing drive had actually failed (i.e.
had bad sectors), I would have used GNU ddrescue to copy the failing
drive to a new drive. ddrescue can read a drive forwards, but it can
also read backwards. Even though backwards reading is slower, you can
use it to approach the failing sector from "the other side". This way
you can often get down to very few actually failing sectors.
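
A minimal sketch of that, with /dev/sdX as the failing drive and /dev/sdY
as its replacement (both placeholders), using a map file so ddrescue can
resume and change direction:

  # first pass: copy forwards, without scraping the bad areas (-n)
  ddrescue -n /dev/sdX /dev/sdY rescue.map
  # second pass: re-read the bad areas backwards (-R), retrying a few times (-r3)
  ddrescue -R -r3 /dev/sdX /dev/sdY rescue.map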

With only a few failing sectors (if any), I figured that very little
would be lost by forcing the failing drive online. Remove the spare
drive, and force the remaining drives online:

  mdadm -A --scan --force

This should not cause any rebuild to happen as you have removed the spare.

See: http://serverfault.com/questions/443763/linux-software-raid6-3-drives-offline-how-to-force-online
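
Removing the spare is just (using the device names from this thread as
an example):

  # detach the spare first so the forced assembly cannot start rebuilding onto it
  mdadm /dev/md126 --remove /dev/sdc1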

The next step is to run fsck. Since fsck writes to the disk (and is
thus impossible to revert), I put an overlay on the md device, so
that nothing was written to the disks - instead, changes were simply
written to a file.

See: http://unix.stackexchange.com/questions/67678/gnu-linux-overlay-block-device-stackable-block-device
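
One way to set up such an overlay, sketched with placeholder names and
an arbitrary overlay size, is a device-mapper snapshot backed by a
sparse file:

  # sparse file that will absorb every write made through the overlay
  truncate -s 50G /tmp/md126-overlay
  losetup /dev/loop0 /tmp/md126-overlay
  # reads come from /dev/md126, writes go to the loop device only
  dmsetup create md126-cow --table \
    "0 $(blockdev --getsz /dev/md126) snapshot /dev/md126 /dev/loop0 P 8"
  # fsck can then be pointed at /dev/mapper/md126-cow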

I then ran fsck on this overlaid device and checked that everything
was OK. When everything was OK, I removed the overlay and ran fsck
on the real drives.

Thinking back, it might even have made sense to overlay every
underlying block device, thus ensuring that nothing (not even the
md driver) wrote anything to the devices before I was ready to commit.


/Ole

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-03  8:38 ` Ole Tange
@ 2013-05-04 11:13   ` Stefan Borggraefe
  2013-05-06  6:31     ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-04 11:13 UTC (permalink / raw)
  To: Ole Tange; +Cc: linux-raid

On Friday, 3 May 2013, 10:38:52, you wrote:
> On Thu, May 2, 2013 at 2:24 PM, Stefan Borggraefe <stefan@spybot.info> 
wrote:
> > I am using a RAID5 software RAID on Ubuntu 12.04
> > 
> > It consits of 6 Hitachi drives with 4 TB and contains an ext 4 file
> > system.
> > 
> > When I returned to this server this morning, the array was in the
> > following
> > state:
> > 
> > md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
> > 
> >       19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> > 
> > [U_U_UU]
> > 
> > sdc is the newly added hard disk, but now also sdd failed. :( It would be
> > great if there was a way to have the this RAID5 working again. Perhaps
> > sdc1
> > can then be fully added to the array and after this drive sdd also
> > exchanged.
> I have had a few raid6 fail in a similar fashion: the 3rd drive
> faliing during rebuild (Also 4 TB Hitachi by the way).
> 
> I tested if the drives were fine:
> 
>   parallel dd if={} of=/dev/null bs=1000k ::: /dev/sd?
> 
> And they were all fine. 

Same for me.

> With only a few failing sectors (if any) I figured that very little
> would be lost by forcing the failing drive online. Remove the spare
> drive, and force the remaining online:
> 
>   mdadm -A --scan --force

I removed the spare /dev/sdc1 from /dev/md126 with

mdadm /dev/md126 --remove /dev/sdc1

After mdadm -A --scan --force, the array is now in this state:

md126 : active raid5 sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
      19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4] 
[U_U_UU]
 
> Next step is to do fsck.

I think this is not possible yet at this point. Don't I need to reassemble the 
array using the --assume-clean option and with one missing drive first? Some 
step is missing here.

Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-04 11:13   ` Stefan Borggraefe
@ 2013-05-06  6:31     ` NeilBrown
  2013-05-06  8:12       ` Stefan Borggraefe
  0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2013-05-06  6:31 UTC (permalink / raw)
  To: Stefan Borggraefe; +Cc: Ole Tange, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2383 bytes --]

On Sat, 04 May 2013 13:13:27 +0200 Stefan Borggraefe <stefan@spybot.info>
wrote:

> On Friday, 3 May 2013, 10:38:52, you wrote:
> > On Thu, May 2, 2013 at 2:24 PM, Stefan Borggraefe <stefan@spybot.info> 
> wrote:
> > > I am using a RAID5 software RAID on Ubuntu 12.04
> > > 
> > > It consits of 6 Hitachi drives with 4 TB and contains an ext 4 file
> > > system.
> > > 
> > > When I returned to this server this morning, the array was in the
> > > following
> > > state:
> > > 
> > > md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
> > > 
> > >       19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> > > 
> > > [U_U_UU]
> > > 
> > > sdc is the newly added hard disk, but now also sdd failed. :( It would be
> > > great if there was a way to have the this RAID5 working again. Perhaps
> > > sdc1
> > > can then be fully added to the array and after this drive sdd also
> > > exchanged.
> > I have had a few raid6 fail in a similar fashion: the 3rd drive
> > faliing during rebuild (Also 4 TB Hitachi by the way).
> > 
> > I tested if the drives were fine:
> > 
> >   parallel dd if={} of=/dev/null bs=1000k ::: /dev/sd?
> > 
> > And they were all fine. 
> 
> Same for me.
> 
> > With only a few failing sectors (if any) I figured that very little
> > would be lost by forcing the failing drive online. Remove the spare
> > drive, and force the remaining online:
> > 
> >   mdadm -A --scan --force
> 
> I removed the spare /dev/sdc1 from /dev/md126
> 
> with
> 
> mdadm /dev/md126 --remove /dev/sdc1
> 
> After mdadm -A --scan --force the array is now in this state
> 
> md126 : active raid5 sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
>       19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4] 
> [U_U_UU]

Did you stop the array first?
  i.e.
    mdadm --stop /dev/md126
    mdadm -Asfvv
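
Spelled out with long options, that second command is equivalent to:

    mdadm --assemble --scan --force --verbose --verbose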

NeilBrown


>  
> > Next step is to do fsck.
> 
> I think this is not possible yet at this point. Don't I need to reassemble the 
> array using the --assume-clean option and with one missing drive first? Some 
> step is missing here.
> 
> Stefan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-06  6:31     ` NeilBrown
@ 2013-05-06  8:12       ` Stefan Borggraefe
  2013-05-10 10:14         ` Stefan Borggraefe
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-06  8:12 UTC (permalink / raw)
  To: NeilBrown, linux-raid

On Monday, 6 May 2013, 16:31:02, NeilBrown wrote:
> On Sat, 04 May 2013 13:13:27 +0200 Stefan Borggraefe <stefan@spybot.info>
> 
> wrote:
> > On Friday, 3 May 2013, 10:38:52, you wrote:
> > > On Thu, May 2, 2013 at 2:24 PM, Stefan Borggraefe <stefan@spybot.info>
> > 
> > wrote:
> > > > I am using a RAID5 software RAID on Ubuntu 12.04
> > > > 
> > > > It consits of 6 Hitachi drives with 4 TB and contains an ext 4 file
> > > > system.
> > > > 
> > > > When I returned to this server this morning, the array was in the
> > > > following
> > > > state:
> > > > 
> > > > md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6]
> > > > sdf1[2]
> > > > 
> > > >       19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2
> > > >       [6/4]
> > > > 
> > > > [U_U_UU]
> > > > 
> > > > sdc is the newly added hard disk, but now also sdd failed. :( It would
> > > > be
> > > > great if there was a way to have the this RAID5 working again. Perhaps
> > > > sdc1
> > > > can then be fully added to the array and after this drive sdd also
> > > > exchanged.
> > > 
> > > I have had a few raid6 fail in a similar fashion: the 3rd drive
> > > faliing during rebuild (Also 4 TB Hitachi by the way).
> > > 
> > > I tested if the drives were fine:
> > >   parallel dd if={} of=/dev/null bs=1000k ::: /dev/sd?
> > > 
> > > And they were all fine.
> > 
> > Same for me.
> > 
> > > With only a few failing sectors (if any) I figured that very little
> > > would be lost by forcing the failing drive online. Remove the spare
> > > 
> > > drive, and force the remaining online:
> > >   mdadm -A --scan --force
> > 
> > I removed the spare /dev/sdc1 from /dev/md126
> > 
> > with
> > 
> > mdadm /dev/md126 --remove /dev/sdc1
> > 
> > After mdadm -A --scan --force the array is now in this state
> > 
> > md126 : active raid5 sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
> > 
> >       19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> > 
> > [U_U_UU]
> 
> Did you stop the array first?
>   i.e.
>     mdadm --stop /dev/md126
>     mdadm -Asfvv
> 
> NeilBrown

Thank you Neil, yes this was my mistake. I realised it in the meantime and am 
currently checking the file system using overlay files as suggested in

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Force_assembly
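
In outline, overlaying each member partition works the same way as the
snapshot sketch above (names and sizes here are only illustrative):

  truncate -s 10G /overlays/sdd1.ovl
  losetup /dev/loop1 /overlays/sdd1.ovl
  dmsetup create sdd1-cow --table \
    "0 $(blockdev --getsz /dev/sdd1) snapshot /dev/sdd1 /dev/loop1 P 8"
  # repeat for the other members, then assemble from the /dev/mapper/*-cow devices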
-- 
Best regards,
Stefan Borggraefe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-06  8:12       ` Stefan Borggraefe
@ 2013-05-10 10:14         ` Stefan Borggraefe
  2013-05-10 10:48           ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-10 10:14 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown, Ole Tange

On Monday, 6 May 2013, 10:12:42, Stefan Borggraefe wrote:
> On Monday, 6 May 2013, 16:31:02, NeilBrown wrote:
> > Did you stop the array first?
> > 
> >   i.e.
> >   
> >     mdadm --stop /dev/md126
> >     mdadm -Asfvv
> > 
> > NeilBrown
> 
> Thank you Neil, yes this was my mistake. I realised it in the meantime and
> am currently checking the file system using overlay files as suggested in
> 
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID

fsck.ext4 -y -C 0 /dev/md126

ended with

[7543439.065625] Pid: 13706, comm: fsck.ext4 Tainted: G        W  O 3.2.0-37-
generic #58-Ubuntu
[7543439.065627] Call Trace:
[7543439.065629]  [<ffffffff81067f0f>] warn_slowpath_common+0x7f/0xc0
[7543439.065631]  [<ffffffff81067f6a>] warn_slowpath_null+0x1a/0x20
[7543439.065634]  [<ffffffffa00aea35>] init_stripe+0x245/0x270 [raid456]
[7543439.065637]  [<ffffffffa00b2772>] get_active_stripe+0x3a2/0x3c0 [raid456]
[7543439.065639]  [<ffffffff814e67d9>] ? mddev_bio_destructor+0x19/0x20
[7543439.065641]  [<ffffffff8108bbc0>] ? prepare_to_wait+0x60/0x90
[7543439.065644]  [<ffffffffa00b5b94>] make_request+0x194/0x430 [raid456]
[7543439.065646]  [<ffffffff813030d6>] ? throtl_find_tg+0x46/0x60
[7543439.065647]  [<ffffffff8108bd20>] ? add_wait_queue+0x60/0x60
[7543439.065650]  [<ffffffff814e5f10>] md_make_request+0xd0/0x200
[7543439.065652]  [<ffffffff8111b145>] ? mempool_alloc_slab+0x15/0x20
[7543439.065654]  [<ffffffff812f2264>] generic_make_request.part.50+0x74/0xb0
[7543439.065656]  [<ffffffff812f2678>] generic_make_request+0x68/0x70
[7543439.065658]  [<ffffffff812f2705>] submit_bio+0x85/0x110
[7543439.065660]  [<ffffffff811afa2a>] ? bio_alloc_bioset+0x5a/0xf0
[7543439.065662]  [<ffffffff811a997b>] submit_bh+0xeb/0x120
[7543439.065664]  [<ffffffff811ac615>] block_read_full_page+0x225/0x390
[7543439.065666]  [<ffffffff811b1700>] ? blkdev_get_blocks+0xd0/0xd0
[7543439.065668]  [<ffffffff811190e5>] ? add_to_page_cache_locked+0x85/0xa0
[7543439.065670]  [<ffffffff811b0dc8>] blkdev_readpage+0x18/0x20
[7543439.065672]  [<ffffffff8111967d>] 
do_generic_file_read.constprop.33+0x10d/0x440
[7543439.065675]  [<ffffffff8111a74f>] generic_file_aio_read+0xef/0x280
[7543439.065677]  [<ffffffff813bb14e>] ? tty_wakeup+0x3e/0x80
[7543439.065679]  [<ffffffff81178d8a>] do_sync_read+0xda/0x120
[7543439.065681]  [<ffffffff8129ee13>] ? security_file_permission+0x93/0xb0
[7543439.065683]  [<ffffffff81179211>] ? rw_verify_area+0x61/0xf0
[7543439.065684]  [<ffffffff811796f0>] vfs_read+0xb0/0x180
[7543439.065686]  [<ffffffff8117980a>] sys_read+0x4a/0x90
[7543439.065688]  [<ffffffff81665842>] system_call_fastpath+0x16/0x1b
[7543439.065689] ---[ end trace cce84f1e6de88596 ]---
[7543441.630145] quiet_error: 230464 callbacks suppressed

I think I will give up here and just start again with a clean RAID6, copying
the data onto it anew.
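
For reference, a clean 6-drive RAID6 would be created along these lines
(device names are placeholders, and mdadm --create wipes whatever is on
them):

  mdadm --create /dev/md0 --level=6 --raid-devices=6 \
        --chunk=512 /dev/sd[c-h]1
  mkfs.ext4 /dev/md0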
-- 
Best regards,
Stefan Borggraefe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help with recovering a RAID5 array
  2013-05-10 10:14         ` Stefan Borggraefe
@ 2013-05-10 10:48           ` NeilBrown
  0 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2013-05-10 10:48 UTC (permalink / raw)
  To: Stefan Borggraefe; +Cc: linux-raid, Ole Tange

[-- Attachment #1: Type: text/plain, Size: 3237 bytes --]

On Fri, 10 May 2013 12:14:36 +0200 Stefan Borggraefe <stefan@spybot.info>
wrote:

> On Monday, 6 May 2013, 10:12:42, Stefan Borggraefe wrote:
> > On Monday, 6 May 2013, 16:31:02, NeilBrown wrote:
> > > Did you stop the array first?
> > > 
> > >   i.e.
> > >   
> > >     mdadm --stop /dev/md126
> > >     mdadm -Asfvv
> > > 
> > > NeilBrown
> > 
> > Thank you Neil, yes this was my mistake. I realised it in the meantime and
> > am currently checking the file system using overlay files as suggested in
> > 
> > https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
> 
> fsck.ext4 -y -C 0 /dev/md126
> 
> ended with
> 
> [7543439.065625] Pid: 13706, comm: fsck.ext4 Tainted: G        W  O 3.2.0-37-
> generic #58-Ubuntu
> [7543439.065627] Call Trace:
> [7543439.065629]  [<ffffffff81067f0f>] warn_slowpath_common+0x7f/0xc0
> [7543439.065631]  [<ffffffff81067f6a>] warn_slowpath_null+0x1a/0x20
> [7543439.065634]  [<ffffffffa00aea35>] init_stripe+0x245/0x270 [raid456]
> [7543439.065637]  [<ffffffffa00b2772>] get_active_stripe+0x3a2/0x3c0 [raid456]
> [7543439.065639]  [<ffffffff814e67d9>] ? mddev_bio_destructor+0x19/0x20
> [7543439.065641]  [<ffffffff8108bbc0>] ? prepare_to_wait+0x60/0x90
> [7543439.065644]  [<ffffffffa00b5b94>] make_request+0x194/0x430 [raid456]
> [7543439.065646]  [<ffffffff813030d6>] ? throtl_find_tg+0x46/0x60
> [7543439.065647]  [<ffffffff8108bd20>] ? add_wait_queue+0x60/0x60
> [7543439.065650]  [<ffffffff814e5f10>] md_make_request+0xd0/0x200
> [7543439.065652]  [<ffffffff8111b145>] ? mempool_alloc_slab+0x15/0x20
> [7543439.065654]  [<ffffffff812f2264>] generic_make_request.part.50+0x74/0xb0
> [7543439.065656]  [<ffffffff812f2678>] generic_make_request+0x68/0x70
> [7543439.065658]  [<ffffffff812f2705>] submit_bio+0x85/0x110
> [7543439.065660]  [<ffffffff811afa2a>] ? bio_alloc_bioset+0x5a/0xf0
> [7543439.065662]  [<ffffffff811a997b>] submit_bh+0xeb/0x120
> [7543439.065664]  [<ffffffff811ac615>] block_read_full_page+0x225/0x390
> [7543439.065666]  [<ffffffff811b1700>] ? blkdev_get_blocks+0xd0/0xd0
> [7543439.065668]  [<ffffffff811190e5>] ? add_to_page_cache_locked+0x85/0xa0
> [7543439.065670]  [<ffffffff811b0dc8>] blkdev_readpage+0x18/0x20
> [7543439.065672]  [<ffffffff8111967d>] 
> do_generic_file_read.constprop.33+0x10d/0x440
> [7543439.065675]  [<ffffffff8111a74f>] generic_file_aio_read+0xef/0x280
> [7543439.065677]  [<ffffffff813bb14e>] ? tty_wakeup+0x3e/0x80
> [7543439.065679]  [<ffffffff81178d8a>] do_sync_read+0xda/0x120
> [7543439.065681]  [<ffffffff8129ee13>] ? security_file_permission+0x93/0xb0
> [7543439.065683]  [<ffffffff81179211>] ? rw_verify_area+0x61/0xf0
> [7543439.065684]  [<ffffffff811796f0>] vfs_read+0xb0/0x180
> [7543439.065686]  [<ffffffff8117980a>] sys_read+0x4a/0x90
> [7543439.065688]  [<ffffffff81665842>] system_call_fastpath+0x16/0x1b
> [7543439.065689] ---[ end trace cce84f1e6de88596 ]---
> [7543441.630145] quiet_error: 230464 callbacks suppressed
> 
> I think I will give up here and just start again with a clean RAID6 to copy 
> the data onto anew.


You've missed some important context there.  What are the dozen or so lines
around this in the logs?
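
For example, something like this would show the surrounding messages:

    dmesg | grep -B 20 -A 5 'end trace cce84f1e6de88596'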

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-05-10 10:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-02 12:24 Help with recovering a RAID5 array Stefan Borggraefe
2013-05-02 12:30 ` Mathias Burén
2013-05-02 13:14   ` Stefan Borggraefe
2013-05-02 13:17     ` Mathias Burén
2013-05-02 13:29       ` Stefan Borggraefe
2013-05-02 13:49         ` Mathias Burén
2013-05-02 14:17           ` Stefan Borggraefe
2013-05-03  8:38 ` Ole Tange
2013-05-04 11:13   ` Stefan Borggraefe
2013-05-06  6:31     ` NeilBrown
2013-05-06  8:12       ` Stefan Borggraefe
2013-05-10 10:14         ` Stefan Borggraefe
2013-05-10 10:48           ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).