* Help with recovering a RAID5 array
@ 2013-05-02 12:24 Stefan Borggraefe
2013-05-02 12:30 ` Mathias Burén
2013-05-03 8:38 ` Ole Tange
0 siblings, 2 replies; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-02 12:24 UTC (permalink / raw)
To: linux-raid
Hi,
I am using a RAID5 software RAID on Ubuntu 12.04 (kernel
3.2.0-37-generic x86_64).
It consists of six 4 TB Hitachi drives and contains an ext4 file system.
There are no spare devices.
Yesterday evening I exchanged a drive that showed SMART errors and the
array started rebuilding its redundancy normally.
When I returned to this server this morning, the array was in the following
state:
md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
[U_U_UU]
sdc is the newly added hard disk, but now sdd has failed as well. :( It would be
great if there were a way to get this RAID5 working again. Perhaps sdc1 could
then be fully added to the array and, after that, the drive sdd exchanged as well.
I have not started experimenting or changing this array in any way, but wanted
to ask here for assistance first. Thank you for your help!
mdadm --examine /dev/sd[cdegfh]1 | egrep 'Event|/dev/sd'
shows
/dev/sdc1:
Events : 494
/dev/sdd1:
Events : 478
/dev/sde1:
Events : 494
/dev/sdf1:
Events : 494
/dev/sdg1:
Events : 494
/dev/sdh1:
Events : 494
mdadm --examine /dev/sd[cdegfh]1
shows:
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 13051471:fba5785f:4365dea1:0670be37
Name : teraturm:2 (local to host teraturm)
Creation Time : Tue Feb 5 14:23:06 2013
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 7433213e:0dd2e5ed:073dd59d:bf1f83d8
Update Time : Tue Apr 30 10:06:55 2013
Checksum : 9e83f72 - correct
Events : 494
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 13051471:fba5785f:4365dea1:0670be37
Name : teraturm:2 (local to host teraturm)
Creation Time : Tue Feb 5 14:23:06 2013
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : c2e5423f:6d91a061:c3f55aa7:6d1cec87
Update Time : Mon Apr 29 17:24:26 2013
Checksum : 37b97776 - correct
Events : 478
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 13051471:fba5785f:4365dea1:0670be37
Name : teraturm:2 (local to host teraturm)
Creation Time : Tue Feb 5 14:23:06 2013
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 68207885:02c05297:8ef62633:65b83839
Update Time : Tue Apr 30 10:06:55 2013
Checksum : f0b36c7f - correct
Events : 494
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 13051471:fba5785f:4365dea1:0670be37
Name : teraturm:2 (local to host teraturm)
Creation Time : Tue Feb 5 14:23:06 2013
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 7d328a98:6c02f550:ab1837c0:cb773ac1
Update Time : Tue Apr 30 10:06:55 2013
Checksum : d2799f34 - correct
Events : 494
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 13051471:fba5785f:4365dea1:0670be37
Name : teraturm:2 (local to host teraturm)
Creation Time : Tue Feb 5 14:23:06 2013
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 76b683b1:58e053ff:57ac0cfc:be114f75
Update Time : Tue Apr 30 10:06:55 2013
Checksum : 89bc2e05 - correct
Events : 494
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 13051471:fba5785f:4365dea1:0670be37
Name : teraturm:2 (local to host teraturm)
Creation Time : Tue Feb 5 14:23:06 2013
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3c88705f:9f3add0e:d58d46a7:b40d02d7
Update Time : Tue Apr 30 10:06:55 2013
Checksum : 541f3913 - correct
Events : 494
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : A.A.AA ('A' == active, '.' == missing)
This is the dmesg output from when the failure happened:
[6669459.855352] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855362] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.855368] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 2a 00 00 08
00
[6669459.855387] end_request: I/O error, dev sdd, sector 590910506
[6669459.855456] raid5_end_read_request: 14 callbacks suppressed
[6669459.855463] md/raid:md126: read error not correctable (sector 590910472
on sdd1).
[6669459.855490] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855496] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.855501] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 32 00 00 08
00
[6669459.855515] end_request: I/O error, dev sdd, sector 590910514
[6669459.855594] md/raid:md126: read error not correctable (sector 590910480
on sdd1).
[6669459.855608] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855611] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.855620] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 3a 00 00 08
00
[6669459.855648] end_request: I/O error, dev sdd, sector 590910522
[6669459.855710] md/raid:md126: read error not correctable (sector 590910488
on sdd1).
[6669459.855720] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855723] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.855727] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 42 00 00 08
00
[6669459.855737] end_request: I/O error, dev sdd, sector 590910530
[6669459.855796] md/raid:md126: read error not correctable (sector 590910496
on sdd1).
[6669459.855814] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855817] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.855821] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 4a 00 00 08
00
[6669459.855831] end_request: I/O error, dev sdd, sector 590910538
[6669459.855889] md/raid:md126: read error not correctable (sector 590910504
on sdd1).
[6669459.855907] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855910] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.855914] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 52 00 00 08
00
[6669459.855924] end_request: I/O error, dev sdd, sector 590910546
[6669459.855982] md/raid:md126: read error not correctable (sector 590910512
on sdd1).
[6669459.855990] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855992] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.855996] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 5a 00 00 08
00
[6669459.856004] end_request: I/O error, dev sdd, sector 590910554
[6669459.856062] md/raid:md126: read error not correctable (sector 590910520
on sdd1).
[6669459.856072] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856075] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856079] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 62 00 00 08
00
[6669459.856088] end_request: I/O error, dev sdd, sector 590910562
[6669459.856153] md/raid:md126: read error not correctable (sector 590910528
on sdd1).
[6669459.856171] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856174] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 6a 00 00 08
00
[6669459.856188] end_request: I/O error, dev sdd, sector 590910570
[6669459.856256] md/raid:md126: read error not correctable (sector 590910536
on sdd1).
[6669459.856265] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856268] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856272] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 72 00 00 08
00
[6669459.856281] end_request: I/O error, dev sdd, sector 590910578
[6669459.856346] md/raid:md126: read error not correctable (sector 590910544
on sdd1).
[6669459.856364] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856368] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856374] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 7a 00 00 08
00
[6669459.856385] end_request: I/O error, dev sdd, sector 590910586
[6669459.856445] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856449] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856456] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 82 00 00 08
00
[6669459.856466] end_request: I/O error, dev sdd, sector 590910594
[6669459.856526] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856530] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856537] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 8a 00 00 08
00
[6669459.856547] end_request: I/O error, dev sdd, sector 590910602
[6669459.856607] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856611] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856617] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 92 00 00 08
00
[6669459.856628] end_request: I/O error, dev sdd, sector 590910610
[6669459.856687] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856691] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856697] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 9a 00 00 08
00
[6669459.856707] end_request: I/O error, dev sdd, sector 590910618
[6669459.856767] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856772] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856778] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 a2 00 00 08
00
[6669459.856788] end_request: I/O error, dev sdd, sector 590910626
[6669459.856847] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856851] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856859] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 aa 00 00 08
00
[6669459.856869] end_request: I/O error, dev sdd, sector 590910634
[6669459.856928] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856932] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.856938] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 b2 00 00 08
00
[6669459.856949] end_request: I/O error, dev sdd, sector 590910642
[6669459.857008] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857011] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.857018] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ba 00 00 08
00
[6669459.857028] end_request: I/O error, dev sdd, sector 590910650
[6669459.857088] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857092] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.857098] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 c2 00 00 08
00
[6669459.857109] end_request: I/O error, dev sdd, sector 590910658
[6669459.857168] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857171] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.857178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ca 00 00 08
00
[6669459.857188] end_request: I/O error, dev sdd, sector 590910666
[6669459.857248] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857251] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.857258] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 d2 00 00 08
00
[6669459.857269] end_request: I/O error, dev sdd, sector 590910674
[6669459.857328] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857333] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.857339] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 da 00 00 08
00
[6669459.857349] end_request: I/O error, dev sdd, sector 590910682
[6669459.857408] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857412] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.857418] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 e2 00 00 08
00
[6669459.857429] end_request: I/O error, dev sdd, sector 590910690
[6669459.857488] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857492] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.857499] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 93 4a 00 00 08
00
[6669459.857509] end_request: I/O error, dev sdd, sector 590910282
[6669459.857569] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857573] sd 6:1:10:0: [sdd] Result: hostbyte=DID_ABORT
driverbyte=DRIVER_OK
[6669459.857579] sd 6:1:10:0: [sdd] CDB:
[6669459.857585] aacraid: Host adapter abort request (6,1,10,0)
[6669459.857639] Read(10): 28 00 23 38 93 42 00 00 08 00
[6669459.857648] end_request: I/O error, dev sdd, sector 590910274
[6669459.857844] aacraid: Host adapter reset request. SCSI hang ?
[6669470.028090] RAID conf printout:
[6669470.028097] --- level:5 rd:6 wd:4
[6669470.028101] disk 0, o:1, dev:sde1
[6669470.028105] disk 1, o:1, dev:sdc1
[6669470.028109] disk 2, o:1, dev:sdf1
[6669470.028112] disk 3, o:0, dev:sdd1
[6669470.028115] disk 4, o:1, dev:sdh1
[6669470.028118] disk 5, o:1, dev:sdg1
[6669470.034462] RAID conf printout:
[6669470.034464] --- level:5 rd:6 wd:4
[6669470.034465] disk 0, o:1, dev:sde1
[6669470.034466] disk 2, o:1, dev:sdf1
[6669470.034467] disk 3, o:0, dev:sdd1
[6669470.034468] disk 4, o:1, dev:sdh1
[6669470.034469] disk 5, o:1, dev:sdg1
[6669470.034484] RAID conf printout:
[6669470.034486] --- level:5 rd:6 wd:4
[6669470.034489] disk 0, o:1, dev:sde1
[6669470.034491] disk 2, o:1, dev:sdf1
[6669470.034494] disk 3, o:0, dev:sdd1
[6669470.034496] disk 4, o:1, dev:sdh1
[6669470.034499] disk 5, o:1, dev:sdg1
[6669470.034571] RAID conf printout:
[6669470.034577] --- level:5 rd:6 wd:4
[6669470.034581] disk 0, o:1, dev:sde1
[6669470.034584] disk 2, o:1, dev:sdf1
[6669470.034587] disk 4, o:1, dev:sdh1
[6669470.034589] disk 5, o:1, dev:sdg1
Please let me know if you need any more information.
--
Best regards,
Stefan Borggraefe
* Re: Help with recovering a RAID5 array
2013-05-02 12:24 Help with recovering a RAID5 array Stefan Borggraefe
@ 2013-05-02 12:30 ` Mathias Burén
2013-05-02 13:14 ` Stefan Borggraefe
2013-05-03 8:38 ` Ole Tange
1 sibling, 1 reply; 13+ messages in thread
From: Mathias Burén @ 2013-05-02 12:30 UTC (permalink / raw)
To: Stefan Borggraefe; +Cc: Linux-RAID
On 2 May 2013 13:24, Stefan Borggraefe <stefan@spybot.info> wrote:
> Hi,
>
> I am using a RAID5 software RAID on Ubuntu 12.04 (kernel
> 3.2.0-37-generic x86_64).
>
> It consists of six 4 TB Hitachi drives and contains an ext4 file system.
> There are no spare devices.
>
> Yesterday evening I exchanged a drive that showed SMART errors and the
> array started rebuilding its redundancy normally.
>
> When I returned to this server this morning, the array was in the following
> state:
>
> md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
> 19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> [U_U_UU]
>
> sdc is the newly added hard disk, but now sdd has failed as well. :( It would be
> great if there were a way to get this RAID5 working again. Perhaps sdc1 could
> then be fully added to the array and, after that, the drive sdd exchanged as well.
>
> [...]
>
> Please let me know if you need any more information.
> --
> Best regards,
> Stefan Borggraefe
I won't scold you for using RAID5 instead of RAID6 with this number of
drives, especially given the size of the drives.
Could you please post the output of smartctl -a for each device? (from
smartmontools)
That way we can verify which HDDs are broken, before proceeding.
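Something along these lines should collect it for all members in one go (the
/dev/sd[c-h] range is only a guess based on your mdstat output, adjust as
needed):

for d in /dev/sd[c-h]; do echo "=== $d ==="; smartctl -a "$d"; done > smart-all.txt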
Mathias
* Re: Help with recovering a RAID5 array
2013-05-02 12:30 ` Mathias Burén
@ 2013-05-02 13:14 ` Stefan Borggraefe
2013-05-02 13:17 ` Mathias Burén
0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-02 13:14 UTC (permalink / raw)
To: Mathias Burén; +Cc: Linux-RAID
On Thursday, 2 May 2013, 13:30:22, Mathias Burén wrote:
> I won't scold you for using RAID5 instead of RAID6 with this number of
> drives, especially given the size of the drives.
>
> Could you please post the output of smartctl -a for each device? (from
> smartmontools)
>
> That way we can verify which HDDs are broken, before proceeding.
>
> Mathias
Hello Mathias,
RAID6 would clearly have been the safer option, but we needed the extra
space and only had this number of drives available.
Here is the requested output:
smartctl -a /dev/sdc
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Vendor: Hitachi
Product: HUS724040ALE640
Revision: MJAO
User Capacity: 4.000.787.030.016 bytes [4,00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca22bd08a85
Serial number: PK2331PAH5D0YT
Device type: disk
Local Time is: Thu May 2 15:09:16 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: <not available>
Error Counter logging not supported
Device does not support Self Test logging
smartctl -a /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Vendor: Hitachi
Product: HUS724040ALE640
Revision: MJAO
User Capacity: 4.000.787.030.016 bytes [4,00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca22bc3effb
Serial number: PK2331PAG8NHZT
Device type: disk
Local Time is: Thu May 2 15:09:19 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: <not available>
Error Counter logging not supported
Device does not support Self Test logging
smartctl -a /dev/sde
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Vendor: Hitachi
Product: HUS724040ALE640
Revision: MJAO
User Capacity: 4.000.787.030.016 bytes [4,00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca22bc3ff79
Serial number: PK2331PAG8TMXT
Device type: disk
Local Time is: Thu May 2 15:09:23 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: <not available>
Error Counter logging not supported
Device does not support Self Test logging
smartctl -a /dev/sdf
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Vendor: Hitachi
Product: HUS724040ALE640
Revision: MJAO
User Capacity: 4.000.787.030.016 bytes [4,00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca22bc419ef
Serial number: PK2331PAG90PET
Device type: disk
Local Time is: Thu May 2 15:09:25 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: <not available>
Error Counter logging not supported
Device does not support Self Test logging
smartctl -a /dev/sdg
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Vendor: Hitachi
Product: HUS724040ALE640
Revision: MJAO
User Capacity: 4.000.787.030.016 bytes [4,00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca22bc2fe51
Serial number: PK2331PAG6L49T
Device type: disk
Local Time is: Thu May 2 15:09:27 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: <not available>
Error Counter logging not supported
Device does not support Self Test logging
smartctl -a /dev/sdh
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Vendor: Hitachi
Product: HUS724040ALE640
Revision: MJAO
User Capacity: 4.000.787.030.016 bytes [4,00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca22bc2fe2d
Serial number: PK2331PAG6L34T
Device type: disk
Local Time is: Thu May 2 15:09:30 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: <not available>
Error Counter logging not supported
Device does not support Self Test logging
--
Best regards,
Stefan Borggraefe
* Re: Help with recovering a RAID5 array
2013-05-02 13:14 ` Stefan Borggraefe
@ 2013-05-02 13:17 ` Mathias Burén
2013-05-02 13:29 ` Stefan Borggraefe
0 siblings, 1 reply; 13+ messages in thread
From: Mathias Burén @ 2013-05-02 13:17 UTC (permalink / raw)
To: Stefan Borggraefe; +Cc: Linux-RAID
On 2 May 2013 14:14, Stefan Borggraefe <stefan@spybot.info> wrote:
> [...]
>
> Here is the requested output:
>
> smartctl -a /dev/sdc
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Vendor: Hitachi
> Product: HUS724040ALE640
> Revision: MJAO
> User Capacity: 4.000.787.030.016 bytes [4,00 TB]
> Logical block size: 512 bytes
> Logical Unit id: 0x5000cca22bd08a85
> Serial number: PK2331PAH5D0YT
> Device type: disk
> Local Time is: Thu May 2 15:09:16 2013 CEST
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Current Drive Temperature: <not available>
>
> Error Counter logging not supported
> Device does not support Self Test logging
>
> [...]
> --
> Best regards,
> Stefan Borggraefe
Hm are these behind some controller of sorts? What about smartctl -x ?
* Re: Help with recovering a RAID5 array
2013-05-02 13:17 ` Mathias Burén
@ 2013-05-02 13:29 ` Stefan Borggraefe
2013-05-02 13:49 ` Mathias Burén
0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-02 13:29 UTC (permalink / raw)
To: Mathias Burén; +Cc: Linux-RAID
On Thursday, 2 May 2013, 14:17:04, Mathias Burén wrote:
> On 2 May 2013 14:14, Stefan Borggraefe <stefan@spybot.info> wrote:
> > [...]
>
> Hm are these behind some controller of sorts? What about smartctl -x ?
We use an Adaptec 71605 controller. smartctl -x does not provide any more
useful information, I suppose. I only post the output of one drive as an
example this time. They all give a similar result.
smartctl -x /dev/sdc
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-37-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Vendor: Hitachi
Product: HUS724040ALE640
Revision: MJAO
User Capacity: 4.000.787.030.016 bytes [4,00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca22bd08a85
Serial number: PK2331PAH5D0YT
Device type: disk
Local Time is: Thu May 2 15:20:55 2013 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: <not available>
Error Counter logging not supported
Device does not support Self Test logging
Device does not support Background scan results logging
scsiPrintSasPhy Log Sense Failed [unsupported field in scsi command]
What about the state of the software RAID5? It would be great if I
could bring it back to a state where the filesystem on it is fully
working again without having to copy the 20 TB of data to it again
(copying this amount of data takes some time :( ).
--
Best regards,
Stefan Borggraefe
* Re: Help with recovering a RAID5 array
2013-05-02 13:29 ` Stefan Borggraefe
@ 2013-05-02 13:49 ` Mathias Burén
2013-05-02 14:17 ` Stefan Borggraefe
0 siblings, 1 reply; 13+ messages in thread
From: Mathias Burén @ 2013-05-02 13:49 UTC (permalink / raw)
To: Stefan Borggraefe; +Cc: Linux-RAID
On 2 May 2013 14:29, Stefan Borggraefe <stefan@spybot.info> wrote:
> On Thursday, 2 May 2013, 14:17:04, Mathias Burén wrote:
>> On 2 May 2013 14:14, Stefan Borggraefe <stefan@spybot.info> wrote:
>> > [...]
>>
>> Hm are these behind some controller of sorts? What about smartctl -x ?
>
> We use an Adaptec 71605 controller. smartctl -x does not provide any more
> useful information, I suppose. I only post the output of one drive as an
> example this time. They all give a similar result.
>
> [...]
>
> What about the state of the software RAID5? It would be great if I
> could bring it back to a state where the filesystem on it is fully
> working again without having to copy the 20 TB of data to it again
> (copying this amount of data takes some time :( ).
> --
> Best regards,
> Stefan Borggraefe
Ugh, Adaptec, not my favourite controller. Do you have arcconf
installed? You could run arcconf getconfig 1 (or whatever your
controller number is) to grab some information regarding your
controller and the HDDs connected to it.
Do you have /dev/sg? devices? If so, smartctl -a might work on them.
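If plain smartctl -a stays this sparse, forcing the SAT pass-through might
help; something like this is worth a try (the -d sat part is a guess, it
depends on how the Adaptec exposes the disks):

smartctl -d sat -a /dev/sdc
smartctl -d sat -a /dev/sg2    # or whichever sg node maps to that disk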
Re the software RAID, I would double check the health of your HDDs
before attempting anything.
Mathias
* Re: Help with recovering a RAID5 array
2013-05-02 13:49 ` Mathias Burén
@ 2013-05-02 14:17 ` Stefan Borggraefe
0 siblings, 0 replies; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-02 14:17 UTC (permalink / raw)
To: Mathias Burén; +Cc: Linux-RAID
On Thursday, 2 May 2013, 14:49:39, Mathias Burén wrote:
> Ugh, Adaptec, not my favourite controller. Do you have arcconf
> installed? You could run arcconf getconfig 1 (or whatever your
> controller number is) to grab some information regarding your
> controller and the HDDs connected to it.
arcconf getconfig 1
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Inaccessible
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
No logical devices configured
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Command completed successfully.
Could this be because we do not use the hardware RAID features of
the controller?
> Do you have /dev/sg? devices? If so, smartctl -a might work on them.
Yes, but using smartctl -a/-x on these devices gives the same output as
using it on the /dev/sd? devices. :(
> Re the software RAID, I would double check the health of your HDDs
> before attempting anything.
Ok, this makes sense.
--
Best regards,
Stefan Borggraefe
* Re: Help with recovering a RAID5 array
2013-05-02 12:24 Help with recovering a RAID5 array Stefan Borggraefe
2013-05-02 12:30 ` Mathias Burén
@ 2013-05-03 8:38 ` Ole Tange
2013-05-04 11:13 ` Stefan Borggraefe
1 sibling, 1 reply; 13+ messages in thread
From: Ole Tange @ 2013-05-03 8:38 UTC (permalink / raw)
To: Stefan Borggraefe; +Cc: linux-raid
On Thu, May 2, 2013 at 2:24 PM, Stefan Borggraefe <stefan@spybot.info> wrote:
> I am using a RAID5 software RAID on Ubuntu 12.04
:
> It consists of six 4 TB Hitachi drives and contains an ext4 file system.
>
> When I returned to this server this morning, the array was in the following
> state:
>
> md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
> 19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> [U_U_UU]
>
> sdc is the newly added hard disk, but now sdd has failed as well. :( It would be
> great if there were a way to get this RAID5 working again. Perhaps sdc1 could
> then be fully added to the array and, after that, the drive sdd exchanged as well.
I have had a few RAID6 arrays fail in a similar fashion: the third drive
failing during rebuild (also 4 TB Hitachi drives, by the way).
I tested if the drives were fine:
parallel dd if={} of=/dev/null bs=1000k ::: /dev/sd?
And they were all fine. If the failing drive had actually failed (i.e. had
bad sectors), then I would use GNU ddrescue to copy the failing drive to a
new drive. ddrescue can read a drive forwards, but it can also read
backwards. Even though backwards reading is slower, you can use it to
approach the failing sector from "the other side". This way you can often
get down to very few actually failing sectors.
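For example, something like this (sdd being the failing drive here and sdX a
placeholder for the replacement disk; the log/map file lets ddrescue resume):

ddrescue -f /dev/sdd /dev/sdX sdd-rescue.log       # forward pass
ddrescue -f -R /dev/sdd /dev/sdX sdd-rescue.log    # retry the bad area backwards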
With only a few failing sectors (if any) I figured that very little
would be lost by forcing the failing drive online. Remove the spare
drive, and force the remaining online:
mdadm -A --scan --force
This should not cause any rebuild to happen as you have removed the spare.
See: http://serverfault.com/questions/443763/linux-software-raid6-3-drives-offline-how-to-force-online
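Spelled out for your array it would be something like this (device names are
taken from your mdstat output above, so double-check them before running
anything):

mdadm --stop /dev/md126
mdadm -A /dev/md126 --force /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1   # original members only, spare left out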
Next step is to do fsck. Since fsck will write to the disk (and thus
be impossible to revert from) I put an overlay on the md-device, so
that nothing was written to the disks - instead changes were simply
written to a file.
See: http://unix.stackexchange.com/questions/67678/gnu-linux-overlay-block-device-stackable-block-device
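In my case it went roughly like this (the file name and size are just an
example; the sparse file must live on a filesystem with enough free room for
whatever fsck ends up writing):

truncate -s 50G /root/md126-cow.img                    # sparse copy-on-write file
loop=$(losetup -f --show /root/md126-cow.img)
size=$(blockdev --getsz /dev/md126)                    # size in 512-byte sectors
dmsetup create md126-overlay --table "0 $size snapshot /dev/md126 $loop P 8"
# all writes now land in the cow file; /dev/mapper/md126-overlay is safe to test on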
I then ran fsck on this overlaid device and checked that everything
was OK. Once everything was OK, I removed the overlay and did the fsck
on the real drives.
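For instance (md126-overlay being the hypothetical overlay name from the
sketch above):

fsck.ext4 -f /dev/mapper/md126-overlay    # trial fsck, only touches the overlay
dmsetup remove md126-overlay              # throw the overlay away afterwards
losetup -d "$loop"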
Thinking back, it might even have made sense to overlay every
underlying block device, thus ensuring that nothing (not even the
md driver) wrote anything to the devices before I was ready to commit.
/Ole
* Re: Help with recovering a RAID5 array
2013-05-03 8:38 ` Ole Tange
@ 2013-05-04 11:13 ` Stefan Borggraefe
2013-05-06 6:31 ` NeilBrown
0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-04 11:13 UTC (permalink / raw)
To: Ole Tange; +Cc: linux-raid
On Friday, 3 May 2013, 10:38:52, you wrote:
> On Thu, May 2, 2013 at 2:24 PM, Stefan Borggraefe <stefan@spybot.info> wrote:
> > [...]
> I have had a few RAID6 arrays fail in a similar fashion: the third drive
> failing during rebuild (also 4 TB Hitachi drives, by the way).
>
> I tested if the drives were fine:
>
> parallel dd if={} of=/dev/null bs=1000k ::: /dev/sd?
>
> And they were all fine.
Same for me.
> With only a few failing sectors (if any) I figured that very little
> would be lost by forcing the failing drive online. Remove the spare
> drive, and force the remaining online:
>
> mdadm -A --scan --force
I removed the spare /dev/sdc1 from /dev/md126
with
mdadm /dev/md126 --remove /dev/sdc1
After mdadm -A --scan --force the array is now in this state
md126 : active raid5 sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
[U_U_UU]
> Next step is to do fsck.
I think this is not possible at this point. Don't I need to reassemble the
array using the --assume-clean option and with one missing drive first? Some
step is missing here.
Stefan
* Re: Help with recovering a RAID5 array
2013-05-04 11:13 ` Stefan Borggraefe
@ 2013-05-06 6:31 ` NeilBrown
2013-05-06 8:12 ` Stefan Borggraefe
0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2013-05-06 6:31 UTC (permalink / raw)
To: Stefan Borggraefe; +Cc: Ole Tange, linux-raid
On Sat, 04 May 2013 13:13:27 +0200 Stefan Borggraefe <stefan@spybot.info>
wrote:
> On Friday, 3 May 2013 at 10:38:52, you wrote:
> > On Thu, May 2, 2013 at 2:24 PM, Stefan Borggraefe <stefan@spybot.info>
> wrote:
> > > I am using a RAID5 software RAID on Ubuntu 12.04
> > >
> > > It consists of 6 Hitachi drives with 4 TB and contains an ext4 file
> > > system.
> > >
> > > When I returned to this server this morning, the array was in the
> > > following
> > > state:
> > >
> > > md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
> > >
> > > 19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> > >
> > > [U_U_UU]
> > >
> > > sdc is the newly added hard disk, but now also sdd failed. :( It would be
> > > great if there was a way to have this RAID5 working again. Perhaps
> > > sdc1
> > > can then be fully added to the array and after this drive sdd also
> > > exchanged.
> > I have had a few raid6 arrays fail in a similar fashion: the 3rd drive
> > failing during rebuild (also 4 TB Hitachi, by the way).
> >
> > I tested if the drives were fine:
> >
> > parallel dd if={} of=/dev/null bs=1000k ::: /dev/sd?
> >
> > And they were all fine.
>
> Same for me.
>
> > With only a few failing sectors (if any) I figured that very little
> > would be lost by forcing the failing drive online. Remove the spare
> > drive, and force the remaining drives online:
> >
> > mdadm -A --scan --force
>
> I removed the spare /dev/sdc1 from /dev/md126
>
> with
>
> mdadm /dev/md126 --remove /dev/sdc1
>
> After mdadm -A --scan --force the array is now in this state
>
> md126 : active raid5 sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
> 19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> [U_U_UU]
Did you stop the array first?
i.e.
mdadm --stop /dev/md126
mdadm -Asfvv
NeilBrown
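(For reference, -Asfvv is the condensed form of
mdadm --assemble --scan --force --verbose --verbose
i.e. a forced, scan-based assembly with extra-verbose output.)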
>
> > Next step is to do fsck.
>
> I think this is not possible at this point. Don't I need to reassemble the
> array using the --assume-clean option and with one missing drive first? Some
> step is missing here.
>
> Stefan
* Re: Help with recovering a RAID5 array
2013-05-06 6:31 ` NeilBrown
@ 2013-05-06 8:12 ` Stefan Borggraefe
2013-05-10 10:14 ` Stefan Borggraefe
0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-06 8:12 UTC (permalink / raw)
To: NeilBrown, linux-raid
On Monday, 6 May 2013 at 16:31:02, NeilBrown wrote:
> On Sat, 04 May 2013 13:13:27 +0200 Stefan Borggraefe <stefan@spybot.info>
>
> wrote:
> > On Friday, 3 May 2013 at 10:38:52, you wrote:
> > > On Thu, May 2, 2013 at 2:24 PM, Stefan Borggraefe <stefan@spybot.info>
> >
> > wrote:
> > > > I am using a RAID5 software RAID on Ubuntu 12.04
> > > >
> > > > It consists of 6 Hitachi drives with 4 TB and contains an ext4 file
> > > > system.
> > > >
> > > > When I returned to this server this morning, the array was in the
> > > > following
> > > > state:
> > > >
> > > > md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6]
> > > > sdf1[2]
> > > >
> > > > 19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2
> > > > [6/4]
> > > >
> > > > [U_U_UU]
> > > >
> > > > sdc is the newly added hard disk, but now also sdd failed. :( It would
> > > > be
> > > > great if there was a way to have this RAID5 working again. Perhaps
> > > > sdc1
> > > > can then be fully added to the array and after this drive sdd also
> > > > exchanged.
> > >
> > > I have had a few raid6 arrays fail in a similar fashion: the 3rd drive
> > > failing during rebuild (also 4 TB Hitachi, by the way).
> > >
> > > I tested if the drives were fine:
> > > parallel dd if={} of=/dev/null bs=1000k ::: /dev/sd?
> > >
> > > And they were all fine.
> >
> > Same for me.
> >
> > > With only a few failing sectors (if any) I figured that very little
> > > would be lost by forcing the failing drive online. Remove the spare
> > >
> > > drive, and force the remaining drives online:
> > > mdadm -A --scan --force
> >
> > I removed the spare /dev/sdc1 from /dev/md126
> >
> > with
> >
> > mdadm /dev/md126 --remove /dev/sdc1
> >
> > After mdadm -A --scan --force the array is now in this state
> >
> > md126 : active raid5 sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
> >
> > 19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> >
> > [U_U_UU]
>
> Did you stop the array first?
> i.e.
> mdadm --stop /dev/md126
> mdadm -Asfvv
>
> NeilBrown
Thank you, Neil. Yes, this was my mistake. I realised it in the meantime and am
currently checking the file system using overlay files, as suggested in
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Force_assembly
--
Best regards,
Stefan Borggraefe
* Re: Help with recovering a RAID5 array
2013-05-06 8:12 ` Stefan Borggraefe
@ 2013-05-10 10:14 ` Stefan Borggraefe
2013-05-10 10:48 ` NeilBrown
0 siblings, 1 reply; 13+ messages in thread
From: Stefan Borggraefe @ 2013-05-10 10:14 UTC (permalink / raw)
To: linux-raid; +Cc: NeilBrown, Ole Tange
On Monday, 6 May 2013 at 10:12:42, Stefan Borggraefe wrote:
> On Monday, 6 May 2013 at 16:31:02, NeilBrown wrote:
> > Did you stop the array first?
> >
> > i.e.
> >
> > mdadm --stop /dev/md126
> > mdadm -Asfvv
> >
> > NeilBrown
>
> Thank you Neil, yes this was my mistake. I realised it in the meantime and
> am currently checking the file system using overlay files as suggested in
>
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
fsck.ext4 -y -C 0 /dev/md126
ended with
[7543439.065625] Pid: 13706, comm: fsck.ext4 Tainted: G W O 3.2.0-37-generic #58-Ubuntu
[7543439.065627] Call Trace:
[7543439.065629] [<ffffffff81067f0f>] warn_slowpath_common+0x7f/0xc0
[7543439.065631] [<ffffffff81067f6a>] warn_slowpath_null+0x1a/0x20
[7543439.065634] [<ffffffffa00aea35>] init_stripe+0x245/0x270 [raid456]
[7543439.065637] [<ffffffffa00b2772>] get_active_stripe+0x3a2/0x3c0 [raid456]
[7543439.065639] [<ffffffff814e67d9>] ? mddev_bio_destructor+0x19/0x20
[7543439.065641] [<ffffffff8108bbc0>] ? prepare_to_wait+0x60/0x90
[7543439.065644] [<ffffffffa00b5b94>] make_request+0x194/0x430 [raid456]
[7543439.065646] [<ffffffff813030d6>] ? throtl_find_tg+0x46/0x60
[7543439.065647] [<ffffffff8108bd20>] ? add_wait_queue+0x60/0x60
[7543439.065650] [<ffffffff814e5f10>] md_make_request+0xd0/0x200
[7543439.065652] [<ffffffff8111b145>] ? mempool_alloc_slab+0x15/0x20
[7543439.065654] [<ffffffff812f2264>] generic_make_request.part.50+0x74/0xb0
[7543439.065656] [<ffffffff812f2678>] generic_make_request+0x68/0x70
[7543439.065658] [<ffffffff812f2705>] submit_bio+0x85/0x110
[7543439.065660] [<ffffffff811afa2a>] ? bio_alloc_bioset+0x5a/0xf0
[7543439.065662] [<ffffffff811a997b>] submit_bh+0xeb/0x120
[7543439.065664] [<ffffffff811ac615>] block_read_full_page+0x225/0x390
[7543439.065666] [<ffffffff811b1700>] ? blkdev_get_blocks+0xd0/0xd0
[7543439.065668] [<ffffffff811190e5>] ? add_to_page_cache_locked+0x85/0xa0
[7543439.065670] [<ffffffff811b0dc8>] blkdev_readpage+0x18/0x20
[7543439.065672] [<ffffffff8111967d>] do_generic_file_read.constprop.33+0x10d/0x440
[7543439.065675] [<ffffffff8111a74f>] generic_file_aio_read+0xef/0x280
[7543439.065677] [<ffffffff813bb14e>] ? tty_wakeup+0x3e/0x80
[7543439.065679] [<ffffffff81178d8a>] do_sync_read+0xda/0x120
[7543439.065681] [<ffffffff8129ee13>] ? security_file_permission+0x93/0xb0
[7543439.065683] [<ffffffff81179211>] ? rw_verify_area+0x61/0xf0
[7543439.065684] [<ffffffff811796f0>] vfs_read+0xb0/0x180
[7543439.065686] [<ffffffff8117980a>] sys_read+0x4a/0x90
[7543439.065688] [<ffffffff81665842>] system_call_fastpath+0x16/0x1b
[7543439.065689] ---[ end trace cce84f1e6de88596 ]---
[7543441.630145] quiet_error: 230464 callbacks suppressed
I think I will give up here and just start again with a clean RAID6, copying
the data onto it anew.
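If it comes to that, a fresh RAID6 over six disks could be created along these
lines (a sketch only; the device names, chunk size and filesystem choice are
placeholders, not taken from this thread):
mdadm --create /dev/md0 --level=6 --raid-devices=6 --chunk=512 /dev/sd[c-h]1
mkfs.ext4 /dev/md0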
--
Best regards,
Stefan Borggraefe
* Re: Help with recovering a RAID5 array
2013-05-10 10:14 ` Stefan Borggraefe
@ 2013-05-10 10:48 ` NeilBrown
0 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2013-05-10 10:48 UTC (permalink / raw)
To: Stefan Borggraefe; +Cc: linux-raid, Ole Tange
On Fri, 10 May 2013 12:14:36 +0200 Stefan Borggraefe <stefan@spybot.info>
wrote:
> On Monday, 6 May 2013 at 10:12:42, Stefan Borggraefe wrote:
> > On Monday, 6 May 2013 at 16:31:02, NeilBrown wrote:
> > > Did you stop the array first?
> > >
> > > i.e.
> > >
> > > mdadm --stop /dev/md126
> > > mdadm -Asfvv
> > >
> > > NeilBrown
> >
> > Thank you, Neil. Yes, this was my mistake. I realised it in the meantime and
> > am currently checking the file system using overlay files, as suggested in
> >
> > https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
>
> fsck.ext4 -y -C 0 /dev/md126
>
> ended with
>
> [7543439.065625] Pid: 13706, comm: fsck.ext4 Tainted: G W O 3.2.0-37-generic #58-Ubuntu
> [7543439.065627] Call Trace:
> [7543439.065629] [<ffffffff81067f0f>] warn_slowpath_common+0x7f/0xc0
> [7543439.065631] [<ffffffff81067f6a>] warn_slowpath_null+0x1a/0x20
> [7543439.065634] [<ffffffffa00aea35>] init_stripe+0x245/0x270 [raid456]
> [7543439.065637] [<ffffffffa00b2772>] get_active_stripe+0x3a2/0x3c0 [raid456]
> [7543439.065639] [<ffffffff814e67d9>] ? mddev_bio_destructor+0x19/0x20
> [7543439.065641] [<ffffffff8108bbc0>] ? prepare_to_wait+0x60/0x90
> [7543439.065644] [<ffffffffa00b5b94>] make_request+0x194/0x430 [raid456]
> [7543439.065646] [<ffffffff813030d6>] ? throtl_find_tg+0x46/0x60
> [7543439.065647] [<ffffffff8108bd20>] ? add_wait_queue+0x60/0x60
> [7543439.065650] [<ffffffff814e5f10>] md_make_request+0xd0/0x200
> [7543439.065652] [<ffffffff8111b145>] ? mempool_alloc_slab+0x15/0x20
> [7543439.065654] [<ffffffff812f2264>] generic_make_request.part.50+0x74/0xb0
> [7543439.065656] [<ffffffff812f2678>] generic_make_request+0x68/0x70
> [7543439.065658] [<ffffffff812f2705>] submit_bio+0x85/0x110
> [7543439.065660] [<ffffffff811afa2a>] ? bio_alloc_bioset+0x5a/0xf0
> [7543439.065662] [<ffffffff811a997b>] submit_bh+0xeb/0x120
> [7543439.065664] [<ffffffff811ac615>] block_read_full_page+0x225/0x390
> [7543439.065666] [<ffffffff811b1700>] ? blkdev_get_blocks+0xd0/0xd0
> [7543439.065668] [<ffffffff811190e5>] ? add_to_page_cache_locked+0x85/0xa0
> [7543439.065670] [<ffffffff811b0dc8>] blkdev_readpage+0x18/0x20
> [7543439.065672] [<ffffffff8111967d>] do_generic_file_read.constprop.33+0x10d/0x440
> [7543439.065675] [<ffffffff8111a74f>] generic_file_aio_read+0xef/0x280
> [7543439.065677] [<ffffffff813bb14e>] ? tty_wakeup+0x3e/0x80
> [7543439.065679] [<ffffffff81178d8a>] do_sync_read+0xda/0x120
> [7543439.065681] [<ffffffff8129ee13>] ? security_file_permission+0x93/0xb0
> [7543439.065683] [<ffffffff81179211>] ? rw_verify_area+0x61/0xf0
> [7543439.065684] [<ffffffff811796f0>] vfs_read+0xb0/0x180
> [7543439.065686] [<ffffffff8117980a>] sys_read+0x4a/0x90
> [7543439.065688] [<ffffffff81665842>] system_call_fastpath+0x16/0x1b
> [7543439.065689] ---[ end trace cce84f1e6de88596 ]---
> [7543441.630145] quiet_error: 230464 callbacks suppressed
>
> I think I will give up here and just start again with a clean RAID6, copying
> the data onto it anew.
You've missed some important context there. What are the dozen or so lines
around this in the logs?
NeilBrown
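A quick way to pull that context, assuming Ubuntu's default log location (the
path and the search patterns are only suggestions):
dmesg | grep -C 15 'end trace cce84f1e6de88596'
grep -C 15 'init_stripe' /var/log/kern.log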