linux-raid.vger.kernel.org archive mirror
* help needed restoring data
@ 2013-08-03 13:59 Uwe Wächter
From: Uwe Wächter @ 2013-08-03 13:59 UTC (permalink / raw)
  To: linux-raid

Hi list,
I have a problem with my RAID 5 array, which is built from 3 hard disks.
I need help finding the best way to recover my data.
Here is what happened.
First, one hard disk failed with this message:

A Fail event had been detected on md device /dev/md/0.

It could be related to component device /dev/sde1.


Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[4](F) sdc1[3] sdd1[0]
      1953517568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]

unused devices: <none>
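
The status field at the end of the md0 line packs the array state into two brackets: `[3/2]` is "devices expected / devices active", and `[U_U]` has one flag per slot, with `_` marking a missing member. As a minimal sketch (the function name `parse_md_status` is made up for this illustration, and the regex only covers the format shown above), it could be read like this:

```python
import re


def parse_md_status(line):
    """Parse the '[n/m] [U_U]' field of a /proc/mdstat status line."""
    match = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
    if match is None:
        raise ValueError("no status field found")
    total = int(match.group(1))
    active = int(match.group(2))
    flags = match.group(3)
    # Slots are numbered from 0; '_' means that slot has no working device.
    missing = [slot for slot, flag in enumerate(flags) if flag == "_"]
    return {"total": total, "active": active, "missing_slots": missing}


line = "1953517568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]"
print(parse_md_status(line))
```

For the line above this reports one missing slot (slot 1). A RAID 5 array can only keep running while at most one slot is missing.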


No problem at that point; the RAID was still running on 2 hard disks.
The failure was triggered by the following kernel messages for sde1:
Jul 27 06:38:31 fed1 kernel: [987606.852610] md/raid:md0: Disk failure on sde1, disabling device.
Jul 27 06:38:31 fed1 kernel: [987606.852610] md/raid:md0: Operation continuing on 2 devices.
Jul 27 06:38:31 fed1 kernel: [987606.852613] sd 4:0:0:0: [sde]
Jul 27 06:38:31 fed1 kernel: [987606.852615] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 27 06:38:31 fed1 kernel: [987606.852617] sd 4:0:0:0: [sde] CDB:
Jul 27 06:38:31 fed1 kernel: [987606.852618] Read(10): 28 00 36 d4 f0 3f 00 02 98 00
Jul 27 06:38:31 fed1 kernel: [987606.852625] end_request: I/O error, dev sde, sector 919924799
Jul 27 06:38:31 fed1 kernel: [987606.852629] md/raid:md0: read error not correctable (sector 919924736 on sde1).
Jul 27 06:38:31 fed1 kernel: [987606.852638] md/raid:md0: read error not correctable (sector 919924744 on sde1).
Jul 27 06:38:31 fed1 kernel: [987606.852641] md/raid:md0: read error not correctable (sector 919924752 on sde1).
Jul 27 06:38:31 fed1 kernel: [987606.852643] md/raid:md0: read error not correctable (sector 919924760 on sde1).
Jul 27 06:38:31 fed1 kernel: [987606.852645] md/raid:md0: read error not correctable (sector 919924768 on sde1).


A couple of hours later, just as I was about to start backing up the data
partition, sdd also failed :-( with similar errors:

A Fail event had been detected on md device /dev/md/0.

It could be related to component device /dev/sdd1.


Personalities : [raid6] [raid5] [raid4]

md0 : active raid5 sde1[4](F) sdc1[3] sdd1[0](F)
      1953517568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/1] [__U]

unused devices: <none>


Jul 27 11:24:30 fed1 kernel: [1004773.836964] Add. Sense: Unrecovered read error - auto reallocate failed
Jul 27 11:24:30 fed1 kernel: [1004773.836966] sd 3:0:0:0: [sdd] CDB:
Jul 27 11:24:30 fed1 kernel: [1004773.836967] Read(10): 28 00 32 f7 0b 6f 00 00 08 00
Jul 27 11:24:30 fed1 kernel: [1004773.836975] end_request: I/O error, dev sdd, sector 855051119
Jul 27 11:24:30 fed1 kernel: [1004773.836980] raid5_end_read_request: 104 callbacks suppressed
Jul 27 11:24:30 fed1 kernel: [1004773.836982] md/raid:md0: read error not correctable (sector 855051056 on sdd1).
Jul 27 11:24:30 fed1 kernel: [1004773.836986] md/raid:md0: Disk failure on sdd1, disabling device.
Jul 27 11:24:30 fed1 kernel: [1004773.836986] md/raid:md0: Operation continuing on 1 devices.
Jul 27 11:24:30 fed1 kernel: [1004773.837009] ata4: EH complete
Jul 27 11:24:30 fed1 kernel: [1004773.853284] Buffer I/O error on device dm-2, logical block 213806870
Jul 27 11:24:30 fed1 kernel: [1004773.853291] Buffer I/O error on device dm-2, logical block 213806871
Jul 27 11:24:30 fed1 kernel: [1004773.853296] EXT4-fs warning (device dm-2): ext4_end_bio:294: I/O error writing to inode 60040


So I have 2 devices, sdd and sde, with unreadable sectors. sdc has no errors.
smartd[690]: Device: /dev/sde [SAT], 14 Currently unreadable (pending) sectors
smartd[690]: Device: /dev/sdd [SAT], 11 Currently unreadable (pending) sectors

What is the best way to rebuild the array with sdc and sdd only?
Can I correct the unreadable sectors before rebuilding the array, or would
that destroy the array?
Do I need to replace the 2 hard disks?

I hope you can help me restore as much data as possible.
best regards Uwe


* Re: help needed restoring data
  2013-08-03 13:59 help needed restoring data Uwe Wächter
@ 2013-08-04 14:54 ` Phil Turmel
From: Phil Turmel @ 2013-08-04 14:54 UTC (permalink / raw)
  To: Uwe Wächter; +Cc: linux-raid

Good morning Uwe,

I see that you've gone unanswered for a while here...

On 08/03/2013 09:59 AM, Uwe Wächter wrote:
> Hi list,
> I have a problem with my RAID 5 array, which is built from 3 hard disks.
> I need help finding the best way to recover my data.
> Here is what happened.
> First, one hard disk failed with this message:
> 
> A Fail event had been detected on md device /dev/md/0.
> 
> It could be related to component device /dev/sde1.
> 
> 
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sde1[4](F) sdc1[3] sdd1[0]
>       1953517568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
> 
> unused devices: <none>
> 
> 
> No problem at that point; the RAID was still running on 2 hard disks.
> The failure was triggered by the following kernel messages for sde1:
> Jul 27 06:38:31 fed1 kernel: [987606.852610] md/raid:md0: Disk failure
> on sde1, disabling device.
> Jul 27 06:38:31 fed1 kernel: [987606.852610] md/raid:md0: Operation
> continuing on 2 devices.
> Jul 27 06:38:31 fed1 kernel: [987606.852613] sd 4:0:0:0: [sde]
> Jul 27 06:38:31 fed1 kernel: [987606.852615] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> Jul 27 06:38:31 fed1 kernel: [987606.852617] sd 4:0:0:0: [sde] CDB:
> Jul 27 06:38:31 fed1 kernel: [987606.852618] Read(10): 28 00 36 d4 f0
> 3f 00 02 98 00
> Jul 27 06:38:31 fed1 kernel: [987606.852625] end_request: I/O error,
> dev sde, sector 919924799
> Jul 27 06:38:31 fed1 kernel: [987606.852629] md/raid:md0: read error
> not correctable (sector 919924736 on sde1).
> Jul 27 06:38:31 fed1 kernel: [987606.852638] md/raid:md0: read error
> not correctable (sector 919924744 on sde1).
> Jul 27 06:38:31 fed1 kernel: [987606.852641] md/raid:md0: read error
> not correctable (sector 919924752 on sde1).
> Jul 27 06:38:31 fed1 kernel: [987606.852643] md/raid:md0: read error
> not correctable (sector 919924760 on sde1).
> Jul 27 06:38:31 fed1 kernel: [987606.852645] md/raid:md0: read error
> not correctable (sector 919924768 on sde1).

It seems that your drive was kicked out of the array on a *read* error.
 Nowadays, that usually happens when a drive dies completely.  Your
drives haven't, which suggests you are using an old kernel.  Current
kernels tolerate some read errors in order to attempt to rewrite the
problem locations.
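
As an aside, the "Read(10)" line in the quoted log is the raw SCSI command that failed, and it can be decoded by hand: byte 0 is the opcode (0x28), bytes 2-5 are the starting LBA (big-endian), and bytes 7-8 are the transfer length in sectors. The sketch below (the helper name `decode_read10` is only illustrative) shows that the decoded LBA matches the "end_request" sector in the same log:

```python
def decode_read10(cdb_hex):
    """Return (start_lba, sector_count) from a READ(10) CDB given as hex."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return lba, count


lba, count = decode_read10("28 00 36 d4 f0 3f 00 02 98 00")
print(lba, count)          # 919924799 664

# md reported the first bad sector as 919924736 "on sde1", while the block
# layer reported 919924799 on the whole disk sde. The difference of 63 is
# consistent with sde1 starting at sector 63; that is an inference from the
# numbers, not something the log states.
print(lba - 919924736)     # 63
```

The sdd command from the second log decodes the same way (LBA 855051119, 8 sectors), again 63 sectors above the sector md reports on sdd1.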

Please supply your platform details, kernel version, and mdadm version.

> A couple of hours later, just as I was about to start backing up the data
> partition, sdd also failed :-( with similar errors:
> 
> A Fail event had been detected on md device /dev/md/0.
> 
> It could be related to component device /dev/sdd1.

This sounds like an array that is lightly used and has never been scrubbed.

[trim /]

> So I have 2 devices, sdd and sde, with unreadable sectors. sdc has no errors.
> smartd[690]: Device: /dev/sde [SAT], 14 Currently unreadable (pending) sectors
> smartd[690]: Device: /dev/sdd [SAT], 11 Currently unreadable (pending) sectors

Regular scrubbing would have prevented these from accumulating, as long
as your devices handle timeouts properly.  (You should search this
list's archives for combinations of "ERC" "error recovery control"
"scterc" and "device/timeout" for an education on this common problem.)

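For readers unfamiliar with the mechanics: a scrub is requested by writing "check" to the array's `sync_action` file in sysfs, and the kernel's per-command timeout for a disk lives in `/sys/block/<disk>/device/timeout` (30 seconds by default). A minimal sketch follows, assuming root privileges on a real system; the function names and the `sysfs_root` parameter are made up here so the code can be tried against a scratch directory, and the 180-second value mentioned in the comments is only an example, not a recommendation from this thread.

```python
from pathlib import Path


def start_scrub(md_name, sysfs_root="/sys"):
    """Ask md to read-check every stripe of the array (a 'scrub')."""
    action = Path(sysfs_root) / "block" / md_name / "md" / "sync_action"
    action.write_text("check\n")
    return action


def set_command_timeout(disk, seconds, sysfs_root="/sys"):
    """Set the kernel's per-command timeout (in seconds) for one disk."""
    timeout = Path(sysfs_root) / "block" / disk / "device" / "timeout"
    timeout.write_text(f"{seconds}\n")
    return timeout


# Example calls on a live system (as root):
#   start_scrub("md0")
#   set_command_timeout("sdd", 180)   # 180 is an illustrative value
```

Where a drive supports SCT ERC, the drive-side limit can instead be set with `smartctl -l scterc,70,70 /dev/sdX` (7.0 seconds for reads and writes), so that the drive gives up on a bad sector before the kernel timeout expires.
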
> What is the best way to rebuild the array with sdc and sdd only?

A degraded raid5 cannot have any bad sectors, so you'd have to fix the
bad sectors on /dev/sdd.  This is typically done with dd_rescue onto a
spare drive.
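
To illustrate what such a rescue copy does (this is not a substitute for dd_rescue itself, and every name below is hypothetical), the idea is to copy block by block, and when a read fails, write zeros in its place and remember which blocks were lost instead of aborting:

```python
def rescue_copy(read_block, write_block, total_blocks, block_size=512):
    """Copy blocks one at a time; zero-fill and record unreadable ones."""
    bad = []
    for index in range(total_blocks):
        try:
            data = read_block(index)
        except OSError:
            # Unreadable block: keep going, but note what was lost.
            data = bytes(block_size)
            bad.append(index)
        write_block(index, data)
    return bad


# In-memory stand-ins for a failing source disk and a healthy spare.
source = [bytes([i]) * 512 for i in range(8)]
BAD = {2, 5}


def fake_read(index):
    if index in BAD:
        raise OSError(5, "Input/output error")
    return source[index]


target = {}


def fake_write(index, data):
    target[index] = data


bad_blocks = rescue_copy(fake_read, fake_write, len(source))
print(bad_blocks)  # [2, 5]
```

The list of bad blocks matters as much as the copy: it tells you which parts of the array's contents cannot be trusted afterwards.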

If you avoided writing to the degraded array, this may not be your best
recovery choice.  Did you have *anything* writing into the array during
those couple hours?

> Can I correct the unreadable sectors before rebuilding the array, or would
> that destroy the array?

It doesn't destroy the array if you *only* rewrite the problem sectors.
 But don't do this yet.  Providing more information will help us help
you better.

> Do I need to replace the 2 hard disks?

Not enough information.  Please supply the full output of "smartctl -x"
for all three drives.
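
One possible way to gather everything requested in this message (kernel version, mdadm version, and "smartctl -x" for each drive) in one go is sketched below. The function names are made up for the illustration, the device list is taken from the report above, and smartctl normally needs root:

```python
import shutil
import subprocess


def diagnostic_commands(devices):
    """Build the list of commands whose output was asked for."""
    commands = [["uname", "-a"], ["mdadm", "--version"]]
    commands += [["smartctl", "-x", dev] for dev in devices]
    return commands


def run_diagnostics(devices):
    """Run each command and collect its output as (argv, text) pairs."""
    report = []
    for argv in diagnostic_commands(devices):
        if shutil.which(argv[0]) is None:
            report.append((argv, "command not found"))
            continue
        result = subprocess.run(argv, capture_output=True, text=True)
        # Some tools print their version on stderr, so keep both streams.
        report.append((argv, result.stdout + result.stderr))
    return report


# Example (as root):
#   for argv, text in run_diagnostics(["/dev/sdc", "/dev/sdd", "/dev/sde"]):
#       print("$", " ".join(argv))
#       print(text)
```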

> I hope you can help me restore as much data as possible.

I think we can help you save the vast majority of your data.

> best regards Uwe

Regards,

Phil
