Raid recovery. Help wanted!

Linux RAID subsystem development

* Raid recovery. Help wanted!
From: Evgeny Koryanov @ 2013-04-21 16:16 UTC
  To: linux-raid@vger.kernel.org

Hello everybody!

Yesterday I ran into a problem with one of our RAID5 arrays, built with mdadm
on three 1.5 TB devices (sd[bcd]).
I found the array in a degraded state with sdd failed. The drive went into the
failed state after a power spike.
The server is supplied by a UPS, but it seems that was not good enough: the
server did not reboot, but, as I said, one drive went into the failed state.
I simply re-added it and the array started rebuilding, but the rebuild failed
after a couple of percent, with sdc failing!!!
I assembled the array again with sd[bc] and tried to add sdd again: same
picture, the rebuild failed.
So I have sdb in sync, sdc failed and sdd as a spare. I checked the SMART data
of the drives to understand the reason for this behaviour and
found it clean on all devices. Then I tried reading each drive with dd
(if=/dev/sd[bcd] of=/dev/null) and found that dd also fails with I/O errors.
After the dd runs, bad blocks started appearing in SMART :)
Finally I have these states:
sdb - in sync
sdc - failed
sdd - spare
and a number of bad blocks on each HDD in random places...
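
A plain dd read stops at the first unreadable sector, so it only shows that an error exists somewhere. A minimal Python sketch of a read-only scan that keeps going and records every failing block might look like this; `scan_readable` and the 64 KiB block size are illustrative choices, not part of dd or mdadm. Every extra full pass stresses a failing drive, so this is only meant to show the idea.

```python
import os


def scan_readable(path, block_size=64 * 1024):
    """Read `path` front to back and return (offset, length) for every block
    whose read raised an OSError (e.g. EIO), instead of stopping at the first
    error like `dd if=... of=/dev/null` does. Hypothetical helper."""
    bad = []
    fd = os.open(path, os.O_RDONLY)  # read-only: never write to a failing disk
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # size of a regular file or block device
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                os.pread(fd, length, offset)
            except OSError:
                bad.append((offset, length))  # record the failing block and move on
            offset += length
    finally:
        os.close(fd)
    return bad
```

Usage would be something like `scan_readable("/dev/sdc")` as root; an empty list means every block was read without an error.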

Could anyone suggest how I can assemble this array now in read-only mode to
try to copy the data?!
Theoretically, the data on sdd should not have been overwritten, and it should
still be possible to try to recover the data (given that the bad blocks appear
in quite different places on each drive)...
Maybe you know a utility that helps recover data, or a way to start the array
in read-only mode that prevents it from going into the degraded state
and forces the md device to try to recover data using the readable places on
each device?
Any other ideas are appreciated too! Thanks anyway...

Best regards,
                 Evgeny.
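
Combining the readable places of each device is possible because of how RAID5 parity is defined: in every stripe the parity chunk is the XOR of the data chunks, so on a three-disk array any single chunk equals the XOR of the other two. The parity rotation (left-symmetric is mdadm's default layout) does not matter for this step, because XOR treats data and parity alike. A minimal Python sketch, assuming the chunks of one stripe have already been read from the same offset on each member; the function names are hypothetical, not part of md or mdadm.

```python
from functools import reduce


def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks (bytes objects)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(stripe):
    """`stripe` holds one chunk per member at the same offset (data and parity,
    in any order), with None for the single unreadable member.
    Returns the missing chunk as the XOR of all the others."""
    missing = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(missing) != 1:
        raise ValueError("RAID5 can rebuild exactly one missing chunk per stripe")
    return xor_chunks([chunk for chunk in stripe if chunk is not None])
```

For example, with data chunks `b"\x01\x02\x03"` and `b"\x10\x20\x30"` the parity is their XOR, and `rebuild_missing([d0, None, parity])` gives back the second chunk. This only works where at most one member is unreadable at a given offset.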

* Re: Raid recovery. Help wanted!
From: Mathias Burén @ 2013-04-21 16:40 UTC
  To: Evgeny Koryanov; +Cc: linux-raid@vger.kernel.org

On 21 April 2013 17:16, Evgeny Koryanov <evgeny.koryanov@maris.no> wrote:
> Could anyone suggest how I can assemble this array now in read-only mode to
> try to copy the data?!
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hi,

Could you post the smartctl -a output for all the drives? If two drives
are failing, you might want to ddrescue them somewhere and assemble
the RAID from the copies.

Mathias
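
In `smartctl -a` output for ATA drives, the counters that usually say most about unreadable sectors are the reallocated, pending and offline-uncorrectable sector counts. A small Python sketch that pulls their raw values out of the attribute table, assuming the usual ten-column ATA layout (vendors differ, and SAS or NVMe drives print a different format); `smart_raw_values` is a hypothetical helper.

```python
def smart_raw_values(smartctl_output, names=("Reallocated_Sector_Ct",
                                             "Current_Pending_Sector",
                                             "Offline_Uncorrectable")):
    """Return {attribute_name: raw_value} for the selected attributes in the
    attribute table printed by `smartctl -a`."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            raw = fields[9]
            values[fields[1]] = int(raw) if raw.isdigit() else raw
    return values
```

The text can come from e.g. `subprocess.run(["smartctl", "-a", "/dev/sdc"], capture_output=True, text=True).stdout`; smartctl often exits with a non-zero status on a drive with errors, so the exit code is not checked here.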

* Re: Raid recovery. Help wanted!
From: Evgeny Koryanov @ 2013-04-21 17:27 UTC
  To: Mathias Burén; +Cc: linux-raid@vger.kernel.org


On 21.04.2013 20:40, Mathias Burén wrote:
> Hi,
>
> Could you post the smartctl -a output for all the drives? If two drives
> are failing, you might want to ddrescue them somewhere and assemble
> the RAID from the copies.
>
> Mathias
Hi, Mathias.

I will post it tomorrow; the server is down now and I'm not around.
But as for your suggestion, I'm still not sure it is a good idea. As soon as I
copy the valid data to another drive, the information about where the bad
blocks are will be lost (for the md driver), and the bad blocks will
physically be replaced by zeros.
And what will happen after assembling the array and trying to read the places
where the bad blocks were (where the array is marked in sync but the blocks
are not consistent, because the redundant part was zeroed)? Will md read them
properly? Will md mark the drive as failed as soon as it finds an out-of-sync
(zeroed) block and start a resync...? Actually, I did not find a read-only
assembly mode in mdadm that would prevent this behaviour!

Best regards,
                     Evgeny.
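
The concern above can be made concrete. If a zero-filled region on a copy lands in one member's chunk, the stripe's XOR no longer comes out as zero, so a parity check can detect the inconsistency, but it cannot say which member is wrong: from the chunk contents alone, a zeroed data chunk and a zeroed parity chunk look the same. That is exactly the information a bad-block log kept during copying preserves. A minimal Python sketch, assuming the stripe's chunks were read from the copies at the same offset and that the index of the zero-filled member comes from such a log; both function names are hypothetical.

```python
from functools import reduce


def stripe_is_consistent(chunks):
    """On an intact RAID5 stripe the XOR of all members (data and parity)
    is all zero bytes."""
    return not any(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def repair_stripe(chunks, known_bad):
    """Recompute the member at index `known_bad` (taken from a bad-block log,
    not guessed) from the other members, and return the repaired stripe."""
    others = [c for i, c in enumerate(chunks) if i != known_bad]
    fixed = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*others))
    return chunks[:known_bad] + [fixed] + chunks[known_bad + 1:]
```

Without `known_bad`, the only safe conclusion from a failed check is "this stripe is damaged"; rewriting parity from the data chunks would make a zeroed data chunk permanent.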

* Re: Raid recovery. Help wanted!
From: Sam Bingner @ 2013-04-21 23:37 UTC
  To: Evgeny Koryanov; +Cc: Mathias Burén, linux-raid@vger.kernel.org

On Apr 21, 2013, at 7:27 AM, "Evgeny Koryanov" <evgeny.koryanov@maris.no> wrote:

> But as for your suggestion, I'm still not sure it is a good idea. As soon as I
> copy the valid data to another drive, the information about where the bad
> blocks are will be lost (for the md driver), and the bad blocks will
> physically be replaced by zeros.
> And what will happen after assembling the array and trying to read the places
> where the bad blocks were (where the array is marked in sync but the blocks
> are not consistent, because the redundant part was zeroed)? Will md read them
> properly? Will md mark the drive as failed as soon as it finds an out-of-sync
> (zeroed) block and start a resync...? Actually, I did not find a read-only
> assembly mode in mdadm that would prevent this behaviour!


You want to use "ddrescue", which keeps a log of the bad blocks... (Be sure to use a logfile when you run ddrescue on each drive.) It can often get all the data off failing drives because it retries the bad blocks, and if it can't, the unreadable area will hopefully be very small.
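
The ddrescue log is a plain-text file (newer versions call it a mapfile). After comment lines starting with '#' comes one status line, then one line per block with a hexadecimal position, a hexadecimal size and a status character: '+' means finished, '-' means bad sector, and '?', '*' and '/' mark areas that have not been fully read yet. A small Python sketch that reads it back, assuming that layout; the function names are illustrative, not part of ddrescue.

```python
def parse_mapfile(text):
    """Return (pos, size, status) tuples for the block lines of a GNU ddrescue
    mapfile/logfile."""
    blocks = []
    seen_status_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if not seen_status_line:
            # First data line is "current_pos current_status [current_pass]".
            seen_status_line = True
            continue
        pos, size, status = line.split()[:3]
        blocks.append((int(pos, 16), int(size, 16), status))  # "0x..." hex values
    return blocks


def unrecovered(blocks):
    """Every block not marked '+' (finished) is data ddrescue has not read."""
    return [(pos, size) for pos, size, status in blocks if status != "+"]
```

A log like this is produced by an invocation of the form `ddrescue /dev/sdc sdc.img sdc.map` (input, output, log), with the image written to a healthy disk; the file names here are only examples.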

When it is done, you could conceivably have some program do a recovery that uses the information in the log files to rebuild any other missing data... But I don't think such a program has been written yet.
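
The first part of such a program is simple interval arithmetic. Given the unreadable ranges of each member (for example taken from each drive's ddrescue log), RAID5 parity can fill in any offset that is bad on exactly one member, while offsets bad on two or more members are lost. A minimal Python sketch of that classification, assuming half-open (start, end) byte ranges and that all members use the same data offset, so equal device offsets belong to the same stripe (with 1.x metadata, compare the "Data Offset" that `mdadm --examine` reports for each drive). The function names are hypothetical.

```python
def merge(ranges):
    """Sort half-open (start, end) ranges and merge overlapping or touching ones."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def lost_ranges(bad_by_member):
    """`bad_by_member` maps a member name (e.g. "sdc") to its unreadable
    (start, end) ranges. Returns the ranges unreadable on two or more members
    at once; RAID5 parity cannot rebuild those."""
    events = []
    for ranges in bad_by_member.values():
        for start, end in merge(ranges):  # merge first so one member never counts twice
            events.append((start, +1))
            events.append((end, -1))
    # At equal offsets (x, -1) sorts before (x, +1), so ranges on different
    # members that merely touch are not reported as overlapping.
    events.sort()
    lost, depth, opened = [], 0, None
    for offset, delta in events:
        depth += delta
        if depth >= 2 and opened is None:
            opened = offset
        elif depth < 2 and opened is not None:
            lost.append((opened, offset))
            opened = None
    return merge(lost)
```

Everything bad on only one member, i.e. outside the returned ranges, can in principle be rebuilt from the other two members.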

Essentially, your first step is to recover as much of the actual data as possible onto reliable disks; then you will know what your situation really is...

Sam
