* 2nd Faulty drive while rebuilding array on RAID5
@ 2015-10-24 22:27 Guillaume ALLEE
2015-10-25 23:38 ` Phil Turmel
0 siblings, 1 reply; 4+ messages in thread
From: Guillaume ALLEE @ 2015-10-24 22:27 UTC (permalink / raw)
To: linux-raid
Hi all,
Context:
On my RAID5 array, sdc was faulty. I bought a new HD, formatted it, and
added it to my RAID array. However, during the rebuild, sda was detected
as faulty. Now I am not sure what to do...
I followed the steps from the wiki. See my question at the end ;-)
Thanks for your help.
$ uname -a
Linux htpc 3.8.0-35-generic #52~precise1-Ubuntu SMP Thu Jan 30
17:27:28 UTC 2014 i686 i686 i386 GNU/Linux
$ mdadm --version
mdadm - v3.2.5 - 18th May 2012
$ sudo mdadm --detail /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Sun Jan 2 08:47:01 2011
Raid Level : raid5
Array Size : 5842101312 (5571.46 GiB 5982.31 GB)
Used Dev Size : 1947367104 (1857.15 GiB 1994.10 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sat Oct 24 23:36:30 2015
State : clean, FAILED
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
UUID : 426d71a2:5b25a168:a4e2eff2:d305f1c1
Events : 0.20213
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 19 1 active sync /dev/sdb3
2 0 0 2 removed
3 8 51 3 active sync /dev/sdd3
4 8 35 - spare /dev/sdc3
5 8 3 - faulty spare /dev/sda3
$ mdadm --examine /dev/sd[abcdefghijklmn]3 >> raid.status
http://pastebin.com/qaP8bvna
$ sudo mdadm --examine /dev/sd[a-z] | egrep 'Event|/dev/sd'
/dev/sda:
/dev/sdb:
/dev/sdc:
/dev/sdd:
/dev/sde:
It seems that some blocks of sda are not readable. Cf. this from dmesg:
"[17852.126321] end_request: I/O error, dev sda, sector 1958539264"
Full dmesg available at:
http://pastebin.com/bBfcYjkg
Is there some way to re-add this disk (sda) to the array without mdadm
thinking it is a new one?
I have seen from the wiki that I could try to recreate the array with
--assume-clean, but I want to do that only as a last resort.
Thanks,
Guillaume
* Re: 2nd Faulty drive while rebuilding array on RAID5
2015-10-24 22:27 2nd Faulty drive while rebuilding array on RAID5 Guillaume ALLEE
@ 2015-10-25 23:38 ` Phil Turmel
2015-10-28 20:35 ` Guillaume ALLEE
0 siblings, 1 reply; 4+ messages in thread
From: Phil Turmel @ 2015-10-25 23:38 UTC (permalink / raw)
To: Guillaume ALLEE, linux-raid
Hi Guillaume,
On 10/24/2015 06:27 PM, Guillaume ALLEE wrote:
> Hi all,
>
> Context:
> On my RAID5 array, sdc was faulty. I bought a new HD, formatted it, and
> added it to my RAID array. However, during the rebuild, sda was detected
> as faulty. Now I am not sure what to do...
Unfortunately, you are suffering from classic timeout mismatch. I've
put some links in the postscript for you to read. Most likely, your
original sdc wasn't really bad.
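For reference, you can check each drive's error-recovery support and the
kernel's command timeout with something like this (sda is just an example
device name; smartctl comes from smartmontools):

$ sudo smartctl -l scterc /dev/sda      # SCT ERC read/write limits, if the drive supports them
$ cat /sys/block/sda/device/timeout     # kernel command timeout in seconds (default 30)

If the drive's internal error recovery can run longer than the kernel
timeout, the kernel resets the drive mid-recovery and MD kicks it out --
that is the mismatch.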
[trim /]
> $ mdadm --examine /dev/sd[abcdefghijklmn]3 >> raid.status
> http://pastebin.com/qaP8bvna
Something's missing. I see only sd[abcd]3. Where's the report for
/dev/sde* ?
[trim /]
> Full dmesg available at:
> http://pastebin.com/bBfcYjkg
Yes, you have WD20EARS and ST3000DM001 drive models. These are not safe
to use in raid arrays due to lack of error recovery control.
> Is there some way to re-add this disk (sda) to the array without mdadm
> thinking it is a new one?
You need to stop the array and perform an '--assemble --force' with the
last four devices (exclude /dev/sdc). The ones that have "Raid Device"
0, 1, 2, & 3.
If that fails, show mdadm's responses in your next reply. If it works,
your array will be available to mount, but degraded. You will not be
able to add your new sdc to the array while there are unresolved UREs,
so you will need to backup your important data from the degraded array.
UREs can only be fixed by writing over them -- normally done
automatically by MD with proper drives. You will have to overwrite
those spots with zeroes or use dd_rescue to move the data to fresh
drives (with zeroes in place of the unreadable spots).
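With GNU ddrescue, for example, a copy onto a fresh (zero-filled) drive
would look roughly like this, leaving zeroes wherever sectors could not be
read (sdX is a placeholder for the destination drive -- triple-check source
and destination before running):

$ sudo ddrescue -f -n /dev/sda /dev/sdX /root/sda.map    # fast first pass, skip scraping
$ sudo ddrescue -f -r3 /dev/sda /dev/sdX /root/sda.map   # retry the bad areas a few times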
> I have seen from the wiki that I could try to recreate the array with
> --assume-clean, but I want to do that only as a last resort.
Do *NOT* recreate the array. (Unless you're starting over after backing
up the files from the degraded array.)
Phil
[1] http://marc.info/?l=linux-raid&m=139050322510249&w=2
[2] http://marc.info/?l=linux-raid&m=135863964624202&w=2
[3] http://marc.info/?l=linux-raid&m=135811522817345&w=1
[4] http://marc.info/?l=linux-raid&m=133761065622164&w=2
[5] http://marc.info/?l=linux-raid&m=132477199207506
[6] http://marc.info/?l=linux-raid&m=133665797115876&w=2
[7] http://marc.info/?l=linux-raid&m=142487508806844&w=3
[8] http://marc.info/?l=linux-raid&m=144535576302583&w=2
* Re: 2nd Faulty drive while rebuilding array on RAID5
2015-10-25 23:38 ` Phil Turmel
@ 2015-10-28 20:35 ` Guillaume ALLEE
2015-10-29 12:21 ` Phil Turmel
0 siblings, 1 reply; 4+ messages in thread
From: Guillaume ALLEE @ 2015-10-28 20:35 UTC (permalink / raw)
To: Phil Turmel; +Cc: linux-raid
Hi Phil,
Thanks for your answer. I have tested my sda3 with badblocks and it
seems really bad: a six-digit number of unreadable blocks...
> Something's missing. I see only sd[abcd]3. Where's the report for
> /dev/sde* ?
Yep, because /dev/sde is not part of my /dev/md1 array.
"ARRAY /dev/md1 level=raid5 num-devices=4 metadata=00.90
UUID=426d71a2:5b25a168:a4e2eff2:d305f1c1
devices=/dev/sda3,/dev/sdb3,/dev/sdc3,/dev/sdd3"
>
> Yes, you have WD20EARS and ST3000DM001 drive models. These are not safe
> to use in raid arrays due to lack of error recovery control.
Okay, I will look at that for my next HDs.
> You need to stop the array and perform an '--assemble --force' with the
> last four devices (exclude /dev/sdc). The ones that have "Raid Device"
> 0, 1, 2, & 3.
Here is the assemble output.
#mdadm --assemble --force /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdd3
mdadm: no recogniseable superblock on /dev/sda3
mdadm: /dev/sda3 has no superblock - assembly aborted
>
> If that fails, show mdadm's responses in your next reply.
> Phil
Thanks, I will read those links.
* Re: 2nd Faulty drive while rebuilding array on RAID5
2015-10-28 20:35 ` Guillaume ALLEE
@ 2015-10-29 12:21 ` Phil Turmel
0 siblings, 0 replies; 4+ messages in thread
From: Phil Turmel @ 2015-10-29 12:21 UTC (permalink / raw)
To: Guillaume ALLEE; +Cc: linux-raid
Good morning Guillaume,
On 10/28/2015 04:35 PM, Guillaume ALLEE wrote:
> Thanks for your answer. I have tested my sda3 with badblocks and it
> seems really bad: a six-digit number of unreadable blocks...
Hmm. Is there any chance your cabling or power supply is at fault?
>> Something's missing. I see only sd[abcd]3. Where's the report for
>> /dev/sde* ?
> Yep, because /dev/sde is not part of my /dev/md1 array.
Ok.
> "ARRAY /dev/md1 level=raid5 num-devices=4 metadata=00.90
> UUID=426d71a2:5b25a168:a4e2eff2:d305f1c1
> devices=/dev/sda3,/dev/sdb3,/dev/sdc3,/dev/sdd3"
Just for future reference: you aren't expected to keep the output of
"mdadm -Es" as your permanent mdadm.conf. Your system will be much more
robust if you trim that to just the MD device name and the UUID.
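For example, trimmed down it would just be:

ARRAY /dev/md1 UUID=426d71a2:5b25a168:a4e2eff2:d305f1c1

mdadm then identifies the members by superblock UUID at assembly time, so
device name changes no longer matter.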
>> Yes, you have WD20EARS and ST3000DM001 drive models. These are not safe
>> to use in raid arrays due to lack of error recovery control.
> Okay, I will look at that for my next HDs.
Yes, it should guide your future purchasing, but you have a crisis on
your hands -- you *must* set your driver timeouts to deal with your
consumer-grade desktop drives or your array *will* crash again *soon*.
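In practice that means, for every member drive at every boot (e.g. from a
boot script), something along these lines (sdX stands for each drive in
turn; the values are the usual recommendations, not specific to your
hardware):

$ sudo smartctl -l scterc,70,70 /dev/sdX                # 7.0 s ERC, if the drive supports it
$ echo 180 | sudo tee /sys/block/sdX/device/timeout     # otherwise raise the kernel timeout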
>> You need to stop the array and perform an '--assemble --force' with the
>> last four devices (exclude /dev/sdc). The ones that have "Raid Device"
>> 0, 1, 2, & 3.
> Here is the assemble output.
>
> #mdadm --assemble --force /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdd3
> mdadm: no recogniseable superblock on /dev/sda3
> mdadm: /dev/sda3 has no superblock - assembly aborted
Hmmm. You need three readable drives with original data to get out of
this trap. What happened to the original /dev/sdc? Using it with a
carefully constructed --create --assume-clean may be your only chance to
recover anything.
Also, before you risk wiping anything else, generate mdadm -E reports
for each device along with a "smartctl -i -A -l scterc" report. That
ties the array details (especially role) for each device to the drive
serial number. As you plug and unplug devices, names can change.
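For instance (adjust the device list to whatever is attached right now;
partition 3 is assumed for the array members):

$ for d in /dev/sd[abcde]; do sudo smartctl -i -A -l scterc $d; sudo mdadm -E ${d}3; done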
Just post that information inline in your next reply -- no need to use
pastebin.
Phil