Multiple drive failure after stupid mistake. Help needed

linux-raid.vger.kernel.org archive mirror

* Multiple drive failure after stupid mistake. Help needed
@ 2014-10-19  9:45 Per-Ola Stenborg
  2014-10-19 10:56 ` Mikael Abrahamsson
  0 siblings, 1 reply; 6+ messages in thread
From: Per-Ola Stenborg @ 2014-10-19  9:45 UTC (permalink / raw)
  To: linux-raid

Hi all,

I have done something very stupid. After getting SMART warnings from one 
of my disks in a 4-disk RAID5 array I decided to be proactive and change 
the disk.
The array consists of /dev/sd[bcde]. The failing disk is /dev/sdc.

I ran fail and remove on the WRONG disk!

mdadm --manage /dev/md0 --fail /dev/sdb

/proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0](F) sde[4] sdd[2] sdc[1]
       5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]

mdadm --manage /dev/md0 --remove /dev/sdb
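
One way to guard against failing the wrong member is to compare the drive's serial number with the one named in the SMART warning before running --fail. This is a minimal sketch, not a tested procedure. It assumes smartmontools is installed and that smartctl -i prints an ATA-style "Serial Number:" line. The helper names serial_of and fail_if_serial_matches are made up for illustration.

```shell
# Sketch only: serial_of and fail_if_serial_matches are hypothetical helpers.

# Print the serial number that smartctl reports for a device.
# Assumes an ATA-style "Serial Number:" line in the output of smartctl -i.
serial_of() {
    smartctl -i "$1" | awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Mark a member as failed only if its serial matches the expected one.
# Arguments: array device, member device, expected serial.
fail_if_serial_matches() {
    array=$1; dev=$2; expected=$3
    actual=$(serial_of "$dev")
    if [ "$actual" != "$expected" ]; then
        echo "refusing to fail $dev: serial '$actual' is not '$expected'" >&2
        return 1
    fi
    mdadm --manage "$array" --fail "$dev"
}
```

It would be called as fail_if_serial_matches /dev/md0 /dev/sdc followed by the serial taken from the SMART report. The --remove step can follow once --fail has gone through on the intended disk.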

I then swapped the physical disk, the right one this time (the failing
/dev/sdc). On booting the server the array did not come up, and I realised
my error. I thought this was not a problem, since the original /dev/sdc was
still readable, so I shut the server down, put the original disk back and
re-added /dev/sdb.

/proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sde[4] sdd[2]
       5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]

mdadm --manage /dev/md0 --add /dev/sdb

All seemed fine and the array started rebuilding, but when the rebuild was
almost done /dev/sdc failed.

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0] sdc[1](F) sde[4] sdd[2]
       5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [__UU]
       [===================>.]  recovery = 95.3% (1862844416/1953512960) finish=49.5min speed=30502K/sec
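
The finish estimate on the recovery line can be reproduced from the other numbers on it: blocks still to rebuild, divided by the current speed. This is a small sketch of that arithmetic, assuming the two counters and the speed use the same unit (1 KiB blocks and K/sec). The function name recovery_eta_minutes is made up for illustration.

```shell
# Sketch only: recovery_eta_minutes is a hypothetical helper.
# Arguments: blocks done, blocks total, speed in blocks per second.
# Prints the estimated minutes remaining, to one decimal place.
recovery_eta_minutes() {
    awk -v d="$1" -v t="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (t - d) / s / 60 }'
}
```

Running recovery_eta_minutes 1862844416 1953512960 30502 prints 49.5, which agrees with the finish=49.5min shown above.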

A few hours later I got:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0](S) sdc[1](F) sde[4] sdd[2]
       5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [__UU]


After a reboot I now have:

/proc/mdstat
Personalities :
md0 : inactive sdd[2](S) sdb[0](S) sde[4](S) sdc[1](S)
       7814054240 blocks super 1.2

unused devices: <none>

/dev/sdb:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : e3394a2b:77411a7d:a6f03a01:19f9b943
            Name : backuppc:0  (local to host backuppc)
   Creation Time : Mon Dec 19 17:43:44 2011
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
      Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : ed574f2e:b80a509b:b8a5e5a6:3d711e05

     Update Time : Fri Oct 17 01:00:05 2014
        Checksum : 4fe90596 - correct
          Events : 5072

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : ..AA ('A' == active, '.' == missing)

/dev/sdc:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : e3394a2b:77411a7d:a6f03a01:19f9b943
            Name : backuppc:0  (local to host backuppc)
   Creation Time : Mon Dec 19 17:43:44 2011
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
      Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 4ebf1b3b:6821832c:1b520e0e:d363aa4d

     Update Time : Fri Oct 17 00:04:20 2014
        Checksum : 9d9f1587 - correct
          Events : 5064

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAA ('A' == active, '.' == missing)

/dev/sdd:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : e3394a2b:77411a7d:a6f03a01:19f9b943
            Name : backuppc:0  (local to host backuppc)
   Creation Time : Mon Dec 19 17:43:44 2011
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
      Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : ffe21a6e:3256c3d5:8cb68394:1172eb5d

     Update Time : Fri Oct 17 01:00:05 2014
        Checksum : 1092edcd - correct
          Events : 5072

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : ..AA ('A' == active, '.' == missing)

/dev/sde:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : e3394a2b:77411a7d:a6f03a01:19f9b943
            Name : backuppc:0  (local to host backuppc)
   Creation Time : Mon Dec 19 17:43:44 2011
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
      Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 5ca79fb0:09f51c20:f5c8a851:310f5c2a

     Update Time : Fri Oct 17 01:00:05 2014
        Checksum : 2707008b - correct
          Events : 5072

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : ..AA ('A' == active, '.' == missing)
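
The superblocks above disagree on the event counter: /dev/sdc is at 5064, while /dev/sdb, /dev/sdd and /dev/sde are at 5072. A member with an older counter is treated as out of date, which is the situation --assemble --force is meant for. The sketch below pulls that one field out of each member so the difference is easy to spot. The helper names events_from_examine and list_events are made up for illustration.

```shell
# Sketch only: events_from_examine and list_events are hypothetical helpers.

# Read `mdadm --examine` output on stdin and print the value of its Events line.
events_from_examine() {
    awk -F' *: *' '/^ *Events/ { print $2 }'
}

# Print "<device> <events>" for each device given, so stale members stand out.
list_events() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" | events_from_examine)"
    done
}
```

It would be run as root, for example as list_events /dev/sd[bcde].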


The /dev/sdc disk has been tested with SpinRite and is verified readable.
I've tried forcing an assembly without luck. Did I do it right? What
should I do now?

*** PLEASE advise ***

And of course I have valuable data on the array without a backup...

Best regards

Per-Ola
---


* Re: Multiple drive failure after stupid mistake. Help needed
  2014-10-19  9:45 Multiple drive failure after stupid mistake. Help needed Per-Ola Stenborg
@ 2014-10-19 10:56 ` Mikael Abrahamsson
  2014-10-19 12:58   ` Per-Ola Stenborg
  0 siblings, 1 reply; 6+ messages in thread
From: Mikael Abrahamsson @ 2014-10-19 10:56 UTC (permalink / raw)
  To: Per-Ola Stenborg; +Cc: linux-raid

On Sun, 19 Oct 2014, Per-Ola Stenborg wrote:

> *** PLEASE advise ***

Please post dmesg output from when you do --assemble --force, and also 
please post your mdadm and kernel versions.

As a first step, compile mdadm from source and use that version; it often
helps, as distributions don't generally ship with the latest mdadm.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Multiple drive failure after stupid mistake. Help needed
  2014-10-19 10:56 ` Mikael Abrahamsson
@ 2014-10-19 12:58   ` Per-Ola Stenborg
       [not found]     ` <CAFE24U0GKPhkYe1faCBWohimJYL4O_PBYOJ+aLa_mSMKQCGGhw@mail.gmail.com>
  2014-10-19 17:06     ` Mikael Abrahamsson
  0 siblings, 2 replies; 6+ messages in thread
From: Per-Ola Stenborg @ 2014-10-19 12:58 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid

Hi, thanks for your answer. (Tack!)

My Debian mdadm is v3.1.4 - 31st August 2010
mdadm --assemble --force /dev/md0 /dev/sd[bcde]
outputs
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has no superblock - assembly aborted

I compiled the latest mdadm v3.3.2 - 21st August 2014
running
mdadm --assemble --force /dev/md0 /dev/sd[bcde]
outputs
mdadm: /dev/sdb is busy - skipping
mdadm: /dev/sdc is busy - skipping
mdadm: /dev/sdd is busy - skipping
mdadm: /dev/sde is busy - skipping

Strange. What does this mean? Are the devices flagged as in use in the
kernel? They are readable: I tried reading data with
dd if=/dev/sdb of=dump bs=1024 count=1024
and it works, so the device is accessible.

dmesg shows nothing

/proc/mdstat
Personalities :
md0 : inactive sdd[2](S) sdb[0](S) sde[4](S) sdc[1](S)
       7814054240 blocks super 1.2


uname -a
Linux backuppc 2.6.32-5-686 #1 SMP Sat Jul 12 22:59:16 UTC 2014 i686 GNU/Linux
System is Debian squeeze-lts

Best regards

Per-Ola Stenborg




* Re: Multiple drive failure after stupid mistake. Help needed
       [not found]     ` <CAFE24U0GKPhkYe1faCBWohimJYL4O_PBYOJ+aLa_mSMKQCGGhw@mail.gmail.com>
@ 2014-10-19 15:56       ` Per-Ola Stenborg
  0 siblings, 0 replies; 6+ messages in thread
From: Per-Ola Stenborg @ 2014-10-19 15:56 UTC (permalink / raw)
  To: Weedy; +Cc: linux-raid, Mikael Abrahamsson

Weedy wrote on 2014-10-19 15:41:
>
> Is sdb in another array when you try to assemble?
> Is /proc/mdstat empty while you're trying this?
>
No, here is the output

cat /proc/mdstat
Personalities :
md0 : inactive sdd[2](S) sdb[0](S) sde[4](S) sdc[1](S)
       7814054240 blocks super 1.2

unused devices: <none>

There is only one array in this machine.
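
The inactive md0 entry above still lists all four disks, which is presumably why mdadm reports them as busy. The sketch below checks which array, if any, has claimed a given disk by scanning /proc/mdstat. The function name array_holding is made up for illustration, and the second argument exists only so the parser can be pointed at a saved copy of the file.

```shell
# Sketch only: array_holding is a hypothetical helper.
# Print the name of every md array whose mdstat line lists the given member.
# Arguments: member name without /dev/ (e.g. sdb), optional mdstat path.
array_holding() {
    member=$1
    mdstat=${2:-/proc/mdstat}
    awk -v m="$member" '
        $2 == ":" {
            for (i = 3; i <= NF; i++) {
                n = $i
                sub(/\[.*/, "", n)   # strip the "[0](S)" style suffix
                if (n == m) print $1
            }
        }' "$mdstat"
}
```

With the mdstat contents shown above, array_holding sdb prints md0. Empty output would mean no md array currently holds the disk.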

/Per-Ola



* Re: Multiple drive failure after stupid mistake. Help needed
  2014-10-19 12:58   ` Per-Ola Stenborg
       [not found]     ` <CAFE24U0GKPhkYe1faCBWohimJYL4O_PBYOJ+aLa_mSMKQCGGhw@mail.gmail.com>
@ 2014-10-19 17:06     ` Mikael Abrahamsson
  2014-10-19 19:00       ` Per-Ola Stenborg
  1 sibling, 1 reply; 6+ messages in thread
From: Mikael Abrahamsson @ 2014-10-19 17:06 UTC (permalink / raw)
  To: Per-Ola Stenborg; +Cc: linux-raid

On Sun, 19 Oct 2014, Per-Ola Stenborg wrote:

> I compiled the latest mdadm v3.3.2 - 21st August 2014
> running
> mdadm --assemble --force /dev/md0 /dev/sd[bcde]
> outputs
> mdadm: /dev/sdb is busy - skipping
> mdadm: /dev/sdc is busy - skipping
> mdadm: /dev/sdd is busy - skipping
> mdadm: /dev/sde is busy - skipping
>
> Strange. What does this mean? Is it flagged as in use in the kernel?

Stop the array before you try to start it again.
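
Spelled out, this amounts to two commands: stop the inactive array so that it releases its members, then run the forced assembly again. This is a minimal sketch. The wrapper name stop_and_force_assemble is made up for illustration, while --stop, --assemble and --force are standard mdadm options.

```shell
# Sketch only: stop_and_force_assemble is a hypothetical wrapper.
# Arguments: array device, then the member devices to assemble from.
stop_and_force_assemble() {
    array=$1
    shift
    # Release the members held by the inactive array; give up if that fails.
    mdadm --stop "$array" || return 1
    # Assemble again, accepting members whose metadata looks slightly stale.
    mdadm --assemble --force "$array" "$@"
}
```

For the array in this thread the call would be stop_and_force_assemble /dev/md0 /dev/sd[bcde], run as root.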

Also consider upgrading to a newer kernel if a backports one is available
(there should be for Debian squeeze); a lot has changed since 2.6.32.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Multiple drive failure after stupid mistake. Help needed
  2014-10-19 17:06     ` Mikael Abrahamsson
@ 2014-10-19 19:00       ` Per-Ola Stenborg
  0 siblings, 0 replies; 6+ messages in thread
From: Per-Ola Stenborg @ 2014-10-19 19:00 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid

Yes! It works! Thanks so much. Let's hope the sdc drive survives the
rebuild this time.
At least I got the opportunity to back up the 24 GB of important (not
backed up) data.

Linux RAID usually works so well that you never have to worry. And when
things do go wrong, you therefore never have the experience needed to fix
the problem.

Thanks again!

Best regards

Per-Ola Stenborg




