* Multiple drive failure after stupid mistake. Help needed
From: Per-Ola Stenborg @ 2014-10-19 9:45 UTC (permalink / raw)
To: linux-raid
Hi all,
I have done something very stupid. After getting SMART warnings from one
of my disks in a 4-disk RAID5 array I decided to be proactive and change
the disk.
The array consists of /dev/sd[bcde]. The failing disk is /dev/sdc.
I ran fail and remove on the WRONG disk!
mdadm --manage /dev/md0 --fail /dev/sdb
/proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0](F) sde[4] sdd[2] sdc[1]
5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2
[4/3] [_UUU]
mdadm --manage /dev/md0 --remove /dev/sdb
I then swapped out the physical disk, i.e. the failing (right) one, /dev/sdc.
When booting my server I noticed my mistake when the array did not come up.
I thought it was not a problem, as the original /dev/sdc was still readable,
so I shut the server down, put the original disk back and re-added /dev/sdb
/proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sde[4] sdd[2]
5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2
[4/3] [_UUU]
mdadm --manage /dev/md0 --add /dev/sdb
All seemed fine and the array was rebuilding. But when it was almost done,
/dev/sdc failed.
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0] sdc[1](F) sde[4] sdd[2]
5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2
[4/2] [__UU]
[===================>.] recovery = 95.3% (1862844416/1953512960)
finish=49.5min speed=30502K/sec
A few hours later I got:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0](S) sdc[1](F) sde[4] sdd[2]
5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2
[4/2] [__UU]
After reboot I now have
/proc/mdstat
Personalities :
md0 : inactive sdd[2](S) sdb[0](S) sde[4](S) sdc[1](S)
7814054240 blocks super 1.2
unused devices: <none>
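The (S) after every member above means each disk is attached only as a spare
of an inactive array; nothing is actually running. A throwaway sketch (sample
line copied from the mdstat output above) that lists each member with its flag:

```shell
# List each md member and its state flag from an mdstat line.
# The sample line is copied from the inactive array above.
line='md0 : inactive sdd[2](S) sdb[0](S) sde[4](S) sdc[1](S)'
printf '%s\n' "$line" | grep -oE 'sd[a-z]+\[[0-9]+\]\([A-Z]\)'
```

This prints one member[slot](flag) entry per line, e.g. sdd[2](S).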
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : e3394a2b:77411a7d:a6f03a01:19f9b943
Name : backuppc:0 (local to host backuppc)
Creation Time : Mon Dec 19 17:43:44 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ed574f2e:b80a509b:b8a5e5a6:3d711e05
Update Time : Fri Oct 17 01:00:05 2014
Checksum : 4fe90596 - correct
Events : 5072
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : ..AA ('A' == active, '.' == missing)
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : e3394a2b:77411a7d:a6f03a01:19f9b943
Name : backuppc:0 (local to host backuppc)
Creation Time : Mon Dec 19 17:43:44 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 4ebf1b3b:6821832c:1b520e0e:d363aa4d
Update Time : Fri Oct 17 00:04:20 2014
Checksum : 9d9f1587 - correct
Events : 5064
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing)
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : e3394a2b:77411a7d:a6f03a01:19f9b943
Name : backuppc:0 (local to host backuppc)
Creation Time : Mon Dec 19 17:43:44 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ffe21a6e:3256c3d5:8cb68394:1172eb5d
Update Time : Fri Oct 17 01:00:05 2014
Checksum : 1092edcd - correct
Events : 5072
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : ..AA ('A' == active, '.' == missing)
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : e3394a2b:77411a7d:a6f03a01:19f9b943
Name : backuppc:0 (local to host backuppc)
Creation Time : Mon Dec 19 17:43:44 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 5ca79fb0:09f51c20:f5c8a851:310f5c2a
Update Time : Fri Oct 17 01:00:05 2014
Checksum : 2707008b - correct
Events : 5072
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : ..AA ('A' == active, '.' == missing)
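The important difference in those dumps is the Events counter: /dev/sdc
stopped at 5064 while the other three members reached 5072, which is exactly
the stale-superblock gap that mdadm --assemble --force exists to override. A
small helper of my own (not one of the thread's commands) to pull the counter
out of --examine style text:

```shell
# Extract the Events counter from `mdadm --examine` style output on stdin.
extract_events() {
    awk -F: '/Events/ { gsub(/ /, "", $2); print $2 }'
}
# Sample line copied from the /dev/sdc dump above:
printf '         Events : 5064\n' | extract_events   # prints 5064
```

Run over all four dumps it would show 5072, 5064, 5072, 5072, confirming sdc
is the out-of-date member.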
The /dev/sdc disk has been tested with SpinRite and verified readable.
I've tried forcing an assembly without luck. Did I do it right? What
should I do now?
*** PLEASE advise ***
And of course I have valuable data on the array without a backup...
Best regards
Per-Ola
* Re: Multiple drive failure after stupid mistake. Help needed
From: Mikael Abrahamsson @ 2014-10-19 10:56 UTC (permalink / raw)
To: Per-Ola Stenborg; +Cc: linux-raid
On Sun, 19 Oct 2014, Per-Ola Stenborg wrote:
> *** PLEASE advise ***
Please post dmesg output from when you do --assemble --force, and also
please post your mdadm and kernel versions.
As a first step, compile mdadm from source and use that version, it often
helps as distributions don't generally ship with the latest mdadm.
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Multiple drive failure after stupid mistake. Help needed
From: Per-Ola Stenborg @ 2014-10-19 12:58 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: linux-raid
Hi, thanks for your answer. (Tack!)
My Debian mdadm is v3.1.4 (31st August 2010).
mdadm --assemble --force /dev/md0 /dev/sd[bcde]
outputs
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has no superblock - assembly aborted
I compiled the latest mdadm v3.3.2 - 21st August 2014
running
mdadm --assemble --force /dev/md0 /dev/sd[bcde]
outputs
mdadm: /dev/sdb is busy - skipping
mdadm: /dev/sdc is busy - skipping
mdadm: /dev/sdd is busy - skipping
mdadm: /dev/sde is busy - skipping
Strange. What does this mean? Is it flagged as in use in the kernel? The
devices are readable; I tried to read data with
dd if=/dev/sdb of=dump bs=1024 count=1024
and it works, so the device is accessible.
dmesg shows nothing
/proc/mdstat
Personalities :
md0 : inactive sdd[2](S) sdb[0](S) sde[4](S) sdc[1](S)
7814054240 blocks super 1.2
uname -a
Linux backuppc 2.6.32-5-686 #1 SMP Sat Jul 12 22:59:16 UTC 2014 i686
GNU/Linux
System is Debian squeeze-lts
Best regards
Per-Ola Stenborg
Mikael Abrahamsson wrote 2014-10-19 12:56:
> On Sun, 19 Oct 2014, Per-Ola Stenborg wrote:
>
>> *** PLEASE advise ***
>
> Please post dmesg output from when you do --assemble --force, and also
> please post your mdadm and kernel versions.
>
> As a first step, compile mdadm from source and use that version, it
> often helps as distributions don't generally ship with the latest mdadm.
>
* Re: Multiple drive failure after stupid mistake. Help needed
From: Per-Ola Stenborg @ 2014-10-19 15:56 UTC (permalink / raw)
To: Weedy; +Cc: linux-raid, Mikael Abrahamsson
Weedy wrote 2014-10-19 15:41:
>
> Is sdb in another array when you try to assemble?
> Is /proc/mdstat empty while you're trying this?
>
No, here is the output
cat /proc/mdstat
Personalities :
md0 : inactive sdd[2](S) sdb[0](S) sde[4](S) sdc[1](S)
7814054240 blocks super 1.2
unused devices: <none>
there is only one array in this machine.
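Weedy's question can also be answered mechanically: check whether any line of
/proc/mdstat claims the member. A sketch (the helper name is mine; the sample
input mirrors this machine's mdstat):

```shell
# Exit 0 if the mdstat text on stdin shows an md array holding the member.
held_by_md() {
    grep -qE "(^| )$1\[[0-9]+\]" -
}
# Sample mdstat line mirroring the output above:
printf 'md0 : inactive sdd[2](S) sdb[0](S) sde[4](S) sdc[1](S)\n' \
    | held_by_md sdb && echo "sdb is held by an md array"
```

Here it confirms sdb is "busy" precisely because the inactive md0 still
claims it.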
/Per-Ola
* Re: Multiple drive failure after stupid mistake. Help needed
From: Mikael Abrahamsson @ 2014-10-19 17:06 UTC (permalink / raw)
To: Per-Ola Stenborg; +Cc: linux-raid
On Sun, 19 Oct 2014, Per-Ola Stenborg wrote:
> I compiled the latest mdadm v3.3.2 - 21st August 2014
> running
> mdadm --assemble --force /dev/md0 /dev/sd[bcde]
> outputs
> mdadm: /dev/sdb is busy - skipping
> mdadm: /dev/sdc is busy - skipping
> mdadm: /dev/sdd is busy - skipping
> mdadm: /dev/sde is busy - skipping
>
> Strange. What does this mean? Is it flagged as in use in the kernel? The
Stop the array before you try to start it again.
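In other words, the sequence is stop, then force-assemble. A dry-run sketch
(device names assumed from this thread; RUN=echo only prints the commands,
clear it to execute for real):

```shell
# Dry-run of the recovery sequence: release the stuck members, then
# retry the forced assemble. Set RUN= (empty) to actually execute.
RUN=echo
$RUN mdadm --stop /dev/md0
$RUN mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde
```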
Also consider upgrading to a newer kernel if there is a backports one
(there should be for Debian squeeze); a lot of things have happened since
2.6.32.
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Multiple drive failure after stupid mistake. Help needed
From: Per-Ola Stenborg @ 2014-10-19 19:00 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: linux-raid
Yes! It works! Thanks so much. Let's hope the sdc drive survives the
rebuild this time.
At least I got the opportunity to back up the 24GB of important (not
backed-up) data.
Linux raid usually works so well that you never have to worry, and when
things go wrong you therefore never have the experience to help fix the
problem.
Thanks again!
Best regards
Per-Ola Stenborg
Mikael Abrahamsson wrote 2014-10-19 19:06:
> On Sun, 19 Oct 2014, Per-Ola Stenborg wrote:
>
>> I compiled the latest mdadm v3.3.2 - 21st August 2014
>> running
>> mdadm --assemble --force /dev/md0 /dev/sd[bcde]
>> outputs
>> mdadm: /dev/sdb is busy - skipping
>> mdadm: /dev/sdc is busy - skipping
>> mdadm: /dev/sdd is busy - skipping
>> mdadm: /dev/sde is busy - skipping
>>
>> Strange. What does this mean? Is it flagged as in use in the kernel? The
>
> Stop the array before you try to start it again.
>
> Also consider upgrading to a newer kernel if there is a backports one
> (there should be for Debian squeeze); a lot of things have happened
> since 2.6.32.
>