* disk failed, operator error: Now can't use RAID
@ 2005-07-13 21:57 Hank Barta
2005-07-14 0:29 ` Hank Barta
0 siblings, 1 reply; 4+ messages in thread
From: Hank Barta @ 2005-07-13 21:57 UTC (permalink / raw)
To: linux-raid
I experienced a disk failure on a raid5 array that had six disks
including one spare. For reasons I couldn't determine, the spare was
not used automatically. I added the spare manually using:
mdadm --add /dev/md0 /dev/sda1
And the raid started rebuilding using the spare drive.
Not satisfied ( ;) ), I tried to remove the failed drive (/dev/hdg1)
using the command
mdadm /dev/md0 -r /dev/sdg1
Then I realized that I had meant to type /dev/hdg1 and repeated the
command accordingly. My raid originally consisted of /dev/sd[abcd]1
and /dev/hd[eg]1, with /dev/hde1 as the spare disk. Looking at the
status afterwards, it appeared that there was a problem with /dev/sda1.
Still not satisfied, I decided it would be a good idea to reboot the
system, and when I did, the raid did not come up.
I've fiddled some more and still haven't gotten the raid to work. I have
added /dev/sda1 back, but the device information on the other drives
does not seem to reflect it. I have run cfdisk on all the devices to
verify that the system sees them, and that seems to be the case.
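A quicker, non-interactive way to make the same check is to read
/proc/partitions, which simply lists every disk and partition the
kernel currently knows about:
cat /proc/partitions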
Examining drive /dev/sda1 I get:
oak:~# mdadm -Q --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : a7cc80af:206de849:dd30336a:6ea23e69
Creation Time : Sun Dec 26 21:51:39 2004
Raid Level : raid5
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Update Time : Sat Jul 9 11:27:33 2005
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 1
Spare Devices : 1
Checksum : 4da3ec1f - correct
Events : 0.1271893
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/.static/dev/sda1
0 0 8 1 0 active sync /dev/.static/dev/sda1
1 1 8 17 1 active sync /dev/.static/dev/sdb1
2 2 8 33 2 active sync /dev/.static/dev/sdc1
3 3 8 49 3 active sync /dev/.static/dev/sdd1
4 4 0 0 4 faulty removed
5 5 33 1 5 spare /dev/.static/dev/hde1
oak:~#
and examining /dev/sdb1 I see:
oak:~# mdadm -Q --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : a7cc80af:206de849:dd30336a:6ea23e69
Creation Time : Sun Dec 26 21:51:39 2004
Raid Level : raid5
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Update Time : Sat Jul 9 12:22:25 2005
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 2
Spare Devices : 1
Checksum : 4dd319d4 - correct
Events : 0.2816178
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 1 8 17 1 active sync /dev/.static/dev/sdb1
0 0 0 0 0 removed
1 1 8 17 1 active sync /dev/.static/dev/sdb1
2 2 8 33 2 active sync /dev/.static/dev/sdc1
3 3 8 49 3 active sync /dev/.static/dev/sdd1
4 4 0 0 4 faulty removed
5 5 33 1 4 spare /dev/.static/dev/hde1
oak:~#
So it seems that /dev/sdb1 (and the other raid devices) no longer lists
/dev/sda1 as active. Note also that the Events counts disagree:
0.1271893 on /dev/sda1 versus 0.2816178 on /dev/sdb1, and md will not
assemble members whose event counters are out of step.
Other "interesting" files are:
oak:~# cat /etc/mdadm/mdadm.conf
DEVICE /dev/hd*[0-9] /dev/sd*[0-9]
ARRAY /dev/md0 level=raid5 num-devices=5
    UUID=a7cc80af:206de849:dd30336a:6ea23e69
    devices=/dev/hde1,/dev/sdd1,/dev/sdc1,/dev/sdb1,/dev/sda1
oak:~# cat /proc/mdstat
Personalities : [raid5]
md0 : inactive sda1[0] sdb1[1] hde1[5] sdd1[3] sdc1[2]
976791680 blocks
unused devices: <none>
oak:~#
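(Aside: rather than hand-maintaining that devices= list, mdadm can
generate ARRAY lines from the on-disk superblocks itself; something
like:
mdadm --examine --scan
prints one ARRAY line per array it finds on the devices named in the
DEVICE line.)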
If I try to run the raid, I get:
oak:/var/log# mdadm -R /dev/md0
mdadm: failed to run array /dev/md0: Invalid argument
oak:/var/log#
In the log I find:
Jul 13 16:48:54 localhost kernel: raid5: device sdb1 operational as raid disk 1
Jul 13 16:48:54 localhost kernel: raid5: device sdd1 operational as raid disk 3
Jul 13 16:48:54 localhost kernel: raid5: device sdc1 operational as raid disk 2
Jul 13 16:48:54 localhost kernel: RAID5 conf printout:
Jul 13 16:48:54 localhost kernel: --- rd:5 wd:3 fd:2
Jul 13 16:48:54 localhost kernel: disk 0, o:1, dev:sda1
Jul 13 16:48:54 localhost kernel: disk 1, o:1, dev:sdb1
Jul 13 16:48:54 localhost kernel: disk 2, o:1, dev:sdc1
Jul 13 16:48:54 localhost kernel: disk 3, o:1, dev:sdd1
Elsewhere in the log I find (the -16 from md_import_device below is
-EBUSY, i.e. the device was busy):
Jul 13 13:30:16 localhost kernel: disk 2, o:1, dev:sdc1
Jul 13 13:30:16 localhost kernel: disk 3, o:1, dev:sdd1
Jul 13 13:34:03 localhost kernel: md: error, md_import_device() returned -16
Jul 13 13:35:00 localhost kernel: md: error, md_import_device() returned -16
Jul 13 13:36:21 localhost kernel: raid5: device sdb1 operational as raid disk 1
Jul 13 13:36:21 localhost kernel: raid5: device sdd1 operational as raid disk 3
Jul 13 13:36:21 localhost kernel: raid5: device sdc1 operational as raid disk 2
I would very much appreciate suggestions on how to get the raid running again.
I have a replacement drive, but don't want to put it in until I get
this issue resolved.
I'm running Debian testing (386) with kernel 2.6.8-1-386 and mdadm
tools 1.9.0-4.1.
thanks,
hank
--
Beautiful Sunny Winfield, Illinois
* Re: disk failed, operator error: Now can't use RAID
2005-07-14 0:29 ` Hank Barta
@ 2005-07-14 7:03 ` Neil Brown
2005-07-14 23:05 ` Hank Barta
0 siblings, 1 reply; 4+ messages in thread
From: Neil Brown @ 2005-07-14 7:03 UTC (permalink / raw)
To: Hank Barta; +Cc: linux-raid
On Wednesday July 13, hbarta@gmail.com wrote:
>
> I would very much appreciate suggestions on how to get the raid
> running again.
Remove the
> devices=/dev/hde1,/dev/sdd1,/dev/sdc1,/dev/sdb1,/dev/sda1
line from mdadm.conf (it is wrong and unneeded).
Then
mdadm -S /dev/md0 # just to be sure
mdadm -A /dev/md0 -f /dev/sd[abcd]1 /dev/hd[eg]1
and see if that works.
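If it does, a glance at /proc/mdstat or at
mdadm --detail /dev/md0
should show the array active again, most likely with a recovery
running onto the spare.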
NeilBrown
* Re: disk failed, operator error: Now can't use RAID
2005-07-14 7:03 ` Neil Brown
@ 2005-07-14 23:05 ` Hank Barta
0 siblings, 0 replies; 4+ messages in thread
From: Hank Barta @ 2005-07-14 23:05 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On 7/14/05, Neil Brown <neilb@cse.unsw.edu.au> wrote:
> On Wednesday July 13, hbarta@gmail.com wrote:
> >
> > I would very much appreciate suggestions on how to get the raid
> > running again.
>
> Remove the
> > devices=/dev/hde1,/dev/sdd1,/dev/sdc1,/dev/sdb1,/dev/sda1
>
> line from mdadm.conf (it is wrong and un-needed).
>
> Then
> mdadm -S /dev/md0 # just to be sure
> mdadm -A /dev/md0 -f /dev/sd[abcd]1 /dev/hd[eg]1
>
> and see if that works.
Yes, thanks!
Results are:
oak:~# mdadm -S /dev/md0
oak:~# mdadm -A /dev/md0 -f /dev/sd[abcd]1 /dev/hd[eg]1
mdadm: forcing event count in /dev/sda1(0) from 1271893 upto 2816178
mdadm: /dev/md0 has been started with 4 drives (out of 5) and 1 spare.
oak:~# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sda1[0] hde1[5] sdd1[3] sdc1[2] sdb1[1]
781433344 blocks level 5, 32k chunk, algorithm 2 [5/4] [UUUU_]
[>....................] recovery = 0.1% (389320/195358336) finish=280.4min speed=11585K/sec
unused devices: <none>
oak:~#
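(The finish estimate checks out: roughly 195358336K per member at
~11585K/sec is about 16860 seconds, i.e. around 281 minutes.)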
Now... after this is through rebuilding, I need to replace the failed
drive (creating one partition and setting it to type 0xFD, Linux raid
autodetect).
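Concretely I expect that to look something like this, assuming the
replacement shows up as /dev/hdg again (my guess at the name) and the
drives are the same size:
sfdisk -d /dev/sda | sfdisk /dev/hdg   # clone the partition layout from a good disk
mdadm /dev/md0 --add /dev/hdg1         # the new partition should come in as a spare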
What's the best way to get this in service with one drive as a spare?
Can I convert my current spare (/dev/hde1) to a regular disk and add
the new disk as a spare?
Or should I add the new disk as an active drive, and if so, will the
array rebuild onto it, with /dev/hde1 going back to being the spare?
thanks again,
hank
--
Beautiful Sunny Winfield, Illinois
Thread overview: 4+ messages
2005-07-13 21:57 disk failed, operator error: Now can't use RAID Hank Barta
2005-07-14 0:29 ` Hank Barta
2005-07-14 7:03 ` Neil Brown
2005-07-14 23:05 ` Hank Barta