* Help recovering from failed disk on RAID 6
@ 2008-04-28 2:16 Joshua Johnson
2008-05-13 16:10 ` Steve Fairbairn
0 siblings, 1 reply; 4+ messages in thread
From: Joshua Johnson @ 2008-04-28 2:16 UTC (permalink / raw)
To: linux-raid
I am running a Linux server with an 8-disk IDE/SATA RAID 6 array. One
of the disks has a problem that causes the machine to freeze.
If I boot the machine without the problem disk, the array fails to
start. If I boot with the problem disk, the array starts correctly and
begins syncing, but the machine soon freezes again when the
disk drops out. My number one question is how to get the array back
online. It has a spare disk, but since the OS freezes rather than
failing the problem disk, the array never switches to the
spare. When I try to start the array without the problem disk, I
get:
#mdadm --manage --run /dev/md0
raid5: device hda2 operational as raid disk 0
raid5: device sdb2 operational as raid disk 7
raid5: device sda1 operational as raid disk 6
raid5: device hdi2 operational as raid disk 5
raid5: device hdg2 operational as raid disk 3
raid5: device hde2 operational as raid disk 2
raid5: device hdk2 operational as raid disk 1
raid5: cannot start dirty degraded array for md0
RAID5 conf printout:
--- rd:8 wd:7
disk 0, o:1, dev:hda2
disk 1, o:1, dev:hdk2
disk 2, o:1, dev:hde2
disk 3, o:1, dev:hdg2
disk 5, o:1, dev:hdi2
disk 6, o:1, dev:sda1
disk 7, o:1, dev:sdb2
raid5: failed to run raid set md0
md: pers->run() failed ...
mdadm: failed to run array /dev/md0: Input/output error
/proc/mdstat contains:
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 hdg1[1] hda1[0]
4200896 blocks [2/2] [UU]
md0 : inactive hda2[0] sdc2[8](S) sdb2[7] sda1[6] hdi2[5] hdg2[3]
hde2[2] hdk2[1]
1529265920 blocks
So how do I get this array to run? I can't start it without the
problem disk and I can't sync it with the problem disk. I am running
RAID 6 to be able to recover from multiple disk failures so it is a
little vexing that a single disk going offline renders my array
unrunnable. Any help with this issue is greatly appreciated.
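For reference, the conf printout above can be checked mechanically for the absent slot. This is only a sketch: it assumes the kernel lines have been saved or piped in as text, and the helper name missing_slots is made up for illustration. With rd:8 and only slots 0-3 and 5-7 listed, a single member is missing, which RAID 6 tolerates, so the refusal comes from the "dirty degraded" state rather than from having too few disks.

```shell
#!/bin/sh
# Print the raid slots that are absent from a "RAID5 conf printout".
# Reads kernel log text on stdin; missing_slots is a hypothetical helper name.
missing_slots() {
    awk '
        # "--- rd:8 wd:7" carries the number of raid disks (rd).
        /rd:[0-9]+/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^rd:/) { split($i, p, ":"); total = p[2] + 0 }
        }
        # "disk 0, o:1, dev:hda2" names a slot that is present.
        $1 == "disk" { slot = $2; sub(/,/, "", slot); seen[slot] = 1 }
        END {
            for (s = 0; s < total; s++)
                if (!(s in seen)) print s
        }
    '
}

# The printout from the message above.
missing_slots <<'EOF'
--- rd:8 wd:7
disk 0, o:1, dev:hda2
disk 1, o:1, dev:hdk2
disk 2, o:1, dev:hde2
disk 3, o:1, dev:hdg2
disk 5, o:1, dev:hdi2
disk 6, o:1, dev:sda1
disk 7, o:1, dev:sdb2
EOF
# prints: 4
```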
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: Help recovering from failed disk on RAID 6
2008-04-28 2:16 Help recovering from failed disk on RAID 6 Joshua Johnson
@ 2008-05-13 16:10 ` Steve Fairbairn
2008-05-13 16:28 ` David Lethe
0 siblings, 1 reply; 4+ messages in thread
From: Steve Fairbairn @ 2008-05-13 16:10 UTC (permalink / raw)
To: 'Joshua Johnson', linux-raid
Hi,
It appears no one else has answered, so I'll try. First I'd attempt to
start the array with the --force parameter, which I believe will start
the dirty array without the failed drive in it.
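A rough sketch of what a forced start might look like, using the device names from the kernel log in the original post. The run and force_assemble helpers are made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set. Whether --force is enough for this particular array is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: prints the mdadm commands unless DRY_RUN=0 is set.
# run and force_assemble are hypothetical helper names.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

force_assemble() {
    md=$1; shift
    # The inactive, half-assembled array has to be stopped before reassembly.
    run mdadm --stop "$md"
    # --force asks mdadm to assemble despite metadata it would normally reject;
    # --run starts the array even though one member is absent.
    run mdadm --assemble --force --run "$md" "$@"
}

# Seven working members from the log, plus sdc2, which mdstat shows as the spare.
# The problem disk is deliberately left out.
force_assemble /dev/md0 \
    /dev/hda2 /dev/hdk2 /dev/hde2 /dev/hdg2 /dev/hdi2 /dev/sda1 /dev/sdb2 /dev/sdc2
```

An array that needs --force to start may hold inconsistent data, so checking the filesystem before trusting it seems prudent.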
The other option, which depends on how long you have before the OS
freezes, is to start the array with the dodgy drive in it and
immediately tell mdadm to fail that disk. This should make mdadm
start a resync onto the spare drive.
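Since this option is a race against the freeze, it may help to script it. The sketch below assumes the array is brought up by the normal boot path; array_state and fail_if_active are made-up helper names, and /dev/hdX2 is a placeholder because the thread never names the failing member.

```shell
#!/bin/sh
# array_state NAME [FILE]: print the state word ("active" or "inactive")
# that an mdstat-format file reports for the named array.
array_state() {
    awk -v name="$1" '$1 == name && $2 == ":" { print $3; exit }' \
        "${2:-/proc/mdstat}"
}

# fail_if_active MD_NAME BAD_DEV: mark BAD_DEV faulty once the array is
# seen running, so that md can begin rebuilding onto the spare.
fail_if_active() {
    if [ "$(array_state "$1")" = "active" ]; then
        mdadm "/dev/$1" --fail "$2"
    else
        echo "$1 is not active; nothing done" >&2
        return 1
    fi
}

# Example (needs root; /dev/hdX2 is a placeholder, not a device from the thread):
#   fail_if_active md0 /dev/hdX2 && cat /proc/mdstat
```

If the rebuild has started, /proc/mdstat should show a recovery progress line for md0.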
Hope this helps,
Steve.
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: Help recovering from failed disk on RAID 6
2008-05-13 16:10 ` Steve Fairbairn
@ 2008-05-13 16:28 ` David Lethe
2008-05-13 20:11 ` Pascal Charest
0 siblings, 1 reply; 4+ messages in thread
From: David Lethe @ 2008-05-13 16:28 UTC (permalink / raw)
To: Steve Fairbairn, Joshua Johnson, linux-raid
I would also add to Steve's suggestion that you be prepared to
immediately disconnect the power to the dodgy disk once the rebuild
starts. That eliminates the possibility that the bad disk will lock up
the system.
David
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Help recovering from failed disk on RAID 6
2008-05-13 16:28 ` David Lethe
@ 2008-05-13 20:11 ` Pascal Charest
0 siblings, 0 replies; 4+ messages in thread
From: Pascal Charest @ 2008-05-13 20:11 UTC (permalink / raw)
To: linux-raid
Hmm. Why not simply use the "fail" option to mark the problematic
drive as failed and so take it offline?
mdadm --fail /your/raid/device /the/drive/you/want/to/fail
You can then remove the drive with the "remove" option ;-). I
don't think you should do any physical operation such as disconnecting
the power supply of a live disk, even a dodgy one. "This
eliminates the possibility that the bad disk will lock up the system" but
"creates the possibility of a short circuit and having no more system
at all".
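Spelled out with concrete arguments, the fail-then-remove sequence might look like the sketch below. retire_member is a made-up helper name and /dev/hdX2 is a placeholder for the failing member, which is never named in this thread. The commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: take a failing member out of the array in two steps.
# With DRY_RUN unset or 1 the commands are only printed.
retire_member() {
    md=$1 dev=$2
    for action in --fail --remove; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "+ mdadm $md $action $dev"
        else
            # md refuses to remove a member that is still active,
            # so --fail has to succeed before --remove is tried.
            mdadm "$md" "$action" "$dev" || return 1
        fi
    done
}

retire_member /dev/md0 /dev/hdX2
# prints:
# + mdadm /dev/md0 --fail /dev/hdX2
# + mdadm /dev/md0 --remove /dev/hdX2
```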
Pascal Charest
--
Pascal Charest, Free software consultant {GNU/Linux}
http://blog.pacharest.com
^ permalink raw reply [flat|nested] 4+ messages in thread