Re: RE: RAID5 Not coming back up after crash

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: RE: RAID5 Not coming back up after crash
@ 2004-11-29 20:56 BERNARD JOHN ZOLP
  2004-11-29 22:29 ` Guy
  0 siblings, 1 reply; 10+ messages in thread
From: BERNARD JOHN ZOLP @ 2004-11-29 20:56 UTC (permalink / raw)
  To: Guy; +Cc: linux-raid

Just a few follow up questions before I dive into this.  Will mdadm work
with a RAID setup created with the older raidtools package that came
with my SuSE installation?
  Assuming the drive with bad blocks is not getting worse, dont think it
is -- but you never know, could I map them out by writing to those
sectors with dd and then running the command to bring the array back
online?  Or should I wait for the RMA of the flakey drive and dd_rescue
to the new one and bring that up?

Thanks again,
bjz

----- Original Message -----
From: Guy <bugzilla@watkins-home.com>
Date: Monday, November 29, 2004 11:40 am
Subject: RE: RAID5 Not coming back up after crash

> You can recover, but not with bad blocks.
> 
> This command should get your array back on-line:
> mdadm -A /dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 
> /dev/hdj1
> But, as soon as md reads a bad block it will fail the disk and your 
> arraywill be off-line.
> 
> If you have an extra disk, you could attempt to copy the disk 
> first, then
> replace the disk with the read error with the copy.
> 
> dd_rescue can copy a disk with read errors.
> 
> Also, it is common for a disk to grow bad spots over time.  These 
> bad spots
> (sectors) can be re-mapped by the drive to a spare sector.  This re-
> mappingwill occur when an attempt is made to write to the bad 
> sector.  So, you can
> repair your disk by writing to the bad sectors.  But, be careful 
> not to
> overwrite good data.  I have done this using dd.  First I found the 
> badsector with dd, then I wrote to the 1 bad sector with dd.  I 
> would need to
> refer to the man page to do it again, so I can't explain it here at 
> thistime.  It is not really hard, but 1 small mistake, and "that's 
> it man, game
> over man, game over".
> 
> Guy
> 
> 
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of B. J. Zolp
> Sent: Monday, November 29, 2004 11:33 AM
> To: linux-raid@vger.kernel.org
> Subject: RAID5 Not coming back up after crash
> 
> I have a RAID5 setup on my fileserver using disks hda1 hdb1 hdc1 
> hdd1 
> hdi1 and hdj1.  Yesterday I started moving a large chunk of files 
> ~80GB 
> from this array to a stand alone drive in the system and about 
> halfway 
> through the mv I got a ton of PERMISSION DENIED errors some of the 
> remaining files left to be moved and the move process quit.  I did 
> a ls 
> of the raid directory and got PERMISSION DENIED on the same files 
> that 
> errored out on the mv while some of the other files looked fine.  I 
> figured it might be a good idea to take the raid down and back up 
> again 
> (probably a mistake) and I could not reboot the machine without 
> physically turning it off as some processes were hung.  Upon 
> booting 
> back up, the raid did not come online stating that hdj1 was kicked 
> due 
> to inconsistency.  Additionally hdb1 is listed as offline too.  So 
> I 
> have 2 drives that are not cooperating.  I have a hunch hdb1 might 
> have 
> not been working for some time.
> 
> I found some info stating that if you mark the drive that failed 
> first 
> as "failed-drive" and try a  "mkraid --force --dangerous-no-resync 
> /dev/md0" then I might have some luck getting my files back.  From 
> my 
> logs I can see that all the working drives have event counter: 
> 00000022 
> and hdj1 has event counter: 00000021 and hdb1 has event counter: 
> 00000001.  Does this mean that hdb1 failed a log time ago or is 
> this 
> difference in event counters likely within a few minutes fo each 
> other?  
> I just ran badblocks on both hdb1 and hdj1 and found 1 bad block on 
> hdb1 
> and about 15 on hdj1, would that be enough to cause my raid to get 
> this 
> out of whack?  In any case I plan to replace those drives, but 
> would the 
> method above be the best route once I have copied the raw data to 
> the 
> new drives in order to bring my raid back up?
> 
> 
> Thanks,
> 
> bjz
> 
> here is my log from when I run raidstart /dev/md0:
> 
> Nov 29 10:10:19 orion kernel:  [events: 00000022]
> Nov 29 10:10:19 orion last message repeated 3 times
> Nov 29 10:10:19 orion kernel:  [events: 00000021]
> Nov 29 10:10:19 orion kernel: md: autorun ...
> Nov 29 10:10:19 orion kernel: md: considering hdj1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hdj1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hdi1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hdd1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hdc1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hda1 ...
> Nov 29 10:10:19 orion kernel: md: created md0
> Nov 29 10:10:19 orion kernel: md: bind<hda1,1>
> Nov 29 10:10:19 orion kernel: md: bind<hdc1,2>
> Nov 29 10:10:19 orion kernel: md: bind<hdd1,3>
> Nov 29 10:10:19 orion kernel: md: bind<hdi1,4>
> Nov 29 10:10:19 orion kernel: md: bind<hdj1,5>
> Nov 29 10:10:19 orion kernel: md: running: 
> <hdj1><hdi1><hdd1><hdc1><hda1>Nov 29 10:10:19 orion kernel: md: 
> hdj1's event counter: 00000021
> Nov 29 10:10:19 orion kernel: md: hdi1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: hdd1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: hdc1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: hda1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: superblock update time 
> inconsistency 
> -- using the most recent one
> Nov 29 10:10:19 orion kernel: md: freshest: hdi1
> Nov 29 10:10:19 orion kernel: md0: kicking faulty hdj1!
> Nov 29 10:10:19 orion kernel: md: unbind<hdj1,4>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdj1)
> Nov 29 10:10:19 orion kernel: md: md0: raid array is not clean -- 
> starting background reconstruction
> Nov 29 10:10:19 orion kernel: md0: max total readahead window set 
> to 2560k
> Nov 29 10:10:19 orion kernel: md0: 5 data-disks, max readahead per 
> data-disk: 512k
> Nov 29 10:10:19 orion kernel: raid5: device hdi1 operational as 
> raid disk 4
> Nov 29 10:10:19 orion kernel: raid5: device hdd1 operational as 
> raid disk 3
> Nov 29 10:10:19 orion kernel: raid5: device hdc1 operational as 
> raid disk 2
> Nov 29 10:10:19 orion kernel: raid5: device hda1 operational as 
> raid disk 0
> Nov 29 10:10:19 orion kernel: raid5: not enough operational devices 
> for 
> md0 (2/6 failed)
> Nov 29 10:10:19 orion kernel: RAID5 conf printout:
> Nov 29 10:10:19 orion kernel:  --- rd:6 wd:4 fd:2
> Nov 29 10:10:19 orion kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 
> dev:hda1Nov 29 10:10:19 orion kernel:  disk 1, s:0, o:0, n:1 rd:1 
> us:1 dev:[dev 
> 00:00]
> Nov 29 10:10:19 orion kernel:  disk 2, s:0, o:1, n:2 rd:2 us:1 
> dev:hdc1Nov 29 10:10:19 orion kernel:  disk 3, s:0, o:1, n:3 rd:3 
> us:1 dev:hdd1
> Nov 29 10:10:19 orion kernel:  disk 4, s:0, o:1, n:4 rd:4 us:1 
> dev:hdi1Nov 29 10:10:19 orion kernel:  disk 5, s:0, o:0, n:5 rd:5 
> us:1 dev:[dev 
> 00:00]
> Nov 29 10:10:19 orion kernel: raid5: failed to run raid set md0
> Nov 29 10:10:19 orion kernel: md: pers->run() failed ...
> Nov 29 10:10:19 orion kernel: md :do_md_run() returned -22
> Nov 29 10:10:19 orion kernel: md: md0 stopped.
> Nov 29 10:10:19 orion kernel: md: unbind<hdi1,3>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdi1)
> Nov 29 10:10:19 orion kernel: md: unbind<hdd1,2>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdd1)
> Nov 29 10:10:19 orion kernel: md: unbind<hdc1,1>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdc1)
> Nov 29 10:10:19 orion kernel: md: unbind<hda1,0>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hda1)
> Nov 29 10:10:19 orion kernel: md: ... autorun DONE.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-
> raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -- 
> No virus found in this incoming message.
> Checked by AVG Anti-Virus.
> Version: 7.0.289 / Virus Database: 265.4.3 - Release Date: 11/26/2004
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-
> raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: RE: RAID5 Not coming back up after crash
  2004-11-29 20:56 RE: RAID5 Not coming back up after crash BERNARD JOHN ZOLP
@ 2004-11-29 22:29 ` Guy
  2004-11-30  5:38   ` B. J. Zolp
  0 siblings, 1 reply; 10+ messages in thread
From: Guy @ 2004-11-29 22:29 UTC (permalink / raw)
  To: 'BERNARD JOHN ZOLP'; +Cc: linux-raid

If you are sure you can overwrite the correct bad sectors, then do it.

mdadm is much better than raidtools.  From what I have read, yes it is
compatible.

The below info is not required.
Who makes your 6 disk drives?  And how old are they?  Any bets anyone?

Guy

-----Original Message-----
From: BERNARD JOHN ZOLP [mailto:bjzolp@students.wisc.edu] 
Sent: Monday, November 29, 2004 3:57 PM
To: Guy
Cc: linux-raid@vger.kernel.org
Subject: Re: RE: RAID5 Not coming back up after crash

Just a few follow up questions before I dive into this.  Will mdadm work
with a RAID setup created with the older raidtools package that came
with my SuSE installation?
  Assuming the drive with bad blocks is not getting worse, dont think it
is -- but you never know, could I map them out by writing to those
sectors with dd and then running the command to bring the array back
online?  Or should I wait for the RMA of the flakey drive and dd_rescue
to the new one and bring that up?

Thanks again,
bjz

----- Original Message -----
From: Guy <bugzilla@watkins-home.com>
Date: Monday, November 29, 2004 11:40 am
Subject: RE: RAID5 Not coming back up after crash

> You can recover, but not with bad blocks.
> 
> This command should get your array back on-line:
> mdadm -A /dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 
> /dev/hdj1
> But, as soon as md reads a bad block it will fail the disk and your 
> arraywill be off-line.
> 
> If you have an extra disk, you could attempt to copy the disk 
> first, then
> replace the disk with the read error with the copy.
> 
> dd_rescue can copy a disk with read errors.
> 
> Also, it is common for a disk to grow bad spots over time.  These 
> bad spots
> (sectors) can be re-mapped by the drive to a spare sector.  This re-
> mappingwill occur when an attempt is made to write to the bad 
> sector.  So, you can
> repair your disk by writing to the bad sectors.  But, be careful 
> not to
> overwrite good data.  I have done this using dd.  First I found the 
> badsector with dd, then I wrote to the 1 bad sector with dd.  I 
> would need to
> refer to the man page to do it again, so I can't explain it here at 
> thistime.  It is not really hard, but 1 small mistake, and "that's 
> it man, game
> over man, game over".
> 
> Guy
> 
> 
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of B. J. Zolp
> Sent: Monday, November 29, 2004 11:33 AM
> To: linux-raid@vger.kernel.org
> Subject: RAID5 Not coming back up after crash
> 
> I have a RAID5 setup on my fileserver using disks hda1 hdb1 hdc1 
> hdd1 
> hdi1 and hdj1.  Yesterday I started moving a large chunk of files 
> ~80GB 
> from this array to a stand alone drive in the system and about 
> halfway 
> through the mv I got a ton of PERMISSION DENIED errors some of the 
> remaining files left to be moved and the move process quit.  I did 
> a ls 
> of the raid directory and got PERMISSION DENIED on the same files 
> that 
> errored out on the mv while some of the other files looked fine.  I 
> figured it might be a good idea to take the raid down and back up 
> again 
> (probably a mistake) and I could not reboot the machine without 
> physically turning it off as some processes were hung.  Upon 
> booting 
> back up, the raid did not come online stating that hdj1 was kicked 
> due 
> to inconsistency.  Additionally hdb1 is listed as offline too.  So 
> I 
> have 2 drives that are not cooperating.  I have a hunch hdb1 might 
> have 
> not been working for some time.
> 
> I found some info stating that if you mark the drive that failed 
> first 
> as "failed-drive" and try a  "mkraid --force --dangerous-no-resync 
> /dev/md0" then I might have some luck getting my files back.  From 
> my 
> logs I can see that all the working drives have event counter: 
> 00000022 
> and hdj1 has event counter: 00000021 and hdb1 has event counter: 
> 00000001.  Does this mean that hdb1 failed a log time ago or is 
> this 
> difference in event counters likely within a few minutes fo each 
> other?  
> I just ran badblocks on both hdb1 and hdj1 and found 1 bad block on 
> hdb1 
> and about 15 on hdj1, would that be enough to cause my raid to get 
> this 
> out of whack?  In any case I plan to replace those drives, but 
> would the 
> method above be the best route once I have copied the raw data to 
> the 
> new drives in order to bring my raid back up?
> 
> 
> Thanks,
> 
> bjz
> 
> here is my log from when I run raidstart /dev/md0:
> 
> Nov 29 10:10:19 orion kernel:  [events: 00000022]
> Nov 29 10:10:19 orion last message repeated 3 times
> Nov 29 10:10:19 orion kernel:  [events: 00000021]
> Nov 29 10:10:19 orion kernel: md: autorun ...
> Nov 29 10:10:19 orion kernel: md: considering hdj1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hdj1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hdi1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hdd1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hdc1 ...
> Nov 29 10:10:19 orion kernel: md:  adding hda1 ...
> Nov 29 10:10:19 orion kernel: md: created md0
> Nov 29 10:10:19 orion kernel: md: bind<hda1,1>
> Nov 29 10:10:19 orion kernel: md: bind<hdc1,2>
> Nov 29 10:10:19 orion kernel: md: bind<hdd1,3>
> Nov 29 10:10:19 orion kernel: md: bind<hdi1,4>
> Nov 29 10:10:19 orion kernel: md: bind<hdj1,5>
> Nov 29 10:10:19 orion kernel: md: running: 
> <hdj1><hdi1><hdd1><hdc1><hda1>Nov 29 10:10:19 orion kernel: md: 
> hdj1's event counter: 00000021
> Nov 29 10:10:19 orion kernel: md: hdi1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: hdd1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: hdc1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: hda1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: superblock update time 
> inconsistency 
> -- using the most recent one
> Nov 29 10:10:19 orion kernel: md: freshest: hdi1
> Nov 29 10:10:19 orion kernel: md0: kicking faulty hdj1!
> Nov 29 10:10:19 orion kernel: md: unbind<hdj1,4>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdj1)
> Nov 29 10:10:19 orion kernel: md: md0: raid array is not clean -- 
> starting background reconstruction
> Nov 29 10:10:19 orion kernel: md0: max total readahead window set 
> to 2560k
> Nov 29 10:10:19 orion kernel: md0: 5 data-disks, max readahead per 
> data-disk: 512k
> Nov 29 10:10:19 orion kernel: raid5: device hdi1 operational as 
> raid disk 4
> Nov 29 10:10:19 orion kernel: raid5: device hdd1 operational as 
> raid disk 3
> Nov 29 10:10:19 orion kernel: raid5: device hdc1 operational as 
> raid disk 2
> Nov 29 10:10:19 orion kernel: raid5: device hda1 operational as 
> raid disk 0
> Nov 29 10:10:19 orion kernel: raid5: not enough operational devices 
> for 
> md0 (2/6 failed)
> Nov 29 10:10:19 orion kernel: RAID5 conf printout:
> Nov 29 10:10:19 orion kernel:  --- rd:6 wd:4 fd:2
> Nov 29 10:10:19 orion kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 
> dev:hda1Nov 29 10:10:19 orion kernel:  disk 1, s:0, o:0, n:1 rd:1 
> us:1 dev:[dev 
> 00:00]
> Nov 29 10:10:19 orion kernel:  disk 2, s:0, o:1, n:2 rd:2 us:1 
> dev:hdc1Nov 29 10:10:19 orion kernel:  disk 3, s:0, o:1, n:3 rd:3 
> us:1 dev:hdd1
> Nov 29 10:10:19 orion kernel:  disk 4, s:0, o:1, n:4 rd:4 us:1 
> dev:hdi1Nov 29 10:10:19 orion kernel:  disk 5, s:0, o:0, n:5 rd:5 
> us:1 dev:[dev 
> 00:00]
> Nov 29 10:10:19 orion kernel: raid5: failed to run raid set md0
> Nov 29 10:10:19 orion kernel: md: pers->run() failed ...
> Nov 29 10:10:19 orion kernel: md :do_md_run() returned -22
> Nov 29 10:10:19 orion kernel: md: md0 stopped.
> Nov 29 10:10:19 orion kernel: md: unbind<hdi1,3>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdi1)
> Nov 29 10:10:19 orion kernel: md: unbind<hdd1,2>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdd1)
> Nov 29 10:10:19 orion kernel: md: unbind<hdc1,1>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdc1)
> Nov 29 10:10:19 orion kernel: md: unbind<hda1,0>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hda1)
> Nov 29 10:10:19 orion kernel: md: ... autorun DONE.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-
> raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -- 
> No virus found in this incoming message.
> Checked by AVG Anti-Virus.
> Version: 7.0.289 / Virus Database: 265.4.3 - Release Date: 11/26/2004
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-
> raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RAID5 Not coming back up after crash
  2004-11-29 22:29 ` Guy
@ 2004-11-30  5:38   ` B. J. Zolp
  2004-11-30  5:45     ` Neil Brown
  0 siblings, 1 reply; 10+ messages in thread
From: B. J. Zolp @ 2004-11-30  5:38 UTC (permalink / raw)
  To: Guy; +Cc: 'BERNARD JOHN ZOLP', linux-raid

I found a spare new drive that I copied hdj1 onto and put the new drive 
on the proper IDE cable for hdj.  Then tried running the mdadm -A 
/dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 /dev/hdj1
But it seg faults.  Should I try the  mkraid --force 
--dangerous-no-resync /dev/md0   ???

Thanks,
bjz



Guy wrote:

>If you are sure you can overwrite the correct bad sectors, then do it.
>
>mdadm is much better than raidtools.  From what I have read, yes it is
>compatible.
>
>The below info is not required.
>Who makes your 6 disk drives?  And how old are they?  Any bets anyone?
>
>Guy
>
>-----Original Message-----
>From: BERNARD JOHN ZOLP [mailto:bjzolp@students.wisc.edu] 
>Sent: Monday, November 29, 2004 3:57 PM
>To: Guy
>Cc: linux-raid@vger.kernel.org
>Subject: Re: RE: RAID5 Not coming back up after crash
>
>Just a few follow up questions before I dive into this.  Will mdadm work
>with a RAID setup created with the older raidtools package that came
>with my SuSE installation?
>  Assuming the drive with bad blocks is not getting worse, dont think it
>is -- but you never know, could I map them out by writing to those
>sectors with dd and then running the command to bring the array back
>online?  Or should I wait for the RMA of the flakey drive and dd_rescue
>to the new one and bring that up?
>
>Thanks again,
>bjz
>
>----- Original Message -----
>From: Guy <bugzilla@watkins-home.com>
>Date: Monday, November 29, 2004 11:40 am
>Subject: RE: RAID5 Not coming back up after crash
>
>  
>
>>You can recover, but not with bad blocks.
>>
>>This command should get your array back on-line:
>>mdadm -A /dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 
>>/dev/hdj1
>>But, as soon as md reads a bad block it will fail the disk and your 
>>arraywill be off-line.
>>
>>If you have an extra disk, you could attempt to copy the disk 
>>first, then
>>replace the disk with the read error with the copy.
>>
>>dd_rescue can copy a disk with read errors.
>>
>>Also, it is common for a disk to grow bad spots over time.  These 
>>bad spots
>>(sectors) can be re-mapped by the drive to a spare sector.  This re-
>>mappingwill occur when an attempt is made to write to the bad 
>>sector.  So, you can
>>repair your disk by writing to the bad sectors.  But, be careful 
>>not to
>>overwrite good data.  I have done this using dd.  First I found the 
>>badsector with dd, then I wrote to the 1 bad sector with dd.  I 
>>would need to
>>refer to the man page to do it again, so I can't explain it here at 
>>thistime.  It is not really hard, but 1 small mistake, and "that's 
>>it man, game
>>over man, game over".
>>
>>Guy
>>
>>
>>-----Original Message-----
>>From: linux-raid-owner@vger.kernel.org
>>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of B. J. Zolp
>>Sent: Monday, November 29, 2004 11:33 AM
>>To: linux-raid@vger.kernel.org
>>Subject: RAID5 Not coming back up after crash
>>
>>I have a RAID5 setup on my fileserver using disks hda1 hdb1 hdc1 
>>hdd1 
>>hdi1 and hdj1.  Yesterday I started moving a large chunk of files 
>>~80GB 
>>from this array to a stand alone drive in the system and about 
>>halfway 
>>through the mv I got a ton of PERMISSION DENIED errors some of the 
>>remaining files left to be moved and the move process quit.  I did 
>>a ls 
>>of the raid directory and got PERMISSION DENIED on the same files 
>>that 
>>errored out on the mv while some of the other files looked fine.  I 
>>figured it might be a good idea to take the raid down and back up 
>>again 
>>(probably a mistake) and I could not reboot the machine without 
>>physically turning it off as some processes were hung.  Upon 
>>booting 
>>back up, the raid did not come online stating that hdj1 was kicked 
>>due 
>>to inconsistency.  Additionally hdb1 is listed as offline too.  So 
>>I 
>>have 2 drives that are not cooperating.  I have a hunch hdb1 might 
>>have 
>>not been working for some time.
>>
>>I found some info stating that if you mark the drive that failed 
>>first 
>>as "failed-drive" and try a  "mkraid --force --dangerous-no-resync 
>>/dev/md0" then I might have some luck getting my files back.  From 
>>my 
>>logs I can see that all the working drives have event counter: 
>>00000022 
>>and hdj1 has event counter: 00000021 and hdb1 has event counter: 
>>00000001.  Does this mean that hdb1 failed a log time ago or is 
>>this 
>>difference in event counters likely within a few minutes fo each 
>>other?  
>>I just ran badblocks on both hdb1 and hdj1 and found 1 bad block on 
>>hdb1 
>>and about 15 on hdj1, would that be enough to cause my raid to get 
>>this 
>>out of whack?  In any case I plan to replace those drives, but 
>>would the 
>>method above be the best route once I have copied the raw data to 
>>the 
>>new drives in order to bring my raid back up?
>>
>>
>>Thanks,
>>
>>bjz
>>
>>here is my log from when I run raidstart /dev/md0:
>>
>>Nov 29 10:10:19 orion kernel:  [events: 00000022]
>>Nov 29 10:10:19 orion last message repeated 3 times
>>Nov 29 10:10:19 orion kernel:  [events: 00000021]
>>Nov 29 10:10:19 orion kernel: md: autorun ...
>>Nov 29 10:10:19 orion kernel: md: considering hdj1 ...
>>Nov 29 10:10:19 orion kernel: md:  adding hdj1 ...
>>Nov 29 10:10:19 orion kernel: md:  adding hdi1 ...
>>Nov 29 10:10:19 orion kernel: md:  adding hdd1 ...
>>Nov 29 10:10:19 orion kernel: md:  adding hdc1 ...
>>Nov 29 10:10:19 orion kernel: md:  adding hda1 ...
>>Nov 29 10:10:19 orion kernel: md: created md0
>>Nov 29 10:10:19 orion kernel: md: bind<hda1,1>
>>Nov 29 10:10:19 orion kernel: md: bind<hdc1,2>
>>Nov 29 10:10:19 orion kernel: md: bind<hdd1,3>
>>Nov 29 10:10:19 orion kernel: md: bind<hdi1,4>
>>Nov 29 10:10:19 orion kernel: md: bind<hdj1,5>
>>Nov 29 10:10:19 orion kernel: md: running: 
>><hdj1><hdi1><hdd1><hdc1><hda1>Nov 29 10:10:19 orion kernel: md: 
>>hdj1's event counter: 00000021
>>Nov 29 10:10:19 orion kernel: md: hdi1's event counter: 00000022
>>Nov 29 10:10:19 orion kernel: md: hdd1's event counter: 00000022
>>Nov 29 10:10:19 orion kernel: md: hdc1's event counter: 00000022
>>Nov 29 10:10:19 orion kernel: md: hda1's event counter: 00000022
>>Nov 29 10:10:19 orion kernel: md: superblock update time 
>>inconsistency 
>>-- using the most recent one
>>Nov 29 10:10:19 orion kernel: md: freshest: hdi1
>>Nov 29 10:10:19 orion kernel: md0: kicking faulty hdj1!
>>Nov 29 10:10:19 orion kernel: md: unbind<hdj1,4>
>>Nov 29 10:10:19 orion kernel: md: export_rdev(hdj1)
>>Nov 29 10:10:19 orion kernel: md: md0: raid array is not clean -- 
>>starting background reconstruction
>>Nov 29 10:10:19 orion kernel: md0: max total readahead window set 
>>to 2560k
>>Nov 29 10:10:19 orion kernel: md0: 5 data-disks, max readahead per 
>>data-disk: 512k
>>Nov 29 10:10:19 orion kernel: raid5: device hdi1 operational as 
>>raid disk 4
>>Nov 29 10:10:19 orion kernel: raid5: device hdd1 operational as 
>>raid disk 3
>>Nov 29 10:10:19 orion kernel: raid5: device hdc1 operational as 
>>raid disk 2
>>Nov 29 10:10:19 orion kernel: raid5: device hda1 operational as 
>>raid disk 0
>>Nov 29 10:10:19 orion kernel: raid5: not enough operational devices 
>>for 
>>md0 (2/6 failed)
>>Nov 29 10:10:19 orion kernel: RAID5 conf printout:
>>Nov 29 10:10:19 orion kernel:  --- rd:6 wd:4 fd:2
>>Nov 29 10:10:19 orion kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 
>>dev:hda1Nov 29 10:10:19 orion kernel:  disk 1, s:0, o:0, n:1 rd:1 
>>us:1 dev:[dev 
>>00:00]
>>Nov 29 10:10:19 orion kernel:  disk 2, s:0, o:1, n:2 rd:2 us:1 
>>dev:hdc1Nov 29 10:10:19 orion kernel:  disk 3, s:0, o:1, n:3 rd:3 
>>us:1 dev:hdd1
>>Nov 29 10:10:19 orion kernel:  disk 4, s:0, o:1, n:4 rd:4 us:1 
>>dev:hdi1Nov 29 10:10:19 orion kernel:  disk 5, s:0, o:0, n:5 rd:5 
>>us:1 dev:[dev 
>>00:00]
>>Nov 29 10:10:19 orion kernel: raid5: failed to run raid set md0
>>Nov 29 10:10:19 orion kernel: md: pers->run() failed ...
>>Nov 29 10:10:19 orion kernel: md :do_md_run() returned -22
>>Nov 29 10:10:19 orion kernel: md: md0 stopped.
>>Nov 29 10:10:19 orion kernel: md: unbind<hdi1,3>
>>Nov 29 10:10:19 orion kernel: md: export_rdev(hdi1)
>>Nov 29 10:10:19 orion kernel: md: unbind<hdd1,2>
>>Nov 29 10:10:19 orion kernel: md: export_rdev(hdd1)
>>Nov 29 10:10:19 orion kernel: md: unbind<hdc1,1>
>>Nov 29 10:10:19 orion kernel: md: export_rdev(hdc1)
>>Nov 29 10:10:19 orion kernel: md: unbind<hda1,0>
>>Nov 29 10:10:19 orion kernel: md: export_rdev(hda1)
>>Nov 29 10:10:19 orion kernel: md: ... autorun DONE.
>>-
>>To unsubscribe from this list: send the line "unsubscribe linux-
>>raid" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>-- 
>>No virus found in this incoming message.
>>Checked by AVG Anti-Virus.
>>Version: 7.0.289 / Virus Database: 265.4.3 - Release Date: 11/26/2004
>>
>>
>>-
>>To unsubscribe from this list: send the line "unsubscribe linux-
>>raid" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>    
>>
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>  
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RAID5 Not coming back up after crash
  2004-11-30  5:38   ` B. J. Zolp
@ 2004-11-30  5:45     ` Neil Brown
  2004-11-30  5:48       ` B. J. Zolp
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2004-11-30  5:45 UTC (permalink / raw)
  To: B. J. Zolp; +Cc: Guy, 'BERNARD JOHN ZOLP', linux-raid

On Monday November 29, bjzolp@wisc.edu wrote:
> I found a spare new drive that I copied hdj1 onto and put the new drive 
> on the proper IDE cable for hdj.  Then tried running the mdadm -A 
> /dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 /dev/hdj1
> But it seg faults.  Should I try the  mkraid --force 
> --dangerous-no-resync /dev/md0   ???
> 

which version of mdadm segfaults? 
If you aren't using 1.8.0, use that.

NeilBrown

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RAID5 Not coming back up after crash
  2004-11-30  5:45     ` Neil Brown
@ 2004-11-30  5:48       ` B. J. Zolp
  2004-11-30  5:54         ` Neil Brown
  0 siblings, 1 reply; 10+ messages in thread
From: B. J. Zolp @ 2004-11-30  5:48 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Its 1.8.1.

bjz

Neil Brown wrote:

>On Monday November 29, bjzolp@wisc.edu wrote:
>  
>
>>I found a spare new drive that I copied hdj1 onto and put the new drive 
>>on the proper IDE cable for hdj.  Then tried running the mdadm -A 
>>/dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 /dev/hdj1
>>But it seg faults.  Should I try the  mkraid --force 
>>--dangerous-no-resync /dev/md0   ???
>>
>>    
>>
>
>which version of mdadm segfaults? 
>If you aren't using 1.8.0, use that.
>
>NeilBrown
>-
>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>  
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RAID5 Not coming back up after crash
  2004-11-30  5:48       ` B. J. Zolp
@ 2004-11-30  5:54         ` Neil Brown
  2004-11-30  6:33           ` THANKS!! was:Re: " B. J. Zolp
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2004-11-30  5:54 UTC (permalink / raw)
  To: B. J. Zolp; +Cc: linux-raid

On Monday November 29, bjzolp@wisc.edu wrote:
> Its 1.8.1.

That is development code.  It is buggy.  Only use it for testing and
giving me feedback (like it says in the release notes).

: This is a "development" release of mdadm.  It should *not* be
: considered stable and should be used primarily for testing.
: The current "stable" version is 1.8.0.

NeilBrown

^ permalink raw reply	[flat|nested] 10+ messages in thread

* THANKS!! was:Re: RAID5 Not coming back up after crash
  2004-11-30  5:54         ` Neil Brown
@ 2004-11-30  6:33           ` B. J. Zolp
  0 siblings, 0 replies; 10+ messages in thread
From: B. J. Zolp @ 2004-11-30  6:33 UTC (permalink / raw)
  To: linux-raid

I am sure this does not get said enough, so I just wanted to thank Neil 
Brown and Guy for their help and speedy replies on this list.  I was 
able to get my raid up in no time due to their help.

Thanks,

bjz


Neil Brown wrote:

>On Monday November 29, bjzolp@wisc.edu wrote:
>  
>
>>Its 1.8.1.
>>    
>>
>
>That is development code.  It is buggy.  Only use it for testing and
>giving me feedback (like it says in the release notes).
>
>
>: This is a "development" release of mdadm.  It should *not* be
>: considered stable and should be used primarily for testing.
>: The current "stable" version is 1.8.0.
>
>NeilBrown
>-
>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>  
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RAID5 Not coming back up after crash
@ 2004-11-29 16:33 B. J. Zolp
  2004-11-29 17:40 ` Guy
  0 siblings, 1 reply; 10+ messages in thread
From: B. J. Zolp @ 2004-11-29 16:33 UTC (permalink / raw)
  To: linux-raid

I have a RAID5 setup on my fileserver using disks hda1 hdb1 hdc1 hdd1 
hdi1 and hdj1.  Yesterday I started moving a large chunk of files ~80GB 
from this array to a stand alone drive in the system and about halfway 
through the mv I got a ton of PERMISSION DENIED errors some of the 
remaining files left to be moved and the move process quit.  I did a ls 
of the raid directory and got PERMISSION DENIED on the same files that 
errored out on the mv while some of the other files looked fine.  I 
figured it might be a good idea to take the raid down and back up again 
(probably a mistake) and I could not reboot the machine without 
physically turning it off as some processes were hung.  Upon booting 
back up, the raid did not come online stating that hdj1 was kicked due 
to inconsistency.  Additionally hdb1 is listed as offline too.  So I 
have 2 drives that are not cooperating.  I have a hunch hdb1 might have 
not been working for some time.

I found some info stating that if you mark the drive that failed first 
as "failed-drive" and try a  "mkraid --force --dangerous-no-resync 
/dev/md0" then I might have some luck getting my files back.  From my 
logs I can see that all the working drives have event counter: 00000022 
and hdj1 has event counter: 00000021 and hdb1 has event counter: 
00000001.  Does this mean that hdb1 failed a log time ago or is this 
difference in event counters likely within a few minutes fo each other?  
I just ran badblocks on both hdb1 and hdj1 and found 1 bad block on hdb1 
and about 15 on hdj1, would that be enough to cause my raid to get this 
out of whack?  In any case I plan to replace those drives, but would the 
method above be the best route once I have copied the raw data to the 
new drives in order to bring my raid back up?

Thanks,

bjz

here is my log from when I run raidstart /dev/md0:

Nov 29 10:10:19 orion kernel:  [events: 00000022]
Nov 29 10:10:19 orion last message repeated 3 times
Nov 29 10:10:19 orion kernel:  [events: 00000021]
Nov 29 10:10:19 orion kernel: md: autorun ...
Nov 29 10:10:19 orion kernel: md: considering hdj1 ...
Nov 29 10:10:19 orion kernel: md:  adding hdj1 ...
Nov 29 10:10:19 orion kernel: md:  adding hdi1 ...
Nov 29 10:10:19 orion kernel: md:  adding hdd1 ...
Nov 29 10:10:19 orion kernel: md:  adding hdc1 ...
Nov 29 10:10:19 orion kernel: md:  adding hda1 ...
Nov 29 10:10:19 orion kernel: md: created md0
Nov 29 10:10:19 orion kernel: md: bind<hda1,1>
Nov 29 10:10:19 orion kernel: md: bind<hdc1,2>
Nov 29 10:10:19 orion kernel: md: bind<hdd1,3>
Nov 29 10:10:19 orion kernel: md: bind<hdi1,4>
Nov 29 10:10:19 orion kernel: md: bind<hdj1,5>
Nov 29 10:10:19 orion kernel: md: running: <hdj1><hdi1><hdd1><hdc1><hda1>
Nov 29 10:10:19 orion kernel: md: hdj1's event counter: 00000021
Nov 29 10:10:19 orion kernel: md: hdi1's event counter: 00000022
Nov 29 10:10:19 orion kernel: md: hdd1's event counter: 00000022
Nov 29 10:10:19 orion kernel: md: hdc1's event counter: 00000022
Nov 29 10:10:19 orion kernel: md: hda1's event counter: 00000022
Nov 29 10:10:19 orion kernel: md: superblock update time inconsistency 
-- using the most recent one
Nov 29 10:10:19 orion kernel: md: freshest: hdi1
Nov 29 10:10:19 orion kernel: md0: kicking faulty hdj1!
Nov 29 10:10:19 orion kernel: md: unbind<hdj1,4>
Nov 29 10:10:19 orion kernel: md: export_rdev(hdj1)
Nov 29 10:10:19 orion kernel: md: md0: raid array is not clean -- 
starting background reconstruction
Nov 29 10:10:19 orion kernel: md0: max total readahead window set to 2560k
Nov 29 10:10:19 orion kernel: md0: 5 data-disks, max readahead per 
data-disk: 512k
Nov 29 10:10:19 orion kernel: raid5: device hdi1 operational as raid disk 4
Nov 29 10:10:19 orion kernel: raid5: device hdd1 operational as raid disk 3
Nov 29 10:10:19 orion kernel: raid5: device hdc1 operational as raid disk 2
Nov 29 10:10:19 orion kernel: raid5: device hda1 operational as raid disk 0
Nov 29 10:10:19 orion kernel: raid5: not enough operational devices for 
md0 (2/6 failed)
Nov 29 10:10:19 orion kernel: RAID5 conf printout:
Nov 29 10:10:19 orion kernel:  --- rd:6 wd:4 fd:2
Nov 29 10:10:19 orion kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:hda1
Nov 29 10:10:19 orion kernel:  disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 
00:00]
Nov 29 10:10:19 orion kernel:  disk 2, s:0, o:1, n:2 rd:2 us:1 dev:hdc1
Nov 29 10:10:19 orion kernel:  disk 3, s:0, o:1, n:3 rd:3 us:1 dev:hdd1
Nov 29 10:10:19 orion kernel:  disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdi1
Nov 29 10:10:19 orion kernel:  disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 
00:00]
Nov 29 10:10:19 orion kernel: raid5: failed to run raid set md0
Nov 29 10:10:19 orion kernel: md: pers->run() failed ...
Nov 29 10:10:19 orion kernel: md :do_md_run() returned -22
Nov 29 10:10:19 orion kernel: md: md0 stopped.
Nov 29 10:10:19 orion kernel: md: unbind<hdi1,3>
Nov 29 10:10:19 orion kernel: md: export_rdev(hdi1)
Nov 29 10:10:19 orion kernel: md: unbind<hdd1,2>
Nov 29 10:10:19 orion kernel: md: export_rdev(hdd1)
Nov 29 10:10:19 orion kernel: md: unbind<hdc1,1>
Nov 29 10:10:19 orion kernel: md: export_rdev(hdc1)
Nov 29 10:10:19 orion kernel: md: unbind<hda1,0>
Nov 29 10:10:19 orion kernel: md: export_rdev(hda1)
Nov 29 10:10:19 orion kernel: md: ... autorun DONE.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: RAID5 Not coming back up after crash
  2004-11-29 16:33 B. J. Zolp
@ 2004-11-29 17:40 ` Guy
  2004-11-30 21:29   ` Frank van Maarseveen
  0 siblings, 1 reply; 10+ messages in thread
From: Guy @ 2004-11-29 17:40 UTC (permalink / raw)
  To: 'B. J. Zolp', linux-raid

You can recover, but not with bad blocks.

This command should get your array back on-line:
mdadm -A /dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 /dev/hdj1

But, as soon as md reads a bad block it will fail the disk and your array
will be off-line.

If you have an extra disk, you could attempt to copy the disk first, then
replace the disk with the read error with the copy.

dd_rescue can copy a disk with read errors.

Also, it is common for a disk to grow bad spots over time.  These bad spots
(sectors) can be re-mapped by the drive to a spare sector.  This re-mapping
will occur when an attempt is made to write to the bad sector.  So, you can
repair your disk by writing to the bad sectors.  But, be careful not to
overwrite good data.  I have done this using dd.  First I found the bad
sector with dd, then I wrote to the 1 bad sector with dd.  I would need to
refer to the man page to do it again, so I can't explain it here at this
time.  It is not really hard, but 1 small mistake, and "that's it man, game
over man, game over".

Guy

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of B. J. Zolp
Sent: Monday, November 29, 2004 11:33 AM
To: linux-raid@vger.kernel.org
Subject: RAID5 Not coming back up after crash

I have a RAID5 setup on my fileserver using disks hda1 hdb1 hdc1 hdd1 
hdi1 and hdj1.  Yesterday I started moving a large chunk of files ~80GB 
from this array to a stand alone drive in the system and about halfway 
through the mv I got a ton of PERMISSION DENIED errors some of the 
remaining files left to be moved and the move process quit.  I did a ls 
of the raid directory and got PERMISSION DENIED on the same files that 
errored out on the mv while some of the other files looked fine.  I 
figured it might be a good idea to take the raid down and back up again 
(probably a mistake) and I could not reboot the machine without 
physically turning it off as some processes were hung.  Upon booting 
back up, the raid did not come online stating that hdj1 was kicked due 
to inconsistency.  Additionally hdb1 is listed as offline too.  So I 
have 2 drives that are not cooperating.  I have a hunch hdb1 might have 
not been working for some time.

I found some info stating that if you mark the drive that failed first 
as "failed-drive" and try a  "mkraid --force --dangerous-no-resync 
/dev/md0" then I might have some luck getting my files back.  From my 
logs I can see that all the working drives have event counter: 00000022 
and hdj1 has event counter: 00000021 and hdb1 has event counter: 
00000001.  Does this mean that hdb1 failed a log time ago or is this 
difference in event counters likely within a few minutes fo each other?  
I just ran badblocks on both hdb1 and hdj1 and found 1 bad block on hdb1 
and about 15 on hdj1, would that be enough to cause my raid to get this 
out of whack?  In any case I plan to replace those drives, but would the 
method above be the best route once I have copied the raw data to the 
new drives in order to bring my raid back up?

Thanks,

bjz

here is my log from when I run raidstart /dev/md0:

Nov 29 10:10:19 orion kernel:  [events: 00000022]
Nov 29 10:10:19 orion last message repeated 3 times
Nov 29 10:10:19 orion kernel:  [events: 00000021]
Nov 29 10:10:19 orion kernel: md: autorun ...
Nov 29 10:10:19 orion kernel: md: considering hdj1 ...
Nov 29 10:10:19 orion kernel: md:  adding hdj1 ...
Nov 29 10:10:19 orion kernel: md:  adding hdi1 ...
Nov 29 10:10:19 orion kernel: md:  adding hdd1 ...
Nov 29 10:10:19 orion kernel: md:  adding hdc1 ...
Nov 29 10:10:19 orion kernel: md:  adding hda1 ...
Nov 29 10:10:19 orion kernel: md: created md0
Nov 29 10:10:19 orion kernel: md: bind<hda1,1>
Nov 29 10:10:19 orion kernel: md: bind<hdc1,2>
Nov 29 10:10:19 orion kernel: md: bind<hdd1,3>
Nov 29 10:10:19 orion kernel: md: bind<hdi1,4>
Nov 29 10:10:19 orion kernel: md: bind<hdj1,5>
Nov 29 10:10:19 orion kernel: md: running: <hdj1><hdi1><hdd1><hdc1><hda1>
Nov 29 10:10:19 orion kernel: md: hdj1's event counter: 00000021
Nov 29 10:10:19 orion kernel: md: hdi1's event counter: 00000022
Nov 29 10:10:19 orion kernel: md: hdd1's event counter: 00000022
Nov 29 10:10:19 orion kernel: md: hdc1's event counter: 00000022
Nov 29 10:10:19 orion kernel: md: hda1's event counter: 00000022
Nov 29 10:10:19 orion kernel: md: superblock update time inconsistency 
-- using the most recent one
Nov 29 10:10:19 orion kernel: md: freshest: hdi1
Nov 29 10:10:19 orion kernel: md0: kicking faulty hdj1!
Nov 29 10:10:19 orion kernel: md: unbind<hdj1,4>
Nov 29 10:10:19 orion kernel: md: export_rdev(hdj1)
Nov 29 10:10:19 orion kernel: md: md0: raid array is not clean -- 
starting background reconstruction
Nov 29 10:10:19 orion kernel: md0: max total readahead window set to 2560k
Nov 29 10:10:19 orion kernel: md0: 5 data-disks, max readahead per 
data-disk: 512k
Nov 29 10:10:19 orion kernel: raid5: device hdi1 operational as raid disk 4
Nov 29 10:10:19 orion kernel: raid5: device hdd1 operational as raid disk 3
Nov 29 10:10:19 orion kernel: raid5: device hdc1 operational as raid disk 2
Nov 29 10:10:19 orion kernel: raid5: device hda1 operational as raid disk 0
Nov 29 10:10:19 orion kernel: raid5: not enough operational devices for 
md0 (2/6 failed)
Nov 29 10:10:19 orion kernel: RAID5 conf printout:
Nov 29 10:10:19 orion kernel:  --- rd:6 wd:4 fd:2
Nov 29 10:10:19 orion kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:hda1
Nov 29 10:10:19 orion kernel:  disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 
00:00]
Nov 29 10:10:19 orion kernel:  disk 2, s:0, o:1, n:2 rd:2 us:1 dev:hdc1
Nov 29 10:10:19 orion kernel:  disk 3, s:0, o:1, n:3 rd:3 us:1 dev:hdd1
Nov 29 10:10:19 orion kernel:  disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdi1
Nov 29 10:10:19 orion kernel:  disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 
00:00]
Nov 29 10:10:19 orion kernel: raid5: failed to run raid set md0
Nov 29 10:10:19 orion kernel: md: pers->run() failed ...
Nov 29 10:10:19 orion kernel: md :do_md_run() returned -22
Nov 29 10:10:19 orion kernel: md: md0 stopped.
Nov 29 10:10:19 orion kernel: md: unbind<hdi1,3>
Nov 29 10:10:19 orion kernel: md: export_rdev(hdi1)
Nov 29 10:10:19 orion kernel: md: unbind<hdd1,2>
Nov 29 10:10:19 orion kernel: md: export_rdev(hdd1)
Nov 29 10:10:19 orion kernel: md: unbind<hdc1,1>
Nov 29 10:10:19 orion kernel: md: export_rdev(hdc1)
Nov 29 10:10:19 orion kernel: md: unbind<hda1,0>
Nov 29 10:10:19 orion kernel: md: export_rdev(hda1)
Nov 29 10:10:19 orion kernel: md: ... autorun DONE.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
No virus found in this incoming message.
Checked by AVG Anti-Virus.
Version: 7.0.289 / Virus Database: 265.4.3 - Release Date: 11/26/2004

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RAID5 Not coming back up after crash
  2004-11-29 17:40 ` Guy
@ 2004-11-30 21:29   ` Frank van Maarseveen
  0 siblings, 0 replies; 10+ messages in thread
From: Frank van Maarseveen @ 2004-11-30 21:29 UTC (permalink / raw)
  To: Guy; +Cc: 'B. J. Zolp', linux-raid

On Mon, Nov 29, 2004 at 12:40:54PM -0500, Guy wrote:
> 
> Also, it is common for a disk to grow bad spots over time.  These bad spots
> (sectors) can be re-mapped by the drive to a spare sector.  This re-mapping
> will occur when an attempt is made to write to the bad sector.  So, you can
> repair your disk by writing to the bad sectors.  But, be careful not to
> overwrite good data.  I have done this using dd.

If you know the pathname of the file containing the bad spot (and have a copy
of the file then you could use shred(1):

NAME
       shred  -  delete a file securely, first overwriting it to hide its con-
       tents

SYNOPSIS
       shred [OPTIONS] FILE [...]
...


-- 
Frank

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2004-11-30 21:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-29 20:56 RE: RAID5 Not coming back up after crash BERNARD JOHN ZOLP
2004-11-29 22:29 ` Guy
2004-11-30  5:38   ` B. J. Zolp
2004-11-30  5:45     ` Neil Brown
2004-11-30  5:48       ` B. J. Zolp
2004-11-30  5:54         ` Neil Brown
2004-11-30  6:33           ` THANKS!! was:Re: " B. J. Zolp
  -- strict thread matches above, loose matches on Subject: below --
2004-11-29 16:33 B. J. Zolp
2004-11-29 17:40 ` Guy
2004-11-30 21:29   ` Frank van Maarseveen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).