* Re: RE: RAID5 Not coming back up after crash
@ 2004-11-29 20:56 BERNARD JOHN ZOLP
2004-11-29 22:29 ` Guy
From: BERNARD JOHN ZOLP @ 2004-11-29 20:56 UTC (permalink / raw)
To: Guy; +Cc: linux-raid
Just a few follow up questions before I dive into this. Will mdadm work
with a RAID setup created with the older raidtools package that came
with my SuSE installation?
Assuming the drive with bad blocks is not getting worse -- I don't think it
is, but you never know -- could I map them out by writing to those
sectors with dd and then run the command to bring the array back
online? Or should I wait for the RMA of the flaky drive, dd_rescue
to the new one, and bring that up?
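If the dd route is reasonable, this is roughly what I have in mind --
untested, with a made-up block number, and assuming badblocks reported
the blocks against the partition at its default 1024-byte block size:

  # check that the reported block (say 12345) really is unreadable
  dd if=/dev/hdb1 of=/dev/null bs=1024 skip=12345 count=1
  # write just that block so the drive can remap it -- this wipes
  # whatever data was in the block, so only touch the flagged ones
  dd if=/dev/zero of=/dev/hdb1 bs=1024 seek=12345 count=1

For the dd_rescue option I am assuming it is just source then
destination, something like "dd_rescue /dev/hdj1 /dev/hde1", with hde1
standing in for wherever the replacement drive ends up.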
Thanks again,
bjz
----- Original Message -----
From: Guy <bugzilla@watkins-home.com>
Date: Monday, November 29, 2004 11:40 am
Subject: RE: RAID5 Not coming back up after crash
> You can recover, but not with bad blocks.
>
> This command should get your array back on-line:
> mdadm -A /dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1
> /dev/hdj1
> But, as soon as md reads a bad block it will fail the disk and your
> array will be off-line.
>
> If you have an extra disk, you could attempt to copy the disk
> first, then
> replace the disk with the read error with the copy.
>
> dd_rescue can copy a disk with read errors.
>
> Also, it is common for a disk to grow bad spots over time. These
> bad spots
> (sectors) can be re-mapped by the drive to a spare sector. This
> re-mapping will occur when an attempt is made to write to the bad
> sector. So, you can
> repair your disk by writing to the bad sectors. But, be careful
> not to
> overwrite good data. I have done this using dd. First I found the
> bad sector with dd, then I wrote to the 1 bad sector with dd. I
> would need to
> refer to the man page to do it again, so I can't explain it here at
> this time. It is not really hard, but 1 small mistake, and "that's
> it man, game
> over man, game over".
>
> Guy
>
>
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of B. J. Zolp
> Sent: Monday, November 29, 2004 11:33 AM
> To: linux-raid@vger.kernel.org
> Subject: RAID5 Not coming back up after crash
>
> I have a RAID5 setup on my fileserver using disks hda1 hdb1 hdc1
> hdd1
> hdi1 and hdj1. Yesterday I started moving a large chunk of files
> ~80GB
> from this array to a stand alone drive in the system and about
> halfway
> through the mv I got a ton of PERMISSION DENIED errors on some of the
> remaining files left to be moved and the move process quit. I did
> an ls
> of the raid directory and got PERMISSION DENIED on the same files
> that
> errored out on the mv while some of the other files looked fine. I
> figured it might be a good idea to take the raid down and back up
> again
> (probably a mistake) and I could not reboot the machine without
> physically turning it off as some processes were hung. Upon
> booting
> back up, the raid did not come online stating that hdj1 was kicked
> due
> to inconsistency. Additionally hdb1 is listed as offline too. So
> I
> have 2 drives that are not cooperating. I have a hunch hdb1 might
> have
> not been working for some time.
>
> I found some info stating that if you mark the drive that failed
> first
> as "failed-drive" and try a "mkraid --force --dangerous-no-resync
> /dev/md0" then I might have some luck getting my files back. From
> my
> logs I can see that all the working drives have event counter:
> 00000022
> and hdj1 has event counter: 00000021 and hdb1 has event counter:
> 00000001. Does this mean that hdb1 failed a long time ago or is
> this
> difference in event counters likely within a few minutes of each
> other?
> I just ran badblocks on both hdb1 and hdj1 and found 1 bad block on
> hdb1
> and about 15 on hdj1, would that be enough to cause my raid to get
> this
> out of whack? In any case I plan to replace those drives, but
> would the
> method above be the best route once I have copied the raw data to
> the
> new drives in order to bring my raid back up?
>
>
> Thanks,
>
> bjz
>
> here is my log from when I run raidstart /dev/md0:
>
> Nov 29 10:10:19 orion kernel: [events: 00000022]
> Nov 29 10:10:19 orion last message repeated 3 times
> Nov 29 10:10:19 orion kernel: [events: 00000021]
> Nov 29 10:10:19 orion kernel: md: autorun ...
> Nov 29 10:10:19 orion kernel: md: considering hdj1 ...
> Nov 29 10:10:19 orion kernel: md: adding hdj1 ...
> Nov 29 10:10:19 orion kernel: md: adding hdi1 ...
> Nov 29 10:10:19 orion kernel: md: adding hdd1 ...
> Nov 29 10:10:19 orion kernel: md: adding hdc1 ...
> Nov 29 10:10:19 orion kernel: md: adding hda1 ...
> Nov 29 10:10:19 orion kernel: md: created md0
> Nov 29 10:10:19 orion kernel: md: bind<hda1,1>
> Nov 29 10:10:19 orion kernel: md: bind<hdc1,2>
> Nov 29 10:10:19 orion kernel: md: bind<hdd1,3>
> Nov 29 10:10:19 orion kernel: md: bind<hdi1,4>
> Nov 29 10:10:19 orion kernel: md: bind<hdj1,5>
> Nov 29 10:10:19 orion kernel: md: running: <hdj1><hdi1><hdd1><hdc1><hda1>
> Nov 29 10:10:19 orion kernel: md: hdj1's event counter: 00000021
> Nov 29 10:10:19 orion kernel: md: hdi1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: hdd1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: hdc1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: hda1's event counter: 00000022
> Nov 29 10:10:19 orion kernel: md: superblock update time
> inconsistency
> -- using the most recent one
> Nov 29 10:10:19 orion kernel: md: freshest: hdi1
> Nov 29 10:10:19 orion kernel: md0: kicking faulty hdj1!
> Nov 29 10:10:19 orion kernel: md: unbind<hdj1,4>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdj1)
> Nov 29 10:10:19 orion kernel: md: md0: raid array is not clean --
> starting background reconstruction
> Nov 29 10:10:19 orion kernel: md0: max total readahead window set
> to 2560k
> Nov 29 10:10:19 orion kernel: md0: 5 data-disks, max readahead per
> data-disk: 512k
> Nov 29 10:10:19 orion kernel: raid5: device hdi1 operational as
> raid disk 4
> Nov 29 10:10:19 orion kernel: raid5: device hdd1 operational as
> raid disk 3
> Nov 29 10:10:19 orion kernel: raid5: device hdc1 operational as
> raid disk 2
> Nov 29 10:10:19 orion kernel: raid5: device hda1 operational as
> raid disk 0
> Nov 29 10:10:19 orion kernel: raid5: not enough operational devices
> for
> md0 (2/6 failed)
> Nov 29 10:10:19 orion kernel: RAID5 conf printout:
> Nov 29 10:10:19 orion kernel: --- rd:6 wd:4 fd:2
> Nov 29 10:10:19 orion kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:hda1
> Nov 29 10:10:19 orion kernel: disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
> Nov 29 10:10:19 orion kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:hdc1
> Nov 29 10:10:19 orion kernel: disk 3, s:0, o:1, n:3 rd:3 us:1 dev:hdd1
> Nov 29 10:10:19 orion kernel: disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdi1
> Nov 29 10:10:19 orion kernel: disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
> Nov 29 10:10:19 orion kernel: raid5: failed to run raid set md0
> Nov 29 10:10:19 orion kernel: md: pers->run() failed ...
> Nov 29 10:10:19 orion kernel: md :do_md_run() returned -22
> Nov 29 10:10:19 orion kernel: md: md0 stopped.
> Nov 29 10:10:19 orion kernel: md: unbind<hdi1,3>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdi1)
> Nov 29 10:10:19 orion kernel: md: unbind<hdd1,2>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdd1)
> Nov 29 10:10:19 orion kernel: md: unbind<hdc1,1>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hdc1)
> Nov 29 10:10:19 orion kernel: md: unbind<hda1,0>
> Nov 29 10:10:19 orion kernel: md: export_rdev(hda1)
> Nov 29 10:10:19 orion kernel: md: ... autorun DONE.
* RE: RE: RAID5 Not coming back up after crash
2004-11-29 20:56 RE: RAID5 Not coming back up after crash BERNARD JOHN ZOLP
@ 2004-11-29 22:29 ` Guy
2004-11-30 5:38 ` B. J. Zolp
From: Guy @ 2004-11-29 22:29 UTC (permalink / raw)
To: 'BERNARD JOHN ZOLP'; +Cc: linux-raid
If you are sure you can overwrite the correct bad sectors, then do it.
mdadm is much better than raidtools. From what I have read, yes it is
compatible.
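If you want to sanity-check that before assembling, mdadm should be
able to read the raidtools (0.90) superblocks directly; something like
this, run against each member partition, ought to print the RAID level,
the device count and the event counter:

  mdadm --examine /dev/hda1

If those all look sane, the mdadm -A --force command from my earlier
mail should behave the same as it would on an mdadm-created array.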
The info below is not required.
Who makes your 6 disk drives? And how old are they? Any bets anyone?
Guy
* Re: RAID5 Not coming back up after crash
2004-11-29 22:29 ` Guy
@ 2004-11-30 5:38 ` B. J. Zolp
2004-11-30 5:45 ` Neil Brown
From: B. J. Zolp @ 2004-11-30 5:38 UTC (permalink / raw)
To: Guy; +Cc: 'BERNARD JOHN ZOLP', linux-raid
I found a spare new drive that I copied hdj1 onto and put the new drive
on the proper IDE cable for hdj. Then I tried running
mdadm -A /dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 /dev/hdj1
but it segfaults. Should I try the mkraid --force
--dangerous-no-resync /dev/md0 instead?
Thanks,
bjz
* Re: RAID5 Not coming back up after crash
2004-11-30 5:38 ` B. J. Zolp
@ 2004-11-30 5:45 ` Neil Brown
2004-11-30 5:48 ` B. J. Zolp
From: Neil Brown @ 2004-11-30 5:45 UTC (permalink / raw)
To: B. J. Zolp; +Cc: Guy, 'BERNARD JOHN ZOLP', linux-raid
On Monday November 29, bjzolp@wisc.edu wrote:
> I found a spare new drive that I copied hdj1 onto and put the new drive
> on the proper IDE cable for hdj. Then tried running the mdadm -A
> /dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 /dev/hdj1
> But it seg faults. Should I try the mkraid --force
> --dangerous-no-resync /dev/md0 ???
>
which version of mdadm segfaults?
If you aren't using 1.8.0, use that.
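You can check what you are running with:

  mdadm --version

If that reports anything other than 1.8.0, install 1.8.0 and re-run the
same assemble command.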
NeilBrown
* Re: RAID5 Not coming back up after crash
2004-11-30 5:45 ` Neil Brown
@ 2004-11-30 5:48 ` B. J. Zolp
2004-11-30 5:54 ` Neil Brown
From: B. J. Zolp @ 2004-11-30 5:48 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
It's 1.8.1.
bjz
Neil Brown wrote:
>On Monday November 29, bjzolp@wisc.edu wrote:
>
>
>>I found a spare new drive that I copied hdj1 onto and put the new drive
>>on the proper IDE cable for hdj. Then tried running the mdadm -A
>>/dev/md0 --force /dev/hda1 /dev/hdc1 /dev/hdd1 /dev/hdi1 /dev/hdj1
>>But it seg faults. Should I try the mkraid --force
>>--dangerous-no-resync /dev/md0 ???
>>
>>
>>
>
>which version of mdadm segfaults?
>If you aren't using 1.8.0, use that.
>
>NeilBrown
* Re: RAID5 Not coming back up after crash
2004-11-30 5:48 ` B. J. Zolp
@ 2004-11-30 5:54 ` Neil Brown
2004-11-30 6:33 ` THANKS!! was:Re: " B. J. Zolp
From: Neil Brown @ 2004-11-30 5:54 UTC (permalink / raw)
To: B. J. Zolp; +Cc: linux-raid
On Monday November 29, bjzolp@wisc.edu wrote:
> Its 1.8.1.
That is development code. It is buggy. Only use it for testing and
giving me feedback (like it says in the release notes).
: This is a "development" release of mdadm. It should *not* be
: considered stable and should be used primarily for testing.
: The current "stable" version is 1.8.0.
NeilBrown
* THANKS!! was:Re: RAID5 Not coming back up after crash
2004-11-30 5:54 ` Neil Brown
@ 2004-11-30 6:33 ` B. J. Zolp
From: B. J. Zolp @ 2004-11-30 6:33 UTC (permalink / raw)
To: linux-raid
I am sure this does not get said enough, so I just wanted to thank Neil
Brown and Guy for their help and speedy replies on this list. I was
able to get my raid up in no time due to their help.
Thanks,
bjz
Neil Brown wrote:
>On Monday November 29, bjzolp@wisc.edu wrote:
>
>
>>Its 1.8.1.
>>
>>
>
>That is development code. It is buggy. Only use it for testing and
>giving me feedback (like it says in the release notes).
>
>
>: This is a "development" release of mdadm. It should *not* be
>: considered stable and should be used primarily for testing.
>: The current "stable" version is 1.8.0.
>
>NeilBrown