* RAID 5 lost two disks
From: Corey McGuire @ 2004-03-05 17:26 UTC
To: linux-raid
help! I'm too afraid to STFW.
All I have to say is SuSE is a @#$@#$ piece of @#$@#$!
I am not used to not having a !@#!@# RAIDTAB! That's right, SuSE never
generated a raidtab! I have no clue how my RAID 5 is built, and I need to
mkraid -R it? Yeah, right!
SuSE must autodetect the RAID, which would be fine if my RAID WERE STILL
WORKING!
All I have to go by is what dmesg outputs when trying to build the RAID.
Before I paste the dump, let me give a rundown of my system:
Kernel 2.4.23
mkraid version 0.90.0
6 disks, hda3, hdc3, hde3, hdg3, hdi3, hdk3
A and C are on the motherboard
E and G are on a Promise card
I and K are on another Promise card
This is /home; this is my everything... 1 @#$@# TB of everything... backed up
maybe 3 months ago, maybe 4...
Everything was working great for nearly 8 months until the failure.
Golden bricks people... There's not enough dietary fiber in the world...
As far as I can tell, the order is [dev 00:00] hdg3 [dev 00:00] hdk3 hda3 hdc3.
If I write this to the raidtab and it's wrong, can I raidstop and try again?
I'm sorry if I'm missing important info... I'm not thinking very well...
here is the dmesg output...
[events: 0000004c]
[events: 00000049]
[events: 0000004c]
[events: 0000004a]
[events: 0000004c]
[events: 0000004c]
md: autorun ...
md: considering hdc3 ...
md: adding hdc3 ...
md: adding hdk3 ...
md: adding hdi3 ...
md: adding hdg3 ...
md: adding hde3 ...
md: adding hda3 ...
md: created md2
md: bind<hda3,1>
md: bind<hde3,2>
md: bind<hdg3,3>
md: bind<hdi3,4>
md: bind<hdk3,5>
md: bind<hdc3,6>
md: running: <hdc3><hdk3><hdi3><hdg3><hde3><hda3>
md: hdc3's event counter: 0000004c
md: hdk3's event counter: 0000004c
md: hdi3's event counter: 0000004a
md: hdg3's event counter: 0000004c
md: hde3's event counter: 00000049
md: hda3's event counter: 0000004c
md: superblock update time inconsistency -- using the most recent one
md: freshest: hdc3
md: kicking non-fresh hdi3 from array!
md: unbind<hdi3,5>
md: export_rdev(hdi3)
md: kicking non-fresh hde3 from array!
md: unbind<hde3,4>
md: export_rdev(hde3)
md2: removing former faulty hde3!
md2: removing former faulty hdi3!
md2: max total readahead window set to 1240k
md2: 5 data-disks, max readahead per data-disk: 248k
raid5: device hdc3 operational as raid disk 5
raid5: device hdk3 operational as raid disk 3
raid5: device hdg3 operational as raid disk 1
raid5: device hda3 operational as raid disk 4
raid5: not enough operational devices for md2 (2/6 failed)
RAID5 conf printout:
--- rd:6 wd:4 fd:2
disk 0, s:0, o:0, n:0 rd:0 us:1 dev:[dev 00:00]
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:hdg3
disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
disk 3, s:0, o:1, n:3 rd:3 us:1 dev:hdk3
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hda3
disk 5, s:0, o:1, n:5 rd:5 us:1 dev:hdc3
raid5: failed to run raid set md2
md: pers->run() failed ...
md :do_md_run() returned -22
md: md2 stopped.
md: unbind<hdc3,3>
md: export_rdev(hdc3)
md: unbind<hdk3,2>
md: export_rdev(hdk3)
md: unbind<hdg3,1>
md: export_rdev(hdg3)
md: unbind<hda3,0>
md: export_rdev(hda3)
md: ... autorun DONE.
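For what it's worth, a raidtab pieced together from that conf printout would look roughly
like the sketch below. The two [dev 00:00] slots presumably belong to the kicked disks hde3
and hdi3, but which one goes in slot 0 and which in slot 2 is a guess, and the chunk size
and spare count are pure assumptions -- the superblocks (mdadm -E) are the only authoritative
source:

raiddev /dev/md2
        raid-level              5
        nr-raid-disks           6
        nr-spare-disks          0       # assumption
        persistent-superblock   1
        chunk-size              32      # assumption: real value unknown
        device                  /dev/hde3
        raid-disk               0       # guess: printed only as [dev 00:00]
        device                  /dev/hdg3
        raid-disk               1
        device                  /dev/hdi3
        raid-disk               2       # guess: printed only as [dev 00:00]
        device                  /dev/hdk3
        raid-disk               3
        device                  /dev/hda3
        raid-disk               4
        device                  /dev/hdc3
        raid-disk               5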
* Re: RAID 5 lost two disks
From: Corey McGuire @ 2004-03-05 18:05 UTC
To: linux-raid
I have some goodish news... I back up my / mirror to my /mnt/backup mirror
nightly... that means I have last night's state saved... everything not
in /home is archived. I am going to dig through it to see if I can find a
copy of what SuSE thought my raidtab was. If anyone has a clue, lemme know.
Note to self: comment out all archiving cron jobs.
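A quick way to hunt for that in the nightly copy (assuming the backup really is mounted
under /mnt/backup as described above) might be something like:

find /mnt/backup -name raidtab 2>/dev/null
grep -rl raiddev /mnt/backup/etc 2>/dev/null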
On Friday 05 March 2004 09:26 am, you wrote:
> [original report and dmesg output snipped]
* Re: RAID 5 lost two disks
From: Corey McGuire @ 2004-03-05 20:25 UTC
To: linux-raid, bugzilla
That kinda worked!!!!!! I need to FSCK it, but I'm still afraid of fscking it
up...
Does anyone in San Jose/San Francisco/Anywhere-in-frag'n-California have a
free TB I can use for a DD? I will offer you my first child!
If I need to sweeten the deal, I have LOTS to share... I have a TB of goodies
just looking to be backed up!
On Friday 05 March 2004 10:14 am, you wrote:
> I had a 2-disk failure; I will explain what I did.
> One disk was bad; it affected all the disks on that SCSI bus.
> The RAID software got into a bad state; I think I needed to reboot, or
> power cycle.
> After the reboot, it said 2 disks were non-fresh or whatever.
> My array had 14 disks, 7 on the bus with the 2 non-fresh disks.
> I could not do a dd read test with much success on most of the disks; maybe
> 2 or 3 seemed OK, but not if I did 2 dd's at the same time.
> So I unplugged all disks but one and tested that one; if it passed, I repeated
> with the next disk. I found one disk that did not work. So I connected the
> 6 good disks, did 6 dd's at the same time, and all was well.
>
> So, now I have 13 of 14 disks and 1 of the 13 is non-fresh. I issued this
> command:
>
> mdadm -A --force /dev/md2 --scan
>
> For some reason my filesystem was corrupt. I noticed that the spare disk
> was in the list. I knew the rebuild to the spare never finished. It may
> not have been synced at all since so many disks were not working. So, I
> knew the spare should not be part of the array, yet!
>
> I had trouble stopping the array, so reboot.
>
> This time I listed the disks excluding the spare and the failed disk.
>
> mdadm -A --force /dev/md2 /dev/sdk1 /dev/sdd1 /dev/sdl1 /dev/sde1 /dev/sdm1
> /dev/sdf1 /dev/sdn1 /dev/sdg1 /dev/sdh1 /dev/sdo1 /dev/sdi1 /dev/sdp1
> /dev/sdj1
>
> I did not include the missing disk, but I did include the non-fresh disk.
> Now my filesystem is fine.
>
> I added the spare, it re-built, a good day! I bet if this had happened to
> a hardware RAID it could not have been saved.
>
> I replaced the bad disk and added it as a spare.
> That was about 1 month ago, everything is still fine.
>
> You will need to install mdadm if you don't have it. mdadm does not use
> raidtab; it uses /etc/mdadm.conf.
>
> Man mdadm for details!
>
> Good luck!
>
> Guy
>
> ===========================================================================
> Tips:
>
> This will give details of each disk.
> mdadm -E /dev/hda3
> repeat for hdc3, hde3, hdg3, hdi3, hdk3.
>
> dd test... To test a disk to determine if the surface is good.
> This is just a read test!
> dd if=/dev/hda of=/dev/null bs=64k
> repeat for hdc, hde, hdg, hdi, hdk.
>
> My mdadm.conf:
> MAILADDR bugzilla@watkins-home.com
> PROGRAM /root/bin/handle-mdadm-events
>
> DEVICE /dev/sd[abcdefghijklmnopqrstuvwxyz][12]
>
> ARRAY /dev/md0 level=raid1 num-devices=2
> UUID=1fb2890c:2c9c47bf:db12e1e3:16cd7ffe
>
> ARRAY /dev/md1 level=raid1 num-devices=2
> UUID=8f183b62:ea93fe30:a842431c:4b93c7bb
>
> ARRAY /dev/md2 level=raid5 num-devices=14
> UUID=8357a389:8853c2d1:f160d155:6b4e1b99
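Translating Guy's recipe to the six IDE partitions in this thread might look something like
the lines below. The device names come from the dmesg output earlier in the thread, and
leaving out hde3 (event counter 49, the stalest) while keeping the slightly stale hdi3
(event counter 4a) is an assumption, so check the mdadm -E output before forcing anything:

mdadm -E /dev/hda3 /dev/hdc3 /dev/hde3 /dev/hdg3 /dev/hdi3 /dev/hdk3
dd if=/dev/hda of=/dev/null bs=64k        # read-test each suspect disk in turn
mdadm -A --force /dev/md2 /dev/hda3 /dev/hdc3 /dev/hdg3 /dev/hdi3 /dev/hdk3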
* Re: RAID 5 lost two disks
From: Corey McGuire @ 2004-03-06 9:56 UTC
To: linux-raid
Well, I got the RAID up and had reiserfsck work its mojo (it looks like I lost
lots of folder names, but the files appear to remember who they are).
BUT mount segfaults (or something segfaults) every time I try to mount the
damn thing...
I'm going to try running 2.6-something, hoping that maybe one of the tools I
built was just too new for SuSE 8.2 / Linux 2.4.23... but I highly doubt it...
who knows, maybe 2.6 will behave more nicely... I hope mount -o ro will be
enough to protect me if it doesn't... who knows...
Any ideas what might be segfaulting mount?
This is from /var/log/messages from around the time I tried mounting:
Mar 6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 4096 -->
1024
Mar 6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 1024 -->
4096
Mar 6 01:14:39 ilneval kernel: reiserfs: found format "3.6" with standard
journal
Mar 6 01:14:41 ilneval kernel: Unable to handle kernel paging request at
virtual address e09ce004
Mar 6 01:14:41 ilneval kernel: printing eip:
Mar 6 01:14:41 ilneval kernel: c01839b5
Mar 6 01:14:41 ilneval kernel: *pde = 1f5f7067
Mar 6 01:14:41 ilneval kernel: *pte = 00000000
Mar 6 01:14:41 ilneval kernel: Oops: 0002
Mar 6 01:14:41 ilneval kernel: CPU: 0
Mar 6 01:14:41 ilneval kernel: EIP: 0010:[<c01839b5>] Not tainted
Mar 6 01:14:41 ilneval kernel: EFLAGS: 00010286
Mar 6 01:14:41 ilneval kernel: eax: dae13bc0 ebx: e09c6000 ecx: dae13c08
edx: dae13bc0
Mar 6 01:14:41 ilneval kernel: esi: df26a000 edi: 00001000 ebp: dbf32000
esp: dbeb1e2c
Mar 6 01:14:41 ilneval kernel: ds: 0018 es: 0018 ss: 0018
Mar 6 01:14:41 ilneval kernel: Process mount (pid: 829, stackpage=dbeb1000)
Mar 6 01:14:41 ilneval kernel: Stack: 00000902 00001003 00001000 00000003
00000001 df26a000 00000902 dbf32000
Mar 6 01:14:41 ilneval kernel: c01843cc df26a000 00000400 00002000
dbeb1e68 00000001 00000000 00000000
Mar 6 01:14:41 ilneval kernel: 00000246 00000000 00000000 00000902
fffffff3 df26a000 00000001 c013a4ba
Mar 6 01:14:41 ilneval kernel: Call Trace: [<c01843cc>] [<c013a4ba>]
[<c013ad4b>] [<c014c8ae>] [<c013b0d0>]
Mar 6 01:14:41 ilneval kernel: [<c014da3e>] [<c014dd6c>] [<c014db95>]
[<c014e15a>] [<c010745f>]
Mar 6 01:14:41 ilneval kernel:
Mar 6 01:14:41 ilneval kernel: Code: 89 44 fb 04 b8 01 00 00 00 8b 96 f4 00
00 00 8b 4c fa 04 85
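For reference, the read-only mount he mentions would be something along these lines (the
mount point is an assumption), though note that journalling filesystems generally still
replay their journal even on a read-only mount, so -o ro is not a perfect write barrier:

mount -t reiserfs -o ro /dev/md2 /home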
On Friday 05 March 2004 12:25 pm, Corey McGuire wrote:
> [earlier messages snipped]
* Re: RAID 5 lost two disks : anyone know of reiser recovery tools?
From: Corey McGuire @ 2004-03-06 22:25 UTC
To: linux-raid
Well, 2.6 didn't stop the segfaulting... it did give a little more info, but
nothing I can decipher. I'll paste the section from /var/log/messages at the
end of this post...
Now, if no one knows how to get this thing mounting again, are there any tools
that would let me extract files from the drive? I am thinking of something
with an FTP-like interface.
FSCK seems to SEE the files, and that gives me hope, but because I can't mount
it, I now have a useless 1 TB partition.
So instead of mounting it, is there any way to get at the files in a primitive
fashion?
Here's my dump:
Mar 6 13:14:42 ilneval kernel: found reiserfs format "3.6" with standard
journal
Mar 6 13:14:43 ilneval kernel: Unable to handle kernel paging request at
virtual address e0999004
Mar 6 13:14:43 ilneval kernel: printing eip:
Mar 6 13:14:43 ilneval kernel: c019e12d
Mar 6 13:14:43 ilneval kernel: *pde = 1fe7c067
Mar 6 13:14:43 ilneval kernel: *pte = 00000000
Mar 6 13:14:43 ilneval kernel: Oops: 0002 [#1]
Mar 6 13:14:43 ilneval kernel: CPU: 0
Mar 6 13:14:43 ilneval kernel: EIP: 0060:[read_old_bitmaps+189/256] Not
tainted
Mar 6 13:14:43 ilneval kernel: EIP: 0060:[<c019e12d>] Not tainted
Mar 6 13:14:43 ilneval kernel: EFLAGS: 00010282
Mar 6 13:14:43 ilneval kernel: EIP is at read_old_bitmaps+0xbd/0x100
Mar 6 13:14:43 ilneval kernel: eax: da87a4e0 ebx: e0991000 ecx: dfd3b940
edx: da87a4e0
Mar 6 13:14:43 ilneval kernel: esi: dd619400 edi: 00001000 ebp: db847000
esp: db90fdf0
Mar 6 13:14:43 ilneval kernel: ds: 007b es: 007b ss: 0068
Mar 6 13:14:43 ilneval kernel: Process mount (pid: 833, threadinfo=db90e000
task=dbacc6e0)
Mar 6 13:14:43 ilneval kernel: Stack: dfda9040 00001003 00001000 00000003
def6ac00 dd619400 83c30000 000000e5
Mar 6 13:14:43 ilneval kernel: c019ec93 dd619400 00002000 def6ac1c
db90fe40 db90fe44 db90fe48 ffffffea
Mar 6 13:14:43 ilneval kernel: db847000 00000001 dd619510 db90fea0
00000000 00000000 00000000 c02e07b7
Mar 6 13:14:43 ilneval kernel: Call Trace:
Mar 6 13:14:43 ilneval kernel: [reiserfs_fill_super+707/1680]
reiserfs_fill_super+0x2c3/0x690
Mar 6 13:14:43 ilneval kernel: [<c019ec93>] reiserfs_fill_super+0x2c3/0x690
Mar 6 13:14:43 ilneval kernel: [disk_name+175/208] disk_name+0xaf/0xd0
Mar 6 13:14:43 ilneval kernel: [<c01819ff>] disk_name+0xaf/0xd0
Mar 6 13:14:43 ilneval kernel: [sb_set_blocksize+31/80]
sb_set_blocksize+0x1f/0x50
Mar 6 13:14:43 ilneval kernel: [<c01566ff>] sb_set_blocksize+0x1f/0x50
Mar 6 13:14:43 ilneval kernel: [get_sb_bdev+234/368] get_sb_bdev+0xea/0x170
Mar 6 13:14:43 ilneval kernel: [<c01560ba>] get_sb_bdev+0xea/0x170
Mar 6 13:14:43 ilneval kernel: [get_super_block+47/64]
get_super_block+0x2f/0x40
Mar 6 13:14:43 ilneval kernel: [<c019f0cf>] get_super_block+0x2f/0x40
Mar 6 13:14:43 ilneval kernel: [reiserfs_fill_super+0/1680]
reiserfs_fill_super+0x0/0x690
Mar 6 13:14:43 ilneval kernel: [<c019e9d0>] reiserfs_fill_super+0x0/0x690
Mar 6 13:14:43 ilneval kernel: [do_kern_mount+91/240]
do_kern_mount+0x5b/0xf0
Mar 6 13:14:43 ilneval kernel: [<c015636b>] do_kern_mount+0x5b/0xf0
Mar 6 13:14:43 ilneval kernel: [do_add_mount+151/400]
do_add_mount+0x97/0x190
Mar 6 13:14:43 ilneval kernel: [<c016c3a7>] do_add_mount+0x97/0x190
Mar 6 13:14:43 ilneval kernel: [do_mount+404/448] do_mount+0x194/0x1c0
Mar 6 13:14:43 ilneval kernel: [<c016c744>] do_mount+0x194/0x1c0
Mar 6 13:14:43 ilneval kernel: [copy_mount_options+140/272]
copy_mount_options+0x8c/0x110
Mar 6 13:14:43 ilneval kernel: [<c016c52c>] copy_mount_options+0x8c/0x110
Mar 6 13:14:43 ilneval kernel: [sys_mount+191/320] sys_mount+0xbf/0x140
Mar 6 13:14:43 ilneval kernel: [<c016cb3f>] sys_mount+0xbf/0x140
Mar 6 13:14:43 ilneval kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Mar 6 13:14:43 ilneval kernel: [<c010906b>] syscall_call+0x7/0xb
Mar 6 13:14:43 ilneval kernel:
Mar 6 13:14:43 ilneval kernel: Code: 89 44 fb 04 8b 86 64 01 00 00 8b 50 08
b8 01 00 00 00 8b 4c
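One conservative option, given the earlier plea for a spare terabyte to dd to: image the
whole array onto other storage first and experiment only on the copy. A rough sketch, with
the paths below being placeholders and reiserfsck --rebuild-tree very much a last resort:

dd if=/dev/md2 of=/mnt/spare/md2.img bs=1M        # needs roughly 1 TB free
reiserfsck --rebuild-tree /mnt/spare/md2.img      # run against the copy, never the original
mount -t reiserfs -o ro,loop /mnt/spare/md2.img /mnt/recovered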
On Saturday 06 March 2004 01:56 am, Corey McGuire wrote:
> [earlier messages snipped]
* Re: RAID 5 lost two disks
From: Lars Marowsky-Bree @ 2004-03-05 23:07 UTC
To: Corey McGuire, linux-raid
On 2004-03-05T09:26:42,
Corey McGuire <coreyfro@coreyfro.com> said:
> help! I'm too afraid to STFW.
>
> All I have to say is SuSE is a @#$@#$ piece of @#$@#$!
This is uncalled for.
> I am not used to not having a !@#!@# RAIDTAB! Thats right, SuSE never
> generated a RAIDTAB! I have no clue what my RAID5 is built like, and I need
> to mkraid -R it? yeah, right!
Use mdadm -A, maybe you'll need to use -f.
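Spelled out for the six partitions in the original post, that would be something like the
line below; the device list (omitting the stalest disk, hde3) is an assumption, and -f
rewrites event counters, so be sure which disks are healthy first:

mdadm -A -f /dev/md2 /dev/hda3 /dev/hdc3 /dev/hdg3 /dev/hdi3 /dev/hdk3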
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering \ ever tried. ever failed. no matter.
SUSE Labs | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ -- Samuel Beckett