From: Corey McGuire <coreyfro@coreyfro.com>
To: linux-raid@vger.kernel.org
Subject: Re: RAID 5 lost two disks
Date: Sat, 6 Mar 2004 01:56:11 -0800 [thread overview]
Message-ID: <200403060156.11808.coreyfro@coreyfro.com> (raw)
In-Reply-To: <200403051225.30661.coreyfro@coreyfro.com>
Well, I got the RAID up, and I had reiserfsck work its mojo (it looks like I lost
lots of folder names, but the files appear to remember who they are),
BUT mount segfaults (or something segfaults) every time I try to mount the
damn thing...
I'm going to try running 2.6.something, hoping that maybe one of the tools I built
was just too new for SuSE 8.2/Linux 2.4.23... but I highly doubt it... who
knows, maybe 2.6 will behave more nicely... I hope mount -o ro will be enough
to protect me if it doesn't... who knows...
Any ideas what might be segfaulting mount?
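[On the mount -o ro worry: a read-only mount can still replay the journal on some filesystems, so a hedged extra guard is to mark the block device itself read-only first. A minimal sketch, not from the thread; mount_ro is a hypothetical helper and /mnt/recovery a placeholder mount point:]

```shell
# Hedged sketch: force the device read-only at the block layer before
# mounting, so even a journal replay cannot write to the array while
# debugging the oops.
mount_ro() {
    dev=$1   # e.g. /dev/md2 (the array from this thread)
    mnt=$2   # e.g. /mnt/recovery (placeholder mount point)
    blockdev --setro "$dev" &&   # util-linux: refuse all writes to the device
    mount -o ro "$dev" "$mnt"    # then a read-only mount on top
}
# usage: mount_ro /dev/md2 /mnt/recovery
```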
This is from /var/log/messages, from around the time I tried mounting:
Mar 6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 4096 --> 1024
Mar 6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 1024 --> 4096
Mar 6 01:14:39 ilneval kernel: reiserfs: found format "3.6" with standard journal
Mar 6 01:14:41 ilneval kernel: Unable to handle kernel paging request at virtual address e09ce004
Mar 6 01:14:41 ilneval kernel: printing eip:
Mar 6 01:14:41 ilneval kernel: c01839b5
Mar 6 01:14:41 ilneval kernel: *pde = 1f5f7067
Mar 6 01:14:41 ilneval kernel: *pte = 00000000
Mar 6 01:14:41 ilneval kernel: Oops: 0002
Mar 6 01:14:41 ilneval kernel: CPU:    0
Mar 6 01:14:41 ilneval kernel: EIP:    0010:[<c01839b5>]    Not tainted
Mar 6 01:14:41 ilneval kernel: EFLAGS: 00010286
Mar 6 01:14:41 ilneval kernel: eax: dae13bc0   ebx: e09c6000   ecx: dae13c08   edx: dae13bc0
Mar 6 01:14:41 ilneval kernel: esi: df26a000   edi: 00001000   ebp: dbf32000   esp: dbeb1e2c
Mar 6 01:14:41 ilneval kernel: ds: 0018   es: 0018   ss: 0018
Mar 6 01:14:41 ilneval kernel: Process mount (pid: 829, stackpage=dbeb1000)
Mar 6 01:14:41 ilneval kernel: Stack: 00000902 00001003 00001000 00000003 00000001 df26a000 00000902 dbf32000
Mar 6 01:14:41 ilneval kernel:        c01843cc df26a000 00000400 00002000 dbeb1e68 00000001 00000000 00000000
Mar 6 01:14:41 ilneval kernel:        00000246 00000000 00000000 00000902 fffffff3 df26a000 00000001 c013a4ba
Mar 6 01:14:41 ilneval kernel: Call Trace: [<c01843cc>] [<c013a4ba>] [<c013ad4b>] [<c014c8ae>] [<c013b0d0>]
Mar 6 01:14:41 ilneval kernel:    [<c014da3e>] [<c014dd6c>] [<c014db95>] [<c014e15a>] [<c010745f>]
Mar 6 01:14:41 ilneval kernel:
Mar 6 01:14:41 ilneval kernel: Code: 89 44 fb 04 b8 01 00 00 00 8b 96 f4 00 00 00 8b 4c fa 04 85
On Friday 05 March 2004 12:25 pm, Corey McGuire wrote:
> That kinda worked!!!!!! I need to fsck it, but I'm still afraid of fscking
> it up...
>
> Does anyone in San Jose/San Francisco/anywhere-in-frag'n-California have a
> free TB I can use for a dd? I will offer you my first child!
>
> If I need to sweeten the deal, I have LOTS to share... I have a TB of
> goodies just looking to be backed up!
>
> On Friday 05 March 2004 10:14 am, you wrote:
> > I had a two-disk failure; I will explain what I did.
> > One disk was bad; it affected all disks on that SCSI bus.
> > The RAID software got into a bad state; I think I needed to reboot or
> > power cycle.
> > After the reboot, it said 2 disks were non-fresh, or whatever.
> > My array had 14 disks, 7 of them on the bus with the 2 non-fresh disks.
> > I could not do a dd read test with much success on most of the disks;
> > maybe 2 or 3 seemed OK, but not if I ran 2 dd's at the same time.
> > So I unplugged all disks but one and tested that one. On success, repeat
> > with the next disk. I found 1 disk that did not work. So I connected the
> > 6 good disks, ran 6 dd's at the same time, and all was well.
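[The one-disk-at-a-time read test above can be sketched as a tiny helper. A hedged sketch: read_test and the parallel-usage line are mine; the dd invocation itself is the one given in the tips later in this mail, and the device names are placeholders.]

```shell
# Pure read test: stream the whole device to /dev/null; nothing is
# ever written to the disk. bs=64k matches the dd test in the tips.
read_test() {
    if dd if="$1" of=/dev/null bs=64k 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: FAILED"
    fi
}
# One disk at a time first; then all survivors at once to reproduce
# the load that exposed the flaky bus, e.g.:
#   for d in /dev/sdd /dev/sde; do read_test "$d" & done; wait
```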
> >
> > So, now I have 13 of 14 disks, and 1 of the 13 is non-fresh. I issued
> > this command:
> >
> > mdadm -A --force /dev/md2 --scan
> > For some reason my filesystem was corrupt. I noticed that the spare disk
> > was in the list. I knew the rebuild to the spare never finished. It may
> > not have been synced at all since so many disks were not working. So, I
> > knew the spare should not be part of the array, yet!
> >
> > I had trouble stopping the array, so I rebooted.
> >
> > This time I listed the disks excluding the spare and the failed disk.
> >
> > mdadm -A --force /dev/md2 /dev/sdk1 /dev/sdd1 /dev/sdl1 /dev/sde1
> > /dev/sdm1 /dev/sdf1 /dev/sdn1 /dev/sdg1 /dev/sdh1 /dev/sdo1 /dev/sdi1
> > /dev/sdp1 /dev/sdj1
> >
> > I did not include the missing disk, but I did include the non-fresh disk.
> > Now my filesystem is fine.
> >
> > I added the spare and it rebuilt; a good day! I bet if this had happened
> > to a hardware RAID it could not have been saved.
> >
> > I replaced the bad disk and added it as a spare.
> > That was about 1 month ago, everything is still fine.
> >
> > You will need to install mdadm if you don't have it. mdadm does not use
> > raidtab; it uses /etc/mdadm.conf.
> >
> > man mdadm for details!
> >
> > Good luck!
> >
> > Guy
> >
> > ===========================================================================
> > Tips:
> >
> > This will give details of each disk.
> > mdadm -E /dev/hda3
> > repeat for hdc3, hde3, hdg3, hdi3, hdk3.
> >
> > dd test: to determine whether a disk's surface is good.
> > This is just a read test!
> > dd if=/dev/hda of=/dev/null bs=64k
> > repeat for hdc, hde, hdg, hdi, hdk.
> >
> > My mdadm.conf:
> > MAILADDR bugzilla@watkins-home.com
> > PROGRAM /root/bin/handle-mdadm-events
> >
> > DEVICE /dev/sd[abcdefghijklmnopqrstuvwxyz][12]
> >
> > ARRAY /dev/md0 level=raid1 num-devices=2
> > UUID=1fb2890c:2c9c47bf:db12e1e3:16cd7ffe
> >
> > ARRAY /dev/md1 level=raid1 num-devices=2
> > UUID=8f183b62:ea93fe30:a842431c:4b93c7bb
> >
> > ARRAY /dev/md2 level=raid5 num-devices=14
> > UUID=8357a389:8853c2d1:f160d155:6b4e1b99
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
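[Pulled together, Guy's quoted recovery sequence looks roughly like this. A hedged sketch: the 13 member names are copied verbatim from his mail and will differ on any other array, and the spare's device name (/dev/sdq1 here) is a placeholder he never gives. Not something to run as-is.]

```shell
# Hedged sketch of the recovery steps quoted above, collected into one
# hypothetical helper. Device names are specific to Guy's 14-disk array.
recover_md2() {
    mdadm --stop /dev/md2 || true   # he had to reboot when stopping hung
    # Force-assemble from the 13 usable members, leaving out both the
    # dead disk and the never-synced spare:
    mdadm -A --force /dev/md2 \
        /dev/sdk1 /dev/sdd1 /dev/sdl1 /dev/sde1 /dev/sdm1 /dev/sdf1 \
        /dev/sdn1 /dev/sdg1 /dev/sdh1 /dev/sdo1 /dev/sdi1 /dev/sdp1 \
        /dev/sdj1
    # Only once the filesystem checks out, re-add the spare and let the
    # rebuild run:
    mdadm /dev/md2 --add /dev/sdq1   # placeholder spare device
}
```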
Thread overview: 6+ messages
2004-03-05 17:26 RAID 5 lost two disks Corey McGuire
2004-03-05 18:05 ` Corey McGuire
2004-03-05 20:25 ` Corey McGuire
2004-03-06 9:56 ` Corey McGuire [this message]
2004-03-06 22:25 ` RAID 5 lost two disks : anyone know of reiser recovery tools? Corey McGuire
2004-03-05 23:07 ` RAID 5 lost two disks Lars Marowsky-Bree