Re: RAID 5 lost two disks

From: Corey McGuire <coreyfro@coreyfro.com>
To: linux-raid@vger.kernel.org
Subject: Re: RAID 5 lost two disks
Date: Sat, 6 Mar 2004 01:56:11 -0800
Message-ID: <200403060156.11808.coreyfro@coreyfro.com>
In-Reply-To: <200403051225.30661.coreyfro@coreyfro.com>


Well, I got the RAID up and had reiserfsck work its mojo (it looks like I lost 
lots of folder names, but the files appear to remember who they are).

BUT mount segfaults (or something segfaults) every time I try to mount the 
damn thing...

I'm going to try running 2.6.something, hoping that maybe one of the tools I 
built was just too new for SuSE 8.2/Linux 2.4.23, but I highly doubt it. Who 
knows, maybe 2.6 will behave more nicely. I hope mount -o ro will be enough 
to protect me if it doesn't.
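
A cautious way to try the read-only mount could look like the sketch below. It assumes blockdev(8) from util-linux is available and works on the md device. /dev/md0 and /mnt/recover are placeholders, since the real device and mount point are not given here. The function only prints the commands, so nothing is touched until they are run by hand. Locking the block device first matters because -o ro on its own may not stop a journaling filesystem from replaying its journal; with the device locked, the mount may simply refuse instead.

```shell
#!/bin/sh
# Print (do not run) the commands for a cautious read-only mount attempt.
# The device and mount point passed in below are placeholders.
ro_mount_plan() {
    dev=$1
    mnt=$2
    # Mark the block device read-only at the kernel level first, so that
    # nothing, including journal replay, can write to the array.
    echo "blockdev --setro $dev"
    # Then request a read-only reiserfs mount.
    echo "mount -t reiserfs -o ro $dev $mnt"
}

ro_mount_plan /dev/md0 /mnt/recover
```

Running the two printed commands as root, in that order, is the idea; blockdev --setrw reverses the lock afterwards.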

Any ideas what might be segfaulting mount?

This is from /var/log/messages from about the time I tried mounting:

Mar  6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 4096 --> 1024
Mar  6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 1024 --> 4096
Mar  6 01:14:39 ilneval kernel: reiserfs: found format "3.6" with standard journal
Mar  6 01:14:41 ilneval kernel: Unable to handle kernel paging request at virtual address e09ce004
Mar  6 01:14:41 ilneval kernel:  printing eip:
Mar  6 01:14:41 ilneval kernel: c01839b5
Mar  6 01:14:41 ilneval kernel: *pde = 1f5f7067
Mar  6 01:14:41 ilneval kernel: *pte = 00000000
Mar  6 01:14:41 ilneval kernel: Oops: 0002
Mar  6 01:14:41 ilneval kernel: CPU:    0
Mar  6 01:14:41 ilneval kernel: EIP:    0010:[<c01839b5>]    Not tainted
Mar  6 01:14:41 ilneval kernel: EFLAGS: 00010286
Mar  6 01:14:41 ilneval kernel: eax: dae13bc0   ebx: e09c6000   ecx: dae13c08   edx: dae13bc0
Mar  6 01:14:41 ilneval kernel: esi: df26a000   edi: 00001000   ebp: dbf32000   esp: dbeb1e2c
Mar  6 01:14:41 ilneval kernel: ds: 0018   es: 0018   ss: 0018
Mar  6 01:14:41 ilneval kernel: Process mount (pid: 829, stackpage=dbeb1000)
Mar  6 01:14:41 ilneval kernel: Stack: 00000902 00001003 00001000 00000003 00000001 df26a000 00000902 dbf32000
Mar  6 01:14:41 ilneval kernel:        c01843cc df26a000 00000400 00002000 dbeb1e68 00000001 00000000 00000000
Mar  6 01:14:41 ilneval kernel:        00000246 00000000 00000000 00000902 fffffff3 df26a000 00000001 c013a4ba
Mar  6 01:14:41 ilneval kernel: Call Trace:    [<c01843cc>] [<c013a4ba>] [<c013ad4b>] [<c014c8ae>] [<c013b0d0>]
Mar  6 01:14:41 ilneval kernel:   [<c014da3e>] [<c014dd6c>] [<c014db95>] [<c014e15a>] [<c010745f>]
Mar  6 01:14:41 ilneval kernel:
Mar  6 01:14:41 ilneval kernel: Code: 89 44 fb 04 b8 01 00 00 00 8b 96 f4 00 00 00 8b 4c fa 04 85
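
Some of the oops can be read by hand, and the sketch below shows the arithmetic. It relies on two assumptions that are not stated in the log itself: that the low three bits of the "Oops:" value follow the usual i386 page-fault error-code layout (bit 0 protection vs. not-present, bit 1 write vs. read, bit 2 user vs. kernel), and that the Code: bytes start at EIP, so that `89 44 fb 04` is my own decoding as `mov %eax,0x4(%ebx,%edi,8)`. The function names are made up for illustration.

```shell
#!/bin/sh
# Decode the low three bits of an i386 page-fault error code ("Oops: NNNN").
decode_oops_code() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then cause="protection fault"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    echo "$cause, $access, $mode"
}

# Effective address of base + index*scale + displacement, printed in hex.
effective_addr() {
    printf '%x\n' $(( $1 + $2 * $3 + $4 ))
}

decode_oops_code 0002                  # prints: page not present, write, kernel mode
effective_addr 0xe09c6000 0x1000 8 4   # ebx, edi, scale 8, disp 4; prints: e09ce004
```

If those assumptions hold, the result matches the reported fault address e09ce004 and the empty *pte: a kernel-mode write through ebx indexed by edi landed on an unmapped page. Resolving the EIP and Call Trace addresses to function names would still need ksymoops or the System.map for this exact kernel build.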



On Friday 05 March 2004 12:25 pm, Corey McGuire wrote:
> That kinda worked!!!!!! I need to FSCK it, but i'm still afraid of fscking
> it up...
>
> Does anyone in San Jose/San Francisco/Anywhere-in-frag'n-California have a
> free TB I can use for a DD?  I will offer you my first child!
>
> If I need to sweeten the deal, I have LOTS to share... I have a TB of
> goodies just looking to be backed up!
>
> On Friday 05 March 2004 10:14 am, you wrote:
> > I had a 2 disk failure; I will explain what I did.
> > 1 disk was bad; it affected all disks on that SCSI bus.
> > The RAID software got into a bad state, I think I needed to reboot, or
> > power cycle.
> > After the reboot, it said 2 disks were non fresh or whatever.
> > My array had 14 disks, 7 on the bus with the 2 non fresh disks.
> > I could not do a dd read test with much success on most of the disks,
> > maybe 2 or 3 seemed ok, but not if I did 2 dd's at the same time.
> > So I unplugged all disks but 1, tested the 1.  If success repeat with the
> > next disk.  I found 1 disk that did not work.  So I connected the 6 good
> > disks.  Did 6 dd's at the same time, all was well.
> >
> > So, now I have 13 of 14 disks and 1 of the 13 is non fresh.  I issued
> > this command.
> >
> > mdadm -A --force /dev/md2 --scan
> > For some reason my filesystem was corrupt.  I noticed that the spare disk
> > was in the list.  I knew the rebuild to the spare never finished.  It may
> > not have been synced at all since so many disks were not working.  So, I
> > knew the spare should not be part of the array, yet!
> >
> > I had trouble stopping the array, so reboot.
> >
> > This time I listed the disks excluding the spare and the failed disk.
> >
> > mdadm -A --force /dev/md2 /dev/sdk1 /dev/sdd1 /dev/sdl1 /dev/sde1
> > /dev/sdm1 /dev/sdf1 /dev/sdn1 /dev/sdg1 /dev/sdh1 /dev/sdo1 /dev/sdi1
> > /dev/sdp1 /dev/sdj1
> >
> > I did not include the missing disk, but I did include the non fresh disk.
> > Now my filesystem is fine.
> >
> > I added the spare, it re-built, a good day!  I bet if this had happened
> > to a hardware RAID it could not have been saved.
> >
> > I replaced the bad disk and added it as a spare.
> > That was about 1 month ago, everything is still fine.
> >
> > You will need to install mdadm if you don't have it.  mdadm does not use
> > raidtab, it uses /etc/mdadm.conf
> >
> > Man mdadm for details!
> >
> > Good luck!
> >
> > Guy
> >
> > =========================================================================
> > Tips:
> >
> > This will give details of each disk.
> > mdadm -E /dev/hda3
> > repeat for hdc3, hde3, hdg3, hdi3, hdk3.
> >
> > dd test...  To test a disk to determine if the surface is good.
> > This is just a read test!
> > dd if=/dev/hda of=/dev/null bs=64k
> > repeat for hdc, hde, hdg, hdi, hdk.
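
The "repeat for each disk" step above can be wrapped in a small loop. This is only a sketch: the function names are made up, the device list in the usage comment is the one from the tip, and it runs the reads one after another rather than in parallel. It is still just a read test and writes nothing to the disks.

```shell
#!/bin/sh
# Read every block of a device (or file) and discard the data.
# Returns dd's exit status: non-zero means a read error or a missing device.
read_test() {
    dd if="$1" of=/dev/null bs=64k 2>/dev/null
}

# Run the read test on each argument in turn and print a per-device result.
# Returns 1 if any device failed, 0 otherwise.
read_test_all() {
    failed=0
    for dev in "$@"; do
        if read_test "$dev"; then
            echo "OK   $dev"
        else
            echo "FAIL $dev"
            failed=1
        fi
    done
    return "$failed"
}

# Example (as root):
#   read_test_all /dev/hda /dev/hdc /dev/hde /dev/hdg /dev/hdi /dev/hdk
```

A disk that passes alone can still fail when several are read at once, as described earlier in the quoted message, so a sequential pass like this is a first check rather than a full one.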
> >
> > My mdadm.conf:
> > MAILADDR bugzilla@watkins-home.com
> > PROGRAM /root/bin/handle-mdadm-events
> >
> > DEVICE /dev/sd[abcdefghijklmnopqrstuvwxyz][12]
> >
> > ARRAY /dev/md0 level=raid1 num-devices=2 UUID=1fb2890c:2c9c47bf:db12e1e3:16cd7ffe
> >
> > ARRAY /dev/md1 level=raid1 num-devices=2 UUID=8f183b62:ea93fe30:a842431c:4b93c7bb
> >
> > ARRAY /dev/md2 level=raid5 num-devices=14 UUID=8357a389:8853c2d1:f160d155:6b4e1b99
>
