From: Corey McGuire <coreyfro@coreyfro.com>
To: linux-raid@vger.kernel.org
Subject: Re: RAID 5 lost two disks
Date: Sat, 6 Mar 2004 01:56:11 -0800 [thread overview]
Message-ID: <200403060156.11808.coreyfro@coreyfro.com> (raw)
In-Reply-To: <200403051225.30661.coreyfro@coreyfro.com>
Well, I got the RAID up, and I had reiserfsck work its mojo (it looks like I lost
lots of folder names, but the files appear to remember who they are),
BUT mount segfaults (or something segfaults) every time I try to mount the
damn thing...
I'm going to try running 2.6.something, hoping that maybe one of the tools I built
was just too new for SuSE 8.2/Linux 2.4.23... but I highly doubt it... who
knows, maybe 2.6 will behave more nicely... I hope mount -o ro will be enough
to protect me if it doesn't... who knows...
Any ideas what might be segfaulting mount?
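[On the mount -o ro worry: a read-only mount can still replay the journal on some filesystems, so a hedged extra guard is to mark the block device itself read-only first. A minimal sketch, not from the thread; mount_ro is a hypothetical helper and /mnt/recovery a placeholder mount point:]

```shell
# Hedged sketch: force the device read-only at the block layer before
# mounting, so even a journal replay cannot write to the array while
# debugging the oops.
mount_ro() {
    dev=$1   # e.g. /dev/md2 (the array from this thread)
    mnt=$2   # e.g. /mnt/recovery (placeholder mount point)
    blockdev --setro "$dev" &&   # util-linux: refuse all writes to the device
    mount -o ro "$dev" "$mnt"    # then a read-only mount on top
}
# usage: mount_ro /dev/md2 /mnt/recovery
```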
This is from /var/log/messages, from around the time I tried mounting:
Mar 6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 4096 --> 1024
Mar 6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 1024 --> 4096
Mar 6 01:14:39 ilneval kernel: reiserfs: found format "3.6" with standard journal
Mar 6 01:14:41 ilneval kernel: Unable to handle kernel paging request at virtual address e09ce004
Mar 6 01:14:41 ilneval kernel: printing eip:
Mar 6 01:14:41 ilneval kernel: c01839b5
Mar 6 01:14:41 ilneval kernel: *pde = 1f5f7067
Mar 6 01:14:41 ilneval kernel: *pte = 00000000
Mar 6 01:14:41 ilneval kernel: Oops: 0002
Mar 6 01:14:41 ilneval kernel: CPU:    0
Mar 6 01:14:41 ilneval kernel: EIP:    0010:[<c01839b5>]    Not tainted
Mar 6 01:14:41 ilneval kernel: EFLAGS: 00010286
Mar 6 01:14:41 ilneval kernel: eax: dae13bc0   ebx: e09c6000   ecx: dae13c08   edx: dae13bc0
Mar 6 01:14:41 ilneval kernel: esi: df26a000   edi: 00001000   ebp: dbf32000   esp: dbeb1e2c
Mar 6 01:14:41 ilneval kernel: ds: 0018   es: 0018   ss: 0018
Mar 6 01:14:41 ilneval kernel: Process mount (pid: 829, stackpage=dbeb1000)
Mar 6 01:14:41 ilneval kernel: Stack: 00000902 00001003 00001000 00000003 00000001 df26a000 00000902 dbf32000
Mar 6 01:14:41 ilneval kernel:        c01843cc df26a000 00000400 00002000 dbeb1e68 00000001 00000000 00000000
Mar 6 01:14:41 ilneval kernel:        00000246 00000000 00000000 00000902 fffffff3 df26a000 00000001 c013a4ba
Mar 6 01:14:41 ilneval kernel: Call Trace: [<c01843cc>] [<c013a4ba>] [<c013ad4b>] [<c014c8ae>] [<c013b0d0>]
Mar 6 01:14:41 ilneval kernel:    [<c014da3e>] [<c014dd6c>] [<c014db95>] [<c014e15a>] [<c010745f>]
Mar 6 01:14:41 ilneval kernel:
Mar 6 01:14:41 ilneval kernel: Code: 89 44 fb 04 b8 01 00 00 00 8b 96 f4 00 00 00 8b 4c fa 04 85
On Friday 05 March 2004 12:25 pm, Corey McGuire wrote:
> That kinda worked!!!!!! I need to fsck it, but I'm still afraid of fscking
> it up...
>
> Does anyone in San Jose/San Francisco/anywhere-in-frag'n-California have a
> free TB I can use for a dd? I will offer you my first child!
>
> If I need to sweeten the deal, I have LOTS to share... I have a TB of
> goodies just looking to be backed up!
>
> On Friday 05 March 2004 10:14 am, you wrote:
> > I had a two-disk failure; I will explain what I did.
> > One disk was bad; it affected all disks on that SCSI bus.
> > The RAID software got into a bad state; I think I needed to reboot or
> > power cycle.
> > After the reboot, it said 2 disks were non-fresh, or whatever.
> > My array had 14 disks, 7 of them on the bus with the 2 non-fresh disks.
> > I could not do a dd read test with much success on most of the disks;
> > maybe 2 or 3 seemed OK, but not if I ran 2 dd's at the same time.
> > So I unplugged all disks but one and tested that one. On success, repeat
> > with the next disk. I found 1 disk that did not work. So I connected the
> > 6 good disks, ran 6 dd's at the same time, and all was well.
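[The one-disk-at-a-time read test above can be sketched as a tiny helper. A hedged sketch: read_test and the parallel-usage line are mine; the dd invocation itself is the one given in the tips later in this mail, and the device names are placeholders.]

```shell
# Pure read test: stream the whole device to /dev/null; nothing is
# ever written to the disk. bs=64k matches the dd test in the tips.
read_test() {
    if dd if="$1" of=/dev/null bs=64k 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: FAILED"
    fi
}
# One disk at a time first; then all survivors at once to reproduce
# the load that exposed the flaky bus, e.g.:
#   for d in /dev/sdd /dev/sde; do read_test "$d" & done; wait
```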
> >
> > So, now I have 13 of 14 disks, and 1 of the 13 is non-fresh. I issued
> > this command:
> >
> > mdadm -A --force /dev/md2 --scan
> > For some reason my filesystem was corrupt. I noticed that the spare disk
> > was in the list. I knew the rebuild to the spare never finished. It may
> > not have been synced at all since so many disks were not working. So, I
> > knew the spare should not be part of the array, yet!
> >
> > I had trouble stopping the array, so I rebooted.
> >
> > This time I listed the disks excluding the spare and the failed disk.
> >
> > mdadm -A --force /dev/md2 /dev/sdk1 /dev/sdd1 /dev/sdl1 /dev/sde1
> > /dev/sdm1 /dev/sdf1 /dev/sdn1 /dev/sdg1 /dev/sdh1 /dev/sdo1 /dev/sdi1
> > /dev/sdp1 /dev/sdj1
> >
> > I did not include the missing disk, but I did include the non-fresh disk.
> > Now my filesystem is fine.
> >
> > I added the spare and it rebuilt; a good day! I bet if this had happened
> > to a hardware RAID it could not have been saved.
> >
> > I replaced the bad disk and added it as a spare.
> > That was about 1 month ago, everything is still fine.
> >
> > You will need to install mdadm if you don't have it. mdadm does not use
> > raidtab; it uses /etc/mdadm.conf.
> >
> > man mdadm for details!
> >
> > Good luck!
> >
> > Guy
> >
> > ===========================================================================
> > Tips:
> >
> > This will give details of each disk.
> > mdadm -E /dev/hda3
> > repeat for hdc3, hde3, hdg3, hdi3, hdk3.
> >
> > dd test: to determine whether a disk's surface is good.
> > This is just a read test!
> > dd if=/dev/hda of=/dev/null bs=64k
> > repeat for hdc, hde, hdg, hdi, hdk.
> >
> > My mdadm.conf:
> > MAILADDR bugzilla@watkins-home.com
> > PROGRAM /root/bin/handle-mdadm-events
> >
> > DEVICE /dev/sd[abcdefghijklmnopqrstuvwxyz][12]
> >
> > ARRAY /dev/md0 level=raid1 num-devices=2
> > UUID=1fb2890c:2c9c47bf:db12e1e3:16cd7ffe
> >
> > ARRAY /dev/md1 level=raid1 num-devices=2
> > UUID=8f183b62:ea93fe30:a842431c:4b93c7bb
> >
> > ARRAY /dev/md2 level=raid5 num-devices=14
> > UUID=8357a389:8853c2d1:f160d155:6b4e1b99
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
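[Pulled together, Guy's quoted recovery sequence looks roughly like this. A hedged sketch: the 13 member names are copied verbatim from his mail and will differ on any other array, and the spare's device name (/dev/sdq1 here) is a placeholder he never gives. Not something to run as-is.]

```shell
# Hedged sketch of the recovery steps quoted above, collected into one
# hypothetical helper. Device names are specific to Guy's 14-disk array.
recover_md2() {
    mdadm --stop /dev/md2 || true   # he had to reboot when stopping hung
    # Force-assemble from the 13 usable members, leaving out both the
    # dead disk and the never-synced spare:
    mdadm -A --force /dev/md2 \
        /dev/sdk1 /dev/sdd1 /dev/sdl1 /dev/sde1 /dev/sdm1 /dev/sdf1 \
        /dev/sdn1 /dev/sdg1 /dev/sdh1 /dev/sdo1 /dev/sdi1 /dev/sdp1 \
        /dev/sdj1
    # Only once the filesystem checks out, re-add the spare and let the
    # rebuild run:
    mdadm /dev/md2 --add /dev/sdq1   # placeholder spare device
}
```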
Thread overview: 6+ messages
2004-03-05 17:26 RAID 5 lost two disks Corey McGuire
2004-03-05 18:05 ` Corey McGuire
2004-03-05 20:25 ` Corey McGuire
2004-03-06 9:56 ` Corey McGuire [this message]
2004-03-06 22:25 ` RAID 5 lost two disks : anyone know of reiser recovery tools? Corey McGuire
2004-03-05 23:07 ` RAID 5 lost two disks Lars Marowsky-Bree