* Re: RAID5 causing lockups
@ 2003-06-26 3:54 Corey McGuire
2003-06-26 11:46 ` Mike Black
2003-06-26 13:32 ` Matthew Mitchell
0 siblings, 2 replies; 15+ messages in thread
From: Corey McGuire @ 2003-06-26 3:54 UTC (permalink / raw)
To: linux-raid
Well, two of my drives did have an older BIOS, but the upgrade changed nothing.
I noticed that even unmounted, as long as I didn't raidstop the device, the system still crashes.
I tried paring my BIOS settings down as much as possible, and I'm looking to do the same with the kernel, 2.4.21. I'll try the magic SysRq key, but I can't find my null modem cable to save my life, so I'll have to borrow one from work.
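In case it saves anyone else the digging, here is roughly the SysRq recipe I intend to follow (a sketch from the standard 2.4 docs, not something I've run yet; it assumes the kernel was built with CONFIG_MAGIC_SYSRQ):

    # enable the magic SysRq key at runtime
    echo 1 > /proc/sys/kernel/sysrq
    # when the box wedges, on the local console press:
    #   Alt+SysRq+t  - dump the task list to the kernel log
    #   Alt+SysRq+p  - dump registers
    #   Alt+SysRq+s / u / b  - emergency sync, remount read-only, reboot
    # if the machine survives, pull the output back out of the log with:
    dmesg | tail -n 100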
My server marches on, but without /dev/md2... I'll try just letting it sit overnight with /dev/md2 intact, but for now I need it up, even if only in fits and starts.
Thanks everyone, keep the ideas rolling in.
<sigh>
>Hey folks,
>
>I just upgraded my system from a ~200GB mirror to a ~1TB RAID5, but all has
>not transitioned well.
>
>I really don't know how to debug this issue, though I have tried. I gave
>up this morning before work, but I was going to try the magickey next
>(something I don't really know how to use, but anything for a clue)
>followed by upgrading to 2.4.21.
>
>The lock up is typical to a system with a failing drive; the system is
>responsive to input, but nothing happens. Keyboard works fine, but
>programs become idle (not really crashing.) I tried keeping "top" up,
>hoping I would see something obvious, like raid5syncd doing something
>strange, but if it does, top doesn't update after the problem.
>
>The lockups happen even if the system is doing nothing (other than
>raid5syncd, which is awfully busy since my RAID won't stay up)
>
>If I unmount the RAID5 and RAIDSTOP it, my system will work fine, but I'm
>out 1TB of disk. Right now, I have it running the bare essentials (all
>services on, but my /home directory has only public_html and mail stuff for
>each user.)
>
>Anything I can do to get more information out of this problem? I don't
>really know where to look.
>
>
>System Info
>=======================================================================
>
>My kernel is 2.4.20, my raid tools is raidtools-0.90, no patches on
>anything, home built distro (linux from scratch.) Had been running on a
>mirror for nearly a year.
>
>Each drive on my system is connected to promise UltraATA 100 controllers.
>I have 6 drives and 3 controllers. Each drive is a 200GB WD drive, set to
>"Single/Master" on their channel.
>
>No device has a slave.
>
>Drives are hda hdc hde hdg hdi hdk
>
>------- Each drive is configured exactly like the device below -------
>
>Disk /dev/hda: 255 heads, 63 sectors, 24321 cylinders
>Units = cylinders of 16065 * 512 bytes
>
> Device Boot Start End Blocks Id System
>/dev/hda1 1 319 2562336 fd Linux raid autodetect
>/dev/hda2 320 352 265072+ 82 Linux swap
>/dev/hda3 353 24321 192530992+ fd Linux raid autodetect
>
>------------------------- Here is my raidtab -------------------------
>
>raiddev /dev/md0
> raid-level 1
> chunk-size 32
> nr-raid-disks 2
> nr-spare-disks 0
> persistent-superblock 1
> device /dev/hda1
> raid-disk 0
> device /dev/hdc1
> raid-disk 1
>
>raiddev /dev/md1
> raid-level 1
> chunk-size 32
> nr-raid-disks 2
> nr-spare-disks 0
> persistent-superblock 1
> device /dev/hde1
> raid-disk 0
> device /dev/hdg1
> raid-disk 1
>
>raiddev /dev/md2
> raid-level 5
> chunk-size 32
> nr-raid-disks 6
> nr-spare-disks 0
> persistent-superblock 1
> device /dev/hda3
> raid-disk 0
> device /dev/hdc3
> raid-disk 1
> device /dev/hde3
> raid-disk 2
> device /dev/hdg3
> raid-disk 3
> device /dev/hdi3
> raid-disk 4
> device /dev/hdk3
> raid-disk 5
>
>raiddev /dev/md3
> raid-level 1
> chunk-size 32
> nr-raid-disks 2
> nr-spare-disks 0
> persistent-superblock 1
> device /dev/hdi1
> raid-disk 0
> device /dev/hdk1
> raid-disk 1
>
>-------------------------- Here is my fstab --------------------------
>
># Begin /etc/fstab
>
># filesystem mount-point fs-type options dump fsck-order
>
>/dev/md0 / reiserfs defaults 1 1
>/dev/md1 /mnt/backup reiserfs noauto,defaults 1 3
>/dev/md2 /home reiserfs defaults 1 2
>/dev/hda2 swap swap pri=42 0 0
>/dev/hdc2 swap swap pri=42 0 0
>/dev/hde2 swap swap pri=42 0 0
>/dev/hdg2 swap swap pri=42 0 0
>/dev/hdi2 swap swap pri=42 0 0
>/dev/hdk2 swap swap pri=42 0 0
>proc /proc proc defaults 0 0
>
># End /etc/fstab
>
>=======================================================================
>
>Let me know if I missed anything (probably lots.)
>
>Thanks for your time.
>
>
>/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
>
>coreyfro@coreyfro.com
>http://www.coreyfro.com/
>http://stats.distributed.net/rc5-64/psummary.php3?id=196879
>ICQ : 3168059
>
>-----BEGIN GEEK CODE BLOCK-----
>GCS d--(+) s: a-- C++++$ UBL++>++++ P+ L+ E W+++$ N+ o? K? w++++$>+++++$
>O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+)
>Dl++(++++) D++ G+ e>+++ h++(---) r++>+$ y++*>$ H++++ n---(----) p? !au w+
>v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
>------END GEEK CODE BLOCK------
>
>Home of Geek Code - http://www.geekcode.com/
>The Geek Code Decoder Page - http://www.ebb.org/ungeek//
>
/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
coreyfro@coreyfro.com
http://www.coreyfro.com/
http://stats.distributed.net/rc5-64/psearch.php3?st=coreyfro
ICQ : 3168059
-----BEGIN GEEK CODE BLOCK-----
GCS !d--(+) s: a- C++++$ UL++>++++ P+ L++>++++ E- W+++$ N++ o? K? w++++$>+++++$ O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+) Dl++(++++) D++ G++(-) e>+++ h++(---) r++>+$ y++**>$ H++++ n---(----) p? !au w+ v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
------END GEEK CODE BLOCK------
Home of Geek Code - http://www.geekcode.com/
The Geek Code Decoder Page - http://www.ebb.org/ungeek//
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAID5 causing lockups
2003-06-26 3:54 RAID5 causing lockups Corey McGuire
@ 2003-06-26 11:46 ` Mike Black
2003-06-26 13:32 ` Matthew Mitchell
1 sibling, 0 replies; 15+ messages in thread
From: Mike Black @ 2003-06-26 11:46 UTC (permalink / raw)
To: Corey McGuire, linux-raid
Why don't you try creating a 3-disk RAID5, then a 4-disk one, and so on?
Perhaps you have one bad disk, which this should point out.
Also... I don't think we ever heard what kind of power supply you have.
You might be overloading your system. If power is the problem, this will probably also show up as good behavior with 3 disks instead of 6.
So... if you find that 3 is OK and 4 is OK but 5 causes problems, try the 5-disk array again with a different drive than the last one you added; if it still fails, you probably have a power problem.
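For concreteness, a 3-disk test array might look something like this in raidtab (a sketch only; /dev/md4 and the choice of member partitions are just an example, and mkraid will wipe whatever superblocks are already on them):

    raiddev /dev/md4
        raid-level              5
        chunk-size              32
        nr-raid-disks           3
        nr-spare-disks          0
        persistent-superblock   1
        device                  /dev/hdc3
        raid-disk               0
        device                  /dev/hde3
        raid-disk               1
        device                  /dev/hdg3
        raid-disk               2

Then mkraid /dev/md4, watch /proc/mdstat through the resync, and repeat with more members until it starts misbehaving.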
----- Original Message -----
From: "Corey McGuire" <coreyfro@coreyfro.com>
To: <linux-raid@vger.kernel.org>
Sent: Wednesday, June 25, 2003 11:54 PM
Subject: Re: RAID5 causing lockups
> Well, two of my drives did have an older bios, but the upgrade changed nothing.
>
> I noticed that even unmounted, as long as I didn't raidstop the device, the system still crashes.
>
> I tried setting down my bios as much as possible, and I am looking to do the same with the kernel, 2.4.21. I'll try the magic
sysrq key, but I can't find my nulmodem cable to save my life, so I'll have to barrow one from work.
>
> My server marches on, but without /dev/md2... I'll try just letting it sit, /dev/md2 intact, over night, but for now, I need it
up, even if it is only for fits and spurts.
>
> Thanks everyone, keep the ideas rolling in.
>
> <sigh>
>
> >Hey folks,
> >
> >I just upgraded my system from a ~200GB mirror to a ~1TB RAID5, but all has
> >not transitioned well.
> >
> >I really don't know how to debug this issue, though I have tried. I gave
> >up this morning before work, but I was going to try the magickey next
> >(something I don't really know how to use, but anything for a clue)
> >followed by upgrading to 2.4.21.
> >
> >The lock up is typical to a system with a failing drive; the system is
> >responsive to input, but nothing happens. Keyboard works fine, but
> >programs become idle (not really crashing.) I tried keeping "top" up,
> >hoping I would see something obvious, like raid5syncd doing something
> >strange, but if it does, top doesn't update after the problem.
> >
> >The lockups happen even if the system is doing nothing (other than
> >raid5syncd, which is awfully busy since my RAID won't stay up)
> >
> >If I unmount the RAID5 and RAIDSTOP it, my system will work fine, but I'm
> >out 1TB of disk. Right now, I have it running the bare essentials (all
> >services on, but my /home directory has only public_html and mail stuff for
> >each user.)
> >
> >Anything I can do to get more information out of this problem? I don't
> >really know where to look.
> >
> >
> >System Info
> >=======================================================================
> >
> >My kernel is 2.4.20, my raid tools is raidtools-0.90, no patches on
> >anything, home built distro (linux from scratch.) Had been running on a
> >mirror for nearly a year.
> >
> >Each drive on my system is connected to promise UltraATA 100 controllers.
> >I have 6 drives and 3 controllers. Each drive is a 200GB WD drive, set to
> >"Single/Master" on their channel.
> >
> >No device has a slave.
> >
> >Drives are hda hdc hde hdg hdi hdk
> >
> >------- Each drive is configured exactly like the device below -------
> >
> >Disk /dev/hda: 255 heads, 63 sectors, 24321 cylinders
> >Units = cylinders of 16065 * 512 bytes
> >
> > Device Boot Start End Blocks Id System
> >/dev/hda1 1 319 2562336 fd Linux raid autodetect
> >/dev/hda2 320 352 265072+ 82 Linux swap
> >/dev/hda3 353 24321 192530992+ fd Linux raid autodetect
> >
> >------------------------- Here is my raidtab -------------------------
> >
> >raiddev /dev/md0
> > raid-level 1
> > chunk-size 32
> > nr-raid-disks 2
> > nr-spare-disks 0
> > persistent-superblock 1
> > device /dev/hda1
> > raid-disk 0
> > device /dev/hdc1
> > raid-disk 1
> >
> >raiddev /dev/md1
> > raid-level 1
> > chunk-size 32
> > nr-raid-disks 2
> > nr-spare-disks 0
> > persistent-superblock 1
> > device /dev/hde1
> > raid-disk 0
> > device /dev/hdg1
> > raid-disk 1
> >
> >raiddev /dev/md2
> > raid-level 5
> > chunk-size 32
> > nr-raid-disks 6
> > nr-spare-disks 0
> > persistent-superblock 1
> > device /dev/hda3
> > raid-disk 0
> > device /dev/hdc3
> > raid-disk 1
> > device /dev/hde3
> > raid-disk 2
> > device /dev/hdg3
> > raid-disk 3
> > device /dev/hdi3
> > raid-disk 4
> > device /dev/hdk3
> > raid-disk 5
> >
> >raiddev /dev/md3
> > raid-level 1
> > chunk-size 32
> > nr-raid-disks 2
> > nr-spare-disks 0
> > persistent-superblock 1
> > device /dev/hdi1
> > raid-disk 0
> > device /dev/hdk1
> > raid-disk 1
> >
> >-------------------------- Here is my fstab --------------------------
> >
> ># Begin /etc/fstab
> >
> ># filesystem mount-point fs-type options dump fsck-order
> >
> >/dev/md0 / reiserfs defaults 1 1
> >/dev/md1 /mnt/backup reiserfs noauto,defaults 1 3
> >/dev/md2 /home reiserfs defaults 1 2
> >/dev/hda2 swap swap pri=42 0 0
> >/dev/hdc2 swap swap pri=42 0 0
> >/dev/hde2 swap swap pri=42 0 0
> >/dev/hdg2 swap swap pri=42 0 0
> >/dev/hdi2 swap swap pri=42 0 0
> >/dev/hdk2 swap swap pri=42 0 0
> >proc /proc proc defaults 0 0
> >
> ># End /etc/fstab
> >
> >=======================================================================
> >
> >Let me know if I missed anything (probably lots.)
> >
> >Thanks for your time.
> >
> >
> >/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
> >
> >coreyfro@coreyfro.com
> >http://www.coreyfro.com/
> >http://stats.distributed.net/rc5-64/psummary.php3?id=196879
> >ICQ : 3168059
> >
> >-----BEGIN GEEK CODE BLOCK-----
> >GCS d--(+) s: a-- C++++$ UBL++>++++ P+ L+ E W+++$ N+ o? K? w++++$>+++++$
> >O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+)
> >Dl++(++++) D++ G+ e>+++ h++(---) r++>+$ y++*>$ H++++ n---(----) p? !au w+
> >v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
> >------END GEEK CODE BLOCK------
> >
> >Home of Geek Code - http://www.geekcode.com/
> >The Geek Code Decoder Page - http://www.ebb.org/ungeek//
> >
>
>
> /\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
>
> coreyfro@coreyfro.com
> http://www.coreyfro.com/
> http://stats.distributed.net/rc5-64/psearch.php3?st=coreyfro
> ICQ : 3168059
>
> -----BEGIN GEEK CODE BLOCK-----
> GCS !d--(+) s: a- C++++$ UL++>++++ P+ L++>++++ E- W+++$ N++ o? K? w++++$>+++++$ O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+)
!X- R(+) !tv b-(+) Dl++(++++) D++ G++(-) e>+++ h++(---) r++>+$ y++**>$ H++++ n---(----) p? !au w+ v- 3+>++ j- G'''' B--- u+++*** f*
Quake++++>+++++$
> ------END GEEK CODE BLOCK------
>
> Home of Geek Code - http://www.geekcode.com/
> The Geek Code Decoder Page - http://www.ebb.org/ungeek//
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAID5 causing lockups
2003-06-26 3:54 RAID5 causing lockups Corey McGuire
2003-06-26 11:46 ` Mike Black
@ 2003-06-26 13:32 ` Matthew Mitchell
1 sibling, 0 replies; 15+ messages in thread
From: Matthew Mitchell @ 2003-06-26 13:32 UTC (permalink / raw)
To: Corey McGuire; +Cc: linux-raid
Going along with some suggestions of other respondents, be very
suspicious of your power cables. I had a hell of a time getting a
stable raid running because of dodgy power cables. More accurately, the
5v or 12v lines in some of the little plugs were kind of loose, and they
would lose contact. So get in there with a pair of teeny pliers or
tweezers and crimp those babies on tight.
As an aside, has anyone ever experimented with any type of conductive
compound when putting these things together? IMO the cheesiest part of
the setup is the crappy PC-style power connector, and I wondered about
various solutions from compound to soldering on a PCB for a more
reliable means of powering drives.
Best of luck to you.
Corey McGuire wrote:
> Well, two of my drives did have an older bios, but the upgrade changed nothing.
>
> I noticed that even unmounted, as long as I didn't raidstop the device, the system still crashes.
>
> I tried setting down my bios as much as possible, and I am looking to do the same with the kernel, 2.4.21. I'll try the magic sysrq key, but I can't find my nulmodem cable to save my life, so I'll have to barrow one from work.
>
> My server marches on, but without /dev/md2... I'll try just letting it sit, /dev/md2 intact, over night, but for now, I need it up, even if it is only for fits and spurts.
>
> Thanks everyone, keep the ideas rolling in.
--
Matthew Mitchell
Systems Programmer/Administrator matthew@geodev.com
Geophysical Development Corporation phone 713 782 1234
1 Riverway Suite 2100, Houston, TX 77056 fax 713 782 1829
^ permalink raw reply [flat|nested] 15+ messages in thread
* re: RAID5 causing lockups
@ 2003-06-26 17:34 Corey McGuire
2003-06-27 5:02 ` Corey McGuire
0 siblings, 1 reply; 15+ messages in thread
From: Corey McGuire @ 2003-06-26 17:34 UTC (permalink / raw)
To: linux-raid
Much progress has been made, but success is still out of reach.
First of all, 2.4.21 has been very helpful: feedback regarding drive problems is much more verbose. I don't know whom to blame (the RAID people, the ATA people, or the Promise driver people), but immediately I found that one of my controllers was hosing up the works. I moved the devices from that controller to my VIA onboard controller and gained about 5 MB/second on the rebuild speed. I don't know whether this is because 2.4.21 is faster, the VIA controller is faster, I was saturating my PCI bus (since the VIA controller is on the Southbridge), or because I was previously getting these errors with no feedback.
Alas, the problem persists, but I have found out why (90% certain).
Now when there is a crash, the system spits out why and panics. It looks to be HDA (or HDA is getting the blame), and thanks to a seemingly pointless script I wrote to watch the rebuild, I found that the system dies at around 12.5% into the RAID5 rebuild every time.
Bad disk? Maybe, probably, but I'll keep banging my head against it for a while.
Score,
2.4.21 + progress script 1
2.4.20 + crossing fingers 0
I am currently running a kernel with DMA turned off by default. This sounded like a good idea last night, around 4 in the morning, but now it sounds like an exercise in futility. The idea came to me shortly after I was visited by the bovine fairy. She told me that everything can be fixed with "moon pies." I know this apparition was real and not a hallucination because, until last night, I had never heard of moon pies. After a quick search of Google, sure enough: moon pies. They look tasty; maybe she's right.
Score
Bovine fairies 1
Sleep deprivation 0
At any rate, by my calculations, without DMA it will take another 12 hours to get to the 12.5% fail point. I should be back from work by then. Longevity through sloth.
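If it comes to it, a runtime alternative to rebuilding the kernel would be toggling DMA per drive with hdparm. Just a sketch of the standard flags, not something I've tried on this box:

    hdparm -d0 /dev/hda   # turn DMA off for this drive
    hdparm -d1 /dev/hda   # turn it back on
    hdparm -d /dev/hda    # query the current setting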
To answer some questions:
My power situation is good. I have had a lot more juice getting sucked through this power supply before; it used to feed dual P3s with 30 mm Peltiers and three 10,000 RPM Cheetahs. (Peltiers are not worth it; I had to underclock my system and drop the voltage before it would run any cooler.) I think these WDs draw 20 watts peak, 14 otherwise, and my power supply is ~400 watts. It shouldn't be a problem, seeing as how I can run my mirrors just fine for days, yet the box dies minutes after I turn the stripe on.
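Back-of-the-envelope, using those figures (my assumptions, not measurements): 6 drives x 20 W peak = 120 W, which leaves plenty of headroom in a ~400 W supply even with the CPU and fans; the usual gotcha would be the 12 V rail during simultaneous spin-up, not steady-state draw.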
Building smaller RAIDs: yeah, I will give that a whirl, just to make sure HDA is the problem. I don't think I need to yank HDA; I'll just remove it from my raidtab and mkraid again.
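Roughly like this, I expect (a sketch; mkraid rewrites the superblocks, so anything on md2 is toast, and it may insist on --really-force when it sees the old ones):

    raidstop /dev/md2
    # edit /etc/raidtab: drop the /dev/hda3 stanza from the md2 section,
    # set nr-raid-disks to 5, and renumber the remaining raid-disk entries 0-4
    mkraid /dev/md2
    cat /proc/mdstat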
One point I'd like to make: why is a drive failure killing my RAID5? That kinda defeats the purpose.
Here is the aforementioned script plus its results so you can see what I
see.
4tlods.sh (for the love of dog, sync! I said I was sleep deprived.)
while ((1)) ; do top -n 1 | head -n 20 ; echo ; cat /proc/mdstat ; done
2.4.21
12:12am up 19 min, 5 users, load average: 0.87, 1.06, 0.82
49 processes: 48 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 1.0% user, 52.5% system, 0.0% nice, 46.3% idle
Mem: 516592K av, 95204K used, 421388K free, 0K shrd, 52588K buff
Swap: 1590384K av, 0K used, 1590384K free 17196K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1 root 9 0 504 504 440 S 0.0 0.0 0:06 init
2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
 3 root 19 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU0
4 root 9 0 0 0 0 SW 0.0 0.0 0:00 kswapd
5 root 9 0 0 0 0 SW 0.0 0.0 0:00 bdflush
6 root 9 0 0 0 0 SW 0.0 0.0 0:00 kupdated
7 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 mdrecoveryd
8 root 7 -20 0 0 0 SW< 0.0 0.0 6:32 raid5d
9 root 19 19 0 0 0 DWN 0.0 0.0 1:08 raid5syncd
10 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
11 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
12 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
13 root 9 0 0 0 0 SW 0.0 0.0 0:00 kreiserfsd
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md0 : active raid1 hdc1[1] hda1[0]
2562240 blocks [2/2] [UU]
md1 : active raid1 hdg1[1] hde1[0]
2562240 blocks [2/2] [UU]
md3 : active raid1 hdk1[1] hdi1[0]
2562240 blocks [2/2] [UU]
md2 : active raid5 hdk3[5] hdi3[4] hdg3[3] hde3[2] hdc3[1] hda3[0]
962654400 blocks level 5, 32k chunk, algorithm 0 [6/6] [UUUUUU]
 [==>..................] resync = 12.5% (24153592/192530880) finish=134.7min speed=20822K/sec
unused devices: <none>
2.4.21
2:38am up 19 min, 1 user, load average: 0.63, 1.13, 0.89
42 processes: 41 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.9% user, 52.1% system, 0.0% nice, 46.8% idle
Mem: 516592K av, 89824K used, 426768K free, 0K shrd, 57908K buff
Swap: 0K av, 0K used, 0K free 10644K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1 root 8 0 504 504 440 S 0.0 0.0 0:06 init
2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
 3 root 19 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU0
4 root 9 0 0 0 0 SW 0.0 0.0 0:00 kswapd
5 root 9 0 0 0 0 SW 0.0 0.0 0:00 bdflush
6 root 9 0 0 0 0 SW 0.0 0.0 0:00 kupdated
7 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 mdrecoveryd
8 root 15 -20 0 0 0 SW< 0.0 0.0 6:29 raid5d
9 root 19 19 0 0 0 DWN 0.0 0.0 1:09 raid5syncd
14 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
15 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1syncd
16 root 9 0 0 0 0 SW 0.0 0.0 0:00 kreiserfsd
74 root 9 0 616 616 512 S 0.0 0.1 0:00 syslogd
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md0 : active raid1 hdc1[1] hda1[0]
2562240 blocks [2/2] [UU]
resync=DELAYED
md2 : active raid5 hdk3[5] hdi3[4] hdg3[3] hde3[2] hdc3[1] hda3[0]
962654400 blocks level 5, 32k chunk, algorithm 0 [6/6] [UUUUUU]
 [==>..................] resync = 12.5% (24153596/192530880) finish=139.2min speed=20147K/sec
unused devices: <none>
2.4.20
3:22am up 21 min, 1 user, load average: 1.04, 1.31, 1.02
47 processes: 46 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.9% user, 54.7% system, 0.0% nice, 44.2% idle
Mem: 516604K av, 125824K used, 390780K free, 0K shrd, 91628K buff
Swap: 1590384K av, 0K used, 1590384K free 10796K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1 root 9 0 504 504 440 S 0.0 0.0 0:10 init
2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
3 root 9 0 0 0 0 SW 0.0 0.0 0:00 kapmd
 4 root 18 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU0
5 root 9 0 0 0 0 SW 0.0 0.0 0:00 kswapd
6 root 9 0 0 0 0 SW 0.0 0.0 0:00 bdflush
7 root 9 0 0 0 0 SW 0.0 0.0 0:00 kupdated
8 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 mdrecoveryd
9 root 4 -20 0 0 0 SW< 0.0 0.0 7:16 raid5d
10 root 19 19 0 0 0 DWN 0.0 0.0 1:07 raid5syncd
11 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
12 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1syncd
13 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
Personalities : [raid1] [raid5] [multipath]
read_ahead 1024 sectors
md0 : active raid1 hdc1[1] hda1[0]
2562240 blocks [2/2] [UU]
resync=DELAYED
md1 : active raid1 hdg1[1] hde1[0]
2562240 blocks [2/2] [UU]
resync=DELAYED
md3 : active raid1 hdk1[1] hdi1[0]
2562240 blocks [2/2] [UU]
resync=DELAYED
md2 : active raid5 hdk3[5] hdi3[4] hdg3[3] hde3[2] hdc3[1] hda3[0]
962654400 blocks level 5, 32k chunk, algorithm 0 [6/6] [UUUUUU]
 [==>..................] resync = 12.5% (24155416/192530880) finish=181.1min speed=15487K/sec
unused devices: <none>
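For anyone who wants to reuse the watcher script above, a gentler variant (just a sketch: the sleep keeps it from hammering the box, and the log file means the last snapshot survives a lockup as long as the log lives on a disk outside the array):

    while true ; do { date ; top -b -n 1 | head -n 20 ; echo ; cat /proc/mdstat ; } | tee -a /var/log/4tlods.log ; sleep 10 ; done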
Thanks for your help everyone, I'll keep trying.
/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
coreyfro@coreyfro.com
http://www.coreyfro.com/
http://stats.distributed.net/rc5-64/psummary.php3?id=196879
ICQ : 3168059
-----BEGIN GEEK CODE BLOCK-----
GCS d--(+) s: a-- C++++$ UBL++>++++ P+ L+ E W+++$ N+ o? K? w++++$>+++++$
O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+)
Dl++(++++) D++ G+ e>+++ h++(---) r++>+$ y++*>$ H++++ n---(----) p? !au w+
v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
------END GEEK CODE BLOCK------
Home of Geek Code - http://www.geekcode.com/
The Geek Code Decoder Page - http://www.ebb.org/ungeek//
^ permalink raw reply [flat|nested] 15+ messages in thread
* re: RAID5 causing lockups
2003-06-26 17:34 Corey McGuire
@ 2003-06-27 5:02 ` Corey McGuire
2003-06-27 5:32 ` Mike Dresser
0 siblings, 1 reply; 15+ messages in thread
From: Corey McGuire @ 2003-06-27 5:02 UTC (permalink / raw)
To: linux-raid
Whoa!!!
The strangest thing happened when I hit 12.7% on my RAID5 rebuild:
9:56pm up 14:16, 3 users, load average: 3.33, 2.85, 2.59
51 processes: 44 sleeping, 6 running, 1 zombie, 0 stopped
CPU states: 1.2% user, 10.3% system, 0.0% nice, 4.8% idle
Mem: 516592K av, 511704K used, 4888K free, 0K shrd, 89408K buff
Swap: 1590384K av, 264K used, 1590120K free 394204K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
13 root 7 -20 0 0 0 SW< 25.8 0.0 0:26 raid1d
4299 root 0 -20 0 0 0 SW< 22.0 0.0 0:31 raid1d
4303 root 19 19 0 0 0 RWN 16.2 0.0 0:12 raid1syncd
6 root 9 0 0 0 0 SW 13.4 0.0 15:41 kupdated
14 root 20 19 0 0 0 RWN 7.6 0.0 0:11 raid1syncd
8 root -1 -20 0 0 0 SW< 5.7 0.0 29:37 raid5d
31151 root 10 0 0 0 0 Z 0.9 0.0 0:00 top <defunct>
31153 root 10 0 920 916 716 R 0.9 0.1 0:00 top
1 root 9 0 504 504 440 S 0.0 0.0 2:37 init
2 root 9 0 0 0 0 SW 0.0 0.0 0:02 keventd
3 root 19 19 0 0 0 SWN 0.0 0.0 35:11 ksoftirqd_CPU0
4 root 9 0 0 0 0 SW 0.0 0.0 0:37 kswapd
5 root 9 0 0 0 0 SW 0.0 0.0 0:00 bdflush
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md1 : active raid1 hdg1[1] hde1[0]
2562240 blocks [2/2] [UU]
[>....................] resync = 2.9% (75904/2562240) finish=50.8min speed=814K/sec
md0 : active raid1 hdc1[1] hda1[0]
2562240 blocks [2/2] [UU]
[>....................] resync = 2.7% (70656/2562240) finish=53.7min speed=769K/sec
md2 : active raid5 hdk3[5] hdi3[4] hdg3[3] hde3[2] hdc3[1] hda3[0](F)
962654400 blocks level 5, 32k chunk, algorithm 0 [6/5] [_UUUUU]
unused devices: <none>
It stopped rebuilding and moved on to my mirrors... very odd. I'll try forcing another rebuild, but this is quasi-good news.
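Before I force anything, roughly the plan for the suspect drive (a sketch; raidhotremove is the raidtools way to drop an already-failed member, badblocks in its default read-only mode is non-destructive, and smartctl assumes smartmontools is installed):

    raidhotremove /dev/md2 /dev/hda3   # drop the failed member from the array
    badblocks -sv /dev/hda3            # read-only surface scan of the partition
    smartctl -a /dev/hda               # the drive's own error log, if available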
*********** REPLY SEPARATOR ***********
On 6/26/2003 at 10:34 AM Corey McGuire wrote:
>Much progress has been made, but success is still out of reach.
>
>First of all, 2.4.21 has been very helpful. Feedback regarding drive
>problems is much more verbose. I don't know who to blame, the RAID people,
>the ATA people, or the promise driver people, but immediately, I found that
>one of my controllers was hosing up the works. I moved the devices from
>said controller to my VIA onboard controller and gained about 5MB/second on
>the rebuild speed. I don't know if this is because 2.4.21 is faster, VIA
>is faster, I was saturating my PCI bus (since the VIA controller in on the
>Southbridge) or because I was previously getting these errors and no
>feedback.
>
>Alas, problem persists, but I have found out why (90% certain.)
>
>Now when there is a crash, the system spits out why and panics. It looks
>to be HDA (or HDA is getting the blame) and, thanks to a seemingly
>pointless script I wrote to watch the rebuild, I found that the system dies
>at around 12.5% on the RAID5 rebuild every time.
>
>Bad disk? Maybe, probably, but I'll keep banging my head against it for a
>while.
>
>Score,
>2.4.21 + progress script 1
>2.4.20 + crossing fingers 0
>
>I am currently running a kernel with DMA turned off by default. This
>sounded like a good idea last night, around 4 in the morning, but now it
>sounds like an exercise in futility. The idea came to me shortly after I
>was visited by the bovine-fairy. She told me that everything can be fixed
>with "moon pies." I know this apparition was real and not a hallucination
>because, until last night, I had never heard of "moon pies." After a quick
>search of google, sure enough, moon pies; they look tasty, maybe she's
>right.
>
>Score
>Bovine fairies 1
>Sleep depravation 0
>
>At any rate, by my calculations, without DMA, it will take another 12hours
>to get to the 12.5% fail point. I should be back from work by then.
>Longevity through sloth.
>
>To answer some questions,
>
>My power situation is good. I have had a lot more juice getting sucked
>through this power supply before. Used to be a dual P3's with 30MM
>Peltiers and 3 10,000 RPM cheetahs. (Peltiers are not worth it, I had to
>underclock my system and drop the voltage before it would run any cooler.)
>I think these WD's draw 20 watts peak, 14 otherwise. My power supply is
>~400 watts. Shouldn't be a problem, seeing as how I can run my mirrors
>just fine for days, but die after turning my stripe on for minutes.
>
>Building smaller RAID's. Yeah, I will give that a whirl, just to make sure
>HDA is the problem. I don't think I need to yank HDA, I'll just remove it
>from my RAIDTAB and mkraid again.
>
>One point I'd like to make; why is a drive failure killing my RAID5? Kinda
>defeats the purpose.
>
>Here is the aforementioned script plus its results so you can see what I
>see.
>
>4tlods.sh (for the love of dog, sync! I said I was sleep deprived.)
>
>while ((1)) ; do top -n 1 | head -n 20 ; echo ; cat /proc/mdstat ; done
>
>2.4.21
>
>12:12am up 19 min, 5 users, load average: 0.87, 1.06, 0.82
>49 processes: 48 sleeping, 1 running, 0 zombie, 0 stopped
>CPU states: 1.0% user, 52.5% system, 0.0% nice, 46.3% idle
>Mem: 516592K av, 95204K used, 421388K free, 0K shrd, 52588K
>buff
>Swap: 1590384K av, 0K used, 1590384K free 17196K
>cached
>
> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
> 1 root 9 0 504 504 440 S 0.0 0.0 0:06 init
> 2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
> 3 root 19 19 0 0 0 SWN 0.0 0.0 0:00
>ksoftirqd_CPU0
> 4 root 9 0 0 0 0 SW 0.0 0.0 0:00 kswapd
> 5 root 9 0 0 0 0 SW 0.0 0.0 0:00 bdflush
> 6 root 9 0 0 0 0 SW 0.0 0.0 0:00 kupdated
> 7 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 mdrecoveryd
> 8 root 7 -20 0 0 0 SW< 0.0 0.0 6:32 raid5d
> 9 root 19 19 0 0 0 DWN 0.0 0.0 1:08 raid5syncd
> 10 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
> 11 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
> 12 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
> 13 root 9 0 0 0 0 SW 0.0 0.0 0:00 kreiserfsd
>
>Personalities : [raid1] [raid5]
>read_ahead 1024 sectors
>md0 : active raid1 hdc1[1] hda1[0]
> 2562240 blocks [2/2] [UU]
>
>md1 : active raid1 hdg1[1] hde1[0]
> 2562240 blocks [2/2] [UU]
>
>md3 : active raid1 hdk1[1] hdi1[0]
> 2562240 blocks [2/2] [UU]
>
>md2 : active raid5 hdk3[5] hdi3[4] hdg3[3] hde3[2] hdc3[1] hda3[0]
> 962654400 blocks level 5, 32k chunk, algorithm 0 [6/6] [UUUUUU]
> [==>..................] resync = 12.5% (24153592/192530880)
>finish=134.7min speed=20822K/sec
>unused devices: <none>
>
>
>2.4.21
>
>2:38am up 19 min, 1 user, load average: 0.63, 1.13, 0.89
>42 processes: 41 sleeping, 1 running, 0 zombie, 0 stopped
>CPU states: 0.9% user, 52.1% system, 0.0% nice, 46.8% idle
>Mem: 516592K av, 89824K used, 426768K free, 0K shrd, 57908K
>buff
>Swap: 0K av, 0K used, 0K free 10644K
>cached
>
> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
> 1 root 8 0 504 504 440 S 0.0 0.0 0:06 init
> 2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
> 3 root 19 19 0 0 0 SWN 0.0 0.0 0:00
>ksoftirqd_CPU0
> 4 root 9 0 0 0 0 SW 0.0 0.0 0:00 kswapd
> 5 root 9 0 0 0 0 SW 0.0 0.0 0:00 bdflush
> 6 root 9 0 0 0 0 SW 0.0 0.0 0:00 kupdated
> 7 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 mdrecoveryd
> 8 root 15 -20 0 0 0 SW< 0.0 0.0 6:29 raid5d
> 9 root 19 19 0 0 0 DWN 0.0 0.0 1:09 raid5syncd
> 14 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
> 15 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1syncd
> 16 root 9 0 0 0 0 SW 0.0 0.0 0:00 kreiserfsd
> 74 root 9 0 616 616 512 S 0.0 0.1 0:00 syslogd
>
>Personalities : [raid1] [raid5]
>read_ahead 1024 sectors
>md0 : active raid1 hdc1[1] hda1[0]
> 2562240 blocks [2/2] [UU]
> resync=DELAYED
>md2 : active raid5 hdk3[5] hdi3[4] hdg3[3] hde3[2] hdc3[1] hda3[0]
> 962654400 blocks level 5, 32k chunk, algorithm 0 [6/6] [UUUUUU]
> [==>..................] resync = 12.5% (24153596/192530880)
>finish=139.2min speed=20147K/sec
>unused devices: <none>
>
>
>2.4.20
>
>3:22am up 21 min, 1 user, load average: 1.04, 1.31, 1.02
>47 processes: 46 sleeping, 1 running, 0 zombie, 0 stopped
>CPU states: 0.9% user, 54.7% system, 0.0% nice, 44.2% idle
>Mem: 516604K av, 125824K used, 390780K free, 0K shrd, 91628K
>buff
>Swap: 1590384K av, 0K used, 1590384K free 10796K
>cached
>
> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
> 1 root 9 0 504 504 440 S 0.0 0.0 0:10 init
> 2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
> 3 root 9 0 0 0 0 SW 0.0 0.0 0:00 kapmd
> 4 root 18 19 0 0 0 SWN 0.0 0.0 0:00
>ksoftirqd_CPU0
> 5 root 9 0 0 0 0 SW 0.0 0.0 0:00 kswapd
> 6 root 9 0 0 0 0 SW 0.0 0.0 0:00 bdflush
> 7 root 9 0 0 0 0 SW 0.0 0.0 0:00 kupdated
> 8 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 mdrecoveryd
> 9 root 4 -20 0 0 0 SW< 0.0 0.0 7:16 raid5d
> 10 root 19 19 0 0 0 DWN 0.0 0.0 1:07 raid5syncd
> 11 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
> 12 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1syncd
> 13 root -1 -20 0 0 0 SW< 0.0 0.0 0:00 raid1d
>
>Personalities : [raid1] [raid5] [multipath]
>read_ahead 1024 sectors
>md0 : active raid1 hdc1[1] hda1[0]
> 2562240 blocks [2/2] [UU]
> resync=DELAYED
>md1 : active raid1 hdg1[1] hde1[0]
> 2562240 blocks [2/2] [UU]
> resync=DELAYED
>md3 : active raid1 hdk1[1] hdi1[0]
> 2562240 blocks [2/2] [UU]
> resync=DELAYED
>md2 : active raid5 hdk3[5] hdi3[4] hdg3[3] hde3[2] hdc3[1] hda3[0]
> 962654400 blocks level 5, 32k chunk, algorithm 0 [6/6] [UUUUUU]
> [==>..................] resync = 12.5% (24155416/192530880)
>finish=181.1min speed=15487K/sec
>unused devices: <none>
>
>
>Thanks for your help everyone, I'll keep trying.
>
>
>/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
>
>coreyfro@coreyfro.com
>http://www.coreyfro.com/
>http://stats.distributed.net/rc5-64/psummary.php3?id=196879
>ICQ : 3168059
>
>-----BEGIN GEEK CODE BLOCK-----
>GCS d--(+) s: a-- C++++$ UBL++>++++ P+ L+ E W+++$ N+ o? K? w++++$>+++++$
>O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+)
>Dl++(++++) D++ G+ e>+++ h++(---) r++>+$ y++*>$ H++++ n---(----) p? !au w+
>v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
>------END GEEK CODE BLOCK------
>
>Home of Geek Code - http://www.geekcode.com/
>The Geek Code Decoder Page - http://www.ebb.org/ungeek//
>
/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
coreyfro@coreyfro.com
http://www.coreyfro.com/
http://stats.distributed.net/rc5-64/psearch.php3?st=coreyfro
ICQ : 3168059
-----BEGIN GEEK CODE BLOCK-----
GCS !d--(+) s: a- C++++$ UL++>++++ P+ L++>++++ E- W+++$ N++ o? K? w++++$>+++++$ O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+) Dl++(++++) D++ G++(-) e>+++ h++(---) r++>+$ y++**>$ H++++ n---(----) p? !au w+ v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
------END GEEK CODE BLOCK------
Home of Geek Code - http://www.geekcode.com/
The Geek Code Decoder Page - http://www.ebb.org/ungeek//
^ permalink raw reply [flat|nested] 15+ messages in thread
* re: RAID5 causing lockups
2003-06-27 5:02 ` Corey McGuire
@ 2003-06-27 5:32 ` Mike Dresser
2003-06-27 5:47 ` Corey McGuire
0 siblings, 1 reply; 15+ messages in thread
From: Mike Dresser @ 2003-06-27 5:32 UTC (permalink / raw)
To: Corey McGuire; +Cc: linux-raid
On Thu, 26 Jun 2003, Corey McGuire wrote:
> Whoa!!!
>
> The strangest thing happened when I hit 12.7% on my RAID5 rebuild
>
> md2 : active raid5 hdk3[5] hdi3[4] hdg3[3] hde3[2] hdc3[1] hda3[0](F)
> 962654400 blocks level 5, 32k chunk, algorithm 0 [6/5] [_UUUUU]
Isn't that a missing disk? Or do I just not remember RAID5 properly?
> >md2 : active raid5 hdk3[5] hdi3[4] hdg3[3] hde3[2] hdc3[1] hda3[0]
> > 962654400 blocks level 5, 32k chunk, algorithm 0 [6/6] [UUUUUU]
> > [==>..................] resync = 12.5% (24153596/192530880)
Yeah, I thought something had changed here. What's up with that?
There's an (F) by hda, and I think you were talking about hda being bad?
Have you pulled hda out of the RAID and used the manufacturer's utilities
to test this drive?
Mike
^ permalink raw reply [flat|nested] 15+ messages in thread
* re: RAID5 causing lockups
2003-06-27 5:32 ` Mike Dresser
@ 2003-06-27 5:47 ` Corey McGuire
0 siblings, 0 replies; 15+ messages in thread
From: Corey McGuire @ 2003-06-27 5:47 UTC (permalink / raw)
To: linux-raid
>Yeah, I thought something had changed here. What's up with that?
>
>the (F) by hda, and I think you were talkinga bout HDA being bad?
>
>Have you pulled hda out of the raid, and used the manufacturers utilities
>to test this drive?
>
>Mike
Yeah, I saw the little (F)... and the [_UUUUU] bit too.
I just made my floppy, as a matter of fact. Just waiting for the mirrors to rebuild ;-)
12 minutes to go. They are moving at a whopping 800K/sec apiece! Look out!
UDMA is a four-letter word.
I should write the Promise guys, the ATA guys, and the RAID guys to inform everyone that my RAID5 didn't fail with UDMA enabled. Only I'll have to make that sound like a bad thing, because "didn't fail with UDMA enabled" sounds like a good thing, only it's not, 'cuz it should have failed, 'cuz it's bad, and stuff...
This was not the week to give up caffeine...
/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
coreyfro@coreyfro.com
http://www.coreyfro.com/
http://stats.distributed.net/rc5-64/psearch.php3?st=coreyfro
ICQ : 3168059
-----BEGIN GEEK CODE BLOCK-----
GCS !d--(+) s: a- C++++$ UL++>++++ P+ L++>++++ E- W+++$ N++ o? K? w++++$>+++++$ O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+) Dl++(++++) D++ G++(-) e>+++ h++(---) r++>+$ y++**>$ H++++ n---(----) p? !au w+ v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
------END GEEK CODE BLOCK------
Home of Geek Code - http://www.geekcode.com/
The Geek Code Decoder Page - http://www.ebb.org/ungeek//
^ permalink raw reply [flat|nested] 15+ messages in thread
* RAID5 causing lockups
@ 2003-06-25 19:16 Corey McGuire
2003-06-25 19:28 ` Mike Dresser
2003-06-25 20:36 ` Matt Simonsen
0 siblings, 2 replies; 15+ messages in thread
From: Corey McGuire @ 2003-06-25 19:16 UTC (permalink / raw)
To: alewman, bort, corvus, kratz.franz, blatt.guy, linux-raid,
mario.scalise, harbeck.seth, phil
Hey folks,
I just upgraded my system from a ~200GB mirror to a ~1TB RAID5, but all has
not transitioned well.
I really don't know how to debug this issue, though I have tried. I gave
up this morning before work, but I was going to try the magic SysRq key next
(something I don't really know how to use, but anything for a clue),
followed by upgrading to 2.4.21.
The lockup is typical of a system with a failing drive: the system is
responsive to input, but nothing happens. The keyboard works fine, but
programs become idle (not really crashing). I tried keeping "top" up,
hoping I would see something obvious, like raid5syncd doing something
strange, but if it does, top doesn't update after the problem hits.
The lockups happen even if the system is doing nothing (other than
raid5syncd, which is awfully busy since my RAID won't stay up)
If I unmount the RAID5 and raidstop it, my system will work fine, but I'm
out 1TB of disk. Right now, I have it running the bare essentials (all
services on, but my /home directory has only the public_html and mail stuff
for each user).
Anything I can do to get more information out of this problem? I don't
really know where to look.
System Info
=======================================================================
My kernel is 2.4.20, my RAID tools are raidtools-0.90, no patches on
anything, home-built distro (Linux From Scratch). It had been running on a
mirror for nearly a year.
Each drive in my system is connected to a Promise Ultra ATA/100 controller.
I have 6 drives and 3 controllers. Each drive is a 200GB WD drive, set to
"Single/Master" on its channel.
No device has a slave.
Drives are hda hdc hde hdg hdi hdk
------- Each drive is configured exactly like the device below -------
Disk /dev/hda: 255 heads, 63 sectors, 24321 cylinders
Units = cylinders of 16065 * 512 bytes
Device Boot Start End Blocks Id System
/dev/hda1 1 319 2562336 fd Linux raid autodetect
/dev/hda2 320 352 265072+ 82 Linux swap
/dev/hda3 353 24321 192530992+ fd Linux raid autodetect
------------------------- Here is my raidtab -------------------------
raiddev /dev/md0
    raid-level              1
    chunk-size              32
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    device                  /dev/hda1
    raid-disk               0
    device                  /dev/hdc1
    raid-disk               1

raiddev /dev/md1
    raid-level              1
    chunk-size              32
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    device                  /dev/hde1
    raid-disk               0
    device                  /dev/hdg1
    raid-disk               1

raiddev /dev/md2
    raid-level              5
    chunk-size              32
    nr-raid-disks           6
    nr-spare-disks          0
    persistent-superblock   1
    device                  /dev/hda3
    raid-disk               0
    device                  /dev/hdc3
    raid-disk               1
    device                  /dev/hde3
    raid-disk               2
    device                  /dev/hdg3
    raid-disk               3
    device                  /dev/hdi3
    raid-disk               4
    device                  /dev/hdk3
    raid-disk               5

raiddev /dev/md3
    raid-level              1
    chunk-size              32
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    device                  /dev/hdi1
    raid-disk               0
    device                  /dev/hdk1
    raid-disk               1
-------------------------- Here is my fstab --------------------------
# Begin /etc/fstab
# filesystem mount-point fs-type options dump fsck-order
/dev/md0 / reiserfs defaults 1 1
/dev/md1 /mnt/backup reiserfs noauto,defaults 1 3
/dev/md2 /home reiserfs defaults 1 2
/dev/hda2 swap swap pri=42 0 0
/dev/hdc2 swap swap pri=42 0 0
/dev/hde2 swap swap pri=42 0 0
/dev/hdg2 swap swap pri=42 0 0
/dev/hdi2 swap swap pri=42 0 0
/dev/hdk2 swap swap pri=42 0 0
proc /proc proc defaults 0 0
# End /etc/fstab
=======================================================================
Let me know if I missed anything (probably lots.)
Thanks for your time.
/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
coreyfro@coreyfro.com
http://www.coreyfro.com/
http://stats.distributed.net/rc5-64/psummary.php3?id=196879
ICQ : 3168059
-----BEGIN GEEK CODE BLOCK-----
GCS d--(+) s: a-- C++++$ UBL++>++++ P+ L+ E W+++$ N+ o? K? w++++$>+++++$
O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+)
Dl++(++++) D++ G+ e>+++ h++(---) r++>+$ y++*>$ H++++ n---(----) p? !au w+
v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
------END GEEK CODE BLOCK------
Home of Geek Code - http://www.geekcode.com/
The Geek Code Decoder Page - http://www.ebb.org/ungeek//
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAID5 causing lockups
2003-06-25 19:16 Corey McGuire
@ 2003-06-25 19:28 ` Mike Dresser
2003-06-25 19:41 ` Corey McGuire
2003-06-25 20:36 ` Matt Simonsen
1 sibling, 1 reply; 15+ messages in thread
From: Mike Dresser @ 2003-06-25 19:28 UTC (permalink / raw)
To: Corey McGuire; +Cc: linux-raid
On Wed, 25 Jun 2003, Corey McGuire wrote:
> I have 6 drives and 3 controllers. Each drive is a 200GB WD drive, set to
> "Single/Master" on their channel.
Go get the utility on wdc's site to fix the problems they have, and see
what happens after that.
http://support.wdc.com/download/index.asp#raidno3ware
They have problems with power management, and the drive is kicked out of
the raid array.
This may or may not be the trouble, but at least see if it needs it.
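A quick way to check which firmware each drive is running before and after flashing (just a sketch; hdparm -i only reads the drive's identify data):

    hdparm -i /dev/hda | grep -i fwrev
    # repeat for /dev/hdc, hde, hdg, hdi, hdk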
Mike
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAID5 causing lockups
2003-06-25 19:28 ` Mike Dresser
@ 2003-06-25 19:41 ` Corey McGuire
2003-06-25 19:56 ` Mike Dresser
0 siblings, 1 reply; 15+ messages in thread
From: Corey McGuire @ 2003-06-25 19:41 UTC (permalink / raw)
To: linux-raid
NASTY!
Thanks, I'll give that a whirl. Is there a way I can kill all power
manglement outside of the BIOS, just to make sure I prevent this? Once this
is working, I won't even need console blanking.
Should I make sure APM is killed in the kernel config too? I think I may
have turned it on when I added RAID5 support, thinking I'd use idle calls
now that I have six 7200 RPM drives to heat up my case ;-)
The system operated as a mirror just fine for around a year, if that makes a
difference... using two of these drives (hde and hdk, if my memory serves).
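For reference, the sort of drive-side knobs I mean, as a sketch only; support varies by drive and by hdparm version:

    hdparm -S 0 /dev/hda     # disable the drive's spindown timer
    hdparm -B 255 /dev/hda   # disable drive-level APM, if the drive honors it
    hdparm -M 254 /dev/hda   # acoustic management to fastest, newer hdparm only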
*********** REPLY SEPARATOR ***********
On 6/25/2003 at 3:28 PM Mike Dresser wrote:
>On Wed, 25 Jun 2003, Corey McGuire wrote:
>
>> I have 6 drives and 3 controllers. Each drive is a 200GB WD drive, set
>to
>> "Single/Master" on their channel.
>
>Go get the utility on wdc's site to fix the problems they have, and see
>what happens after that.
>
>http://support.wdc.com/download/index.asp#raidno3ware
>
>They have problems with power management, and the drive is kicked out of
>the raid array.
>
>This may or may not be the trouble, but at least see if it needs it.
>
>Mike
/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
coreyfro@coreyfro.com
http://www.coreyfro.com/
http://stats.distributed.net/rc5-64/psummary.php3?id=196879
ICQ : 3168059
-----BEGIN GEEK CODE BLOCK-----
GCS d--(+) s: a-- C++++$ UBL++>++++ P+ L+ E W+++$ N+ o? K? w++++$>+++++$
O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+)
Dl++(++++) D++ G+ e>+++ h++(---) r++>+$ y++*>$ H++++ n---(----) p? !au w+
v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
------END GEEK CODE BLOCK------
Home of Geek Code - http://www.geekcode.com/
The Geek Code Decoder Page - http://www.ebb.org/ungeek//
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAID5 causing lockups
2003-06-25 19:41 ` Corey McGuire
@ 2003-06-25 19:56 ` Mike Dresser
2003-06-25 20:51 ` Corey McGuire
0 siblings, 1 reply; 15+ messages in thread
From: Mike Dresser @ 2003-06-25 19:56 UTC (permalink / raw)
To: Corey McGuire; +Cc: linux-raid
On Wed, 25 Jun 2003, Corey McGuire wrote:
> NASTY!
>
> Thanks, I'll give that a whirl. Is there a way I can kill all power
> manglement outside of the BIOS just to make sure I prevent this? Once this
> is working, I won't even need console blanking.
>
Doh, I made the same mistake I made on IRC a few weeks ago with someone.
Acoustic management, not power management.
*silently beats head against wall*
Sorry about that; I hear the word management and my brain shuts down.
Anyway, the power management is fine. The drive manages its acoustic
noise or something. Search WDC's tech knowledge base for 3ware, and
you'll find the relevant article there.
Mike
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAID5 causing lockups
2003-06-25 19:56 ` Mike Dresser
@ 2003-06-25 20:51 ` Corey McGuire
0 siblings, 0 replies; 15+ messages in thread
From: Corey McGuire @ 2003-06-25 20:51 UTC (permalink / raw)
To: linux-raid
>Doh, i made the same mistake I made on irc a few weeks ago with someone.
>
>Accoustic management, not power management
>
>*silently beats head against wall*
ok.... cool... I'll check it out...
>Sorry about that, i hear the word management and my brain shuts down.
bwahahahahahaha!
>Anyways, the power management is fine. The drive manages its accoustic
>noise or something.
I followed the link... I'll try it when I get home...
thanks again
/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
coreyfro@coreyfro.com
http://www.coreyfro.com/
http://stats.distributed.net/rc5-64/psummary.php3?id=196879
ICQ : 3168059
-----BEGIN GEEK CODE BLOCK-----
GCS d--(+) s: a-- C++++$ UBL++>++++ P+ L+ E W+++$ N+ o? K? w++++$>+++++$
O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+)
Dl++(++++) D++ G+ e>+++ h++(---) r++>+$ y++*>$ H++++ n---(----) p? !au w+
v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
------END GEEK CODE BLOCK------
Home of Geek Code - http://www.geekcode.com/
The Geek Code Decoder Page - http://www.ebb.org/ungeek//
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAID5 causing lockups
2003-06-25 19:16 Corey McGuire
2003-06-25 19:28 ` Mike Dresser
@ 2003-06-25 20:36 ` Matt Simonsen
2003-06-25 20:56 ` Corey McGuire
1 sibling, 1 reply; 15+ messages in thread
From: Matt Simonsen @ 2003-06-25 20:36 UTC (permalink / raw)
To: Corey McGuire; +Cc: linux-raid
On Wed, 2003-06-25 at 12:16, Corey McGuire wrote:
> Hey folks,
>
> I just upgraded my system from a ~200GB mirror to a ~1TB RAID5, but all has
> not transitioned well.
Did the RAID array ever finish syncing? It may take a long time
(1048576 MB / 10 MB/sec / 3600 sec/hr = about 29 hours!) ...
I have one (SCSI) system with a slower CPU; while it was syncing the array I was sure
something was wrong. Once the array was up, though, everything has
worked great. I just have to be sure it shuts down cleanly or the rebuild
is painful!
Maybe I'm way off, but I'd just give it a day to see if it eventually
syncs. If you can log in, do a cat of /proc/mdstat every 15 minutes; if
it's making progress, I'd leave it.
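Something like this would do it without babysitting the console (a sketch; watch is in procps, or use the while loop if you'd rather keep a log):

    watch -n 900 cat /proc/mdstat
    # or, appending timestamped snapshots to a file:
    while true ; do date >> /root/mdstat.log ; cat /proc/mdstat >> /root/mdstat.log ; sleep 900 ; done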
Matt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAID5 causing lockups
2003-06-25 20:36 ` Matt Simonsen
@ 2003-06-25 20:56 ` Corey McGuire
[not found] ` <1056575536.24919.101.camel@mattswrk>
0 siblings, 1 reply; 15+ messages in thread
From: Corey McGuire @ 2003-06-25 20:56 UTC (permalink / raw)
To: linux-raid
Drives are silent. No activity at all.
Maybe I'll give it another try tonight. I can give it a bunch more
horsepower (I have it underclocked because the system is basically a big disk);
maybe if I have more available CPU, the arrays will be a bit busier if
they really are rebuilding.
Still, the problem causes top to stop. I wouldn't think that would happen
if the array were just rebuilding...
*********** REPLY SEPARATOR ***********
On 6/25/2003 at 1:36 PM Matt Simonsen wrote:
>On Wed, 2003-06-25 at 12:16, Corey McGuire wrote:
>> Hey folks,
>>
>> I just upgraded my system from a ~200GB mirror to a ~1TB RAID5, but all
>has
>> not transitioned well.
>
>
>Did the RAID array every finish syncing? It may take a long time
>(1048576 megs / 10 mb/sec / 3600 seconds/hr = 29 hours!) ...
>
>I have one (SCSI) system with a slower CPU, syncing the array I was sure
>something was wrong. Once the array was up, though, everything has
>worked great. I just have to be sure it shuts down clean or the rebuild
>is painful!
>
>Maybe I'm way off, but I'd just give it a day to see if it eventually
>syncs. If you can login, do a cat of /proc/mdstat every 15 minutes, if
>it's making progress I'd leave it.
>
>Matt
/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\
coreyfro@coreyfro.com
http://www.coreyfro.com/
http://stats.distributed.net/rc5-64/psummary.php3?id=196879
ICQ : 3168059
-----BEGIN GEEK CODE BLOCK-----
GCS d--(+) s: a-- C++++$ UBL++>++++ P+ L+ E W+++$ N+ o? K? w++++$>+++++$
O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+)
Dl++(++++) D++ G+ e>+++ h++(---) r++>+$ y++*>$ H++++ n---(----) p? !au w+
v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
------END GEEK CODE BLOCK------
Home of Geek Code - http://www.geekcode.com/
The Geek Code Decoder Page - http://www.ebb.org/ungeek//
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread
Thread overview: 15+ messages
2003-06-26 3:54 RAID5 causing lockups Corey McGuire
2003-06-26 11:46 ` Mike Black
2003-06-26 13:32 ` Matthew Mitchell
-- strict thread matches above, loose matches on Subject: below --
2003-06-26 17:34 Corey McGuire
2003-06-27 5:02 ` Corey McGuire
2003-06-27 5:32 ` Mike Dresser
2003-06-27 5:47 ` Corey McGuire
2003-06-25 19:16 Corey McGuire
2003-06-25 19:28 ` Mike Dresser
2003-06-25 19:41 ` Corey McGuire
2003-06-25 19:56 ` Mike Dresser
2003-06-25 20:51 ` Corey McGuire
2003-06-25 20:36 ` Matt Simonsen
2003-06-25 20:56 ` Corey McGuire
[not found] ` <1056575536.24919.101.camel@mattswrk>
2003-06-25 21:16 ` Corey McGuire