* RAID5 crash and burn
@ 2004-10-31 4:59 coreyfro
From: coreyfro @ 2004-10-31 4:59 UTC (permalink / raw)
To: linux-raid
It's that time of the year again: my biannual RAID5 crash. Yippee!

I had a drive die yesterday, and while RAID5 can handle that, the kernel
couldn't handle the swap on that drive going poof. My system crashed, so I
rebooted, thinking that the system would figure out that the swap was dead
and not start it.

RAID5 started rebuilding, services started loading, swap started loading,
and the system crashed again.

Now my RAID is down. I have tried mdadm, the old raidtools, and kicking
the machine, but nothing has worked.

Here is all the info I can think to muster; let me know if I need to add
anything else.
Thanks,
Coreyfro
========================================================================
ilneval ~ # cat /proc/version
Linux version 2.6.7-gentoo-r12 (root@livecd) (gcc version 3.3.4 20040623
(Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)) #1 Fri Aug 13 22:04:18
PDT 2004
========================================================================
ilneval ~ # cat /etc/raidtab.bak
# autogenerated /etc/raidtab by YaST2
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
chunk-size 4
device /dev/hde1
raid-disk 0
device /dev/hdg1
raid-disk 1
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
chunk-size 4
device /dev/hda1
raid-disk 0
device /dev/hdc1
raid-disk 1
raiddev /dev/md3
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
chunk-size 4
device /dev/hdi1
raid-disk 0
device /dev/hdk1
raid-disk 1
raiddev /dev/md2
raid-level 5
nr-raid-disks 6
nr-spare-disks 0
persistent-superblock 1
chunk-size 64
device /dev/hda3
raid-disk 0
device /dev/hdc3
raid-disk 1
device /dev/hde3
failed-disk 2
device /dev/hdg3
raid-disk 3
device /dev/hdi3
raid-disk 4
device /dev/hdk3
raid-disk 5
========================================================================
ilneval ~ # cat /proc/mdstat
Personalities : [raid1] [raid5]
md3 : active raid1 hdk1[1] hdi1[0]
2562240 blocks [2/2] [UU]
md1 : active raid1 hdc1[1] hda1[0]
2562240 blocks [2/2] [UU]
md0 : active raid1 hdg1[1]
2562240 blocks [2/1] [_U]
unused devices: <none>
(Note the lack of /dev/md2.)
========================================================================
ilneval etc # dmesg -c
md: raidstart(pid 1821) used deprecated START_ARRAY ioctl. This will not
be supported beyond 2.6
md: autorun ...
md: considering hde3 ...
md: adding hde3 ...
md: adding hdk3 ...
md: adding hdi3 ...
md: adding hdg3 ...
md: adding hdc3 ...
md: adding hda3 ...
md: created md2
md: bind<hda3>
md: bind<hdc3>
md: bind<hdg3>
md: bind<hdi3>
md: bind<hdk3>
md: bind<hde3>
md: running: <hde3><hdk3><hdi3><hdg3><hdc3><hda3>
md: kicking non-fresh hde3 from array!
md: unbind<hde3>
md: export_rdev(hde3)
md: md2: raid array is not clean -- starting background reconstruction
raid5: device hdk3 operational as raid disk 5
raid5: device hdi3 operational as raid disk 4
raid5: device hdg3 operational as raid disk 3
raid5: device hdc3 operational as raid disk 1
raid5: device hda3 operational as raid disk 0
raid5: cannot start dirty degraded array for md2
RAID5 conf printout:
--- rd:6 wd:5 fd:1
disk 0, o:1, dev:hda3
disk 1, o:1, dev:hdc3
disk 3, o:1, dev:hdg3
disk 4, o:1, dev:hdi3
disk 5, o:1, dev:hdk3
raid5: failed to run raid set md2
md: pers->run() failed ...
md: do_md_run() returned -22
md: md2 stopped.
md: unbind<hdk3>
md: export_rdev(hdk3)
md: unbind<hdi3>
md: export_rdev(hdi3)
md: unbind<hdg3>
md: export_rdev(hdg3)
md: unbind<hdc3>
md: export_rdev(hdc3)
md: unbind<hda3>
md: export_rdev(hda3)
md: ... autorun DONE.
========================================================================
ilneval etc # mdadm --assemble --scan /dev/md2
Segmentation fault
========================================================================

* RE: RAID5 crash and burn
From: Guy @ 2004-10-31 5:18 UTC (permalink / raw)
To: coreyfro, linux-raid

Normally I would refer you to the man page for mdadm.

--scan requires the config file; I have read that mdadm will crash if you
use --scan without it.

Try this:
mdadm --assemble /dev/md2 /dev/hda3 /dev/hdc3 /dev/hde3 /dev/hdi3 /dev/hdk3

or this:
mdadm --assemble /dev/md2 --force /dev/hda3 /dev/hdc3 /dev/hde3 /dev/hdi3 /dev/hdk3

I left out hdg3, since you indicate it is the failed disk.

Guy

-----Original Message-----
From: coreyfro@coreyfro.com
Sent: Sunday, October 31, 2004 12:59 AM
To: linux-raid@vger.kernel.org
Subject: RAID5 crash and burn

[original report quoted in full - snipped]
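
For reference, the segfault with a bare --scan usually just means mdadm has
no config file to work from. A minimal /etc/mdadm.conf sketch that would
give --assemble --scan something to scan; the UUID below is a placeholder,
the real one comes from "mdadm --examine" on a member partition, and
"mdadm --detail --scan" will print a ready-made ARRAY line once the array
is running:

# /etc/mdadm.conf (minimal sketch; UUID is a placeholder)
DEVICE /dev/hd*[0-9]
ARRAY /dev/md2 level=raid5 num-devices=6 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx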

* RE: RAID5 crash and burn
From: coreyfro @ 2004-10-31 9:59 UTC (permalink / raw)
To: linux-raid

Ahhhhhh... it doesn't use the raidtab... nothing needs raidtab anymore...
I guess it's time I got with the program...

About swap failing, would there be much of a performance hit if I mirrored
swap? I don't like running without it, and I don't want to repeat this
incident... My system has more than enough RAM for the load it has, but I
understand the other reasons for having swap, so slow swap is better than
nothing or faulty, I suppose...

Looks like fsck is working, thanks for the help...

> Normally I would refer you to the man page for mdadm.
>
> --scan requires the config file; I have read that mdadm will crash if
> you use --scan without it.
>
> [rest of Guy's reply and the original report snipped]

* Re: RAID5 crash and burn
From: Robin Bowes @ 2004-10-31 11:26 UTC (permalink / raw)
To: coreyfro; +Cc: linux-raid

coreyfro@coreyfro.com wrote:
> About swap failing, would there be much of a performance hit if I
> mirrored swap? I don't like running without it, and I don't want to
> repeat this incident... My system has more than enough RAM for the load
> it has, but I understand the other reasons for having swap, so slow swap
> is better than nothing or faulty, I suppose...

What other reasons are these? (Genuine question - I'm curious.)

I've got a box that is in a similar situation - loads of RAM, never
anywhere near memory filling up. Currently I've not got swap activated,
but my disk architecture was created to allow one or two 1.5GB mirrored
partitions for swap.

R.
--
http://robinbowes.com

* Re: RAID5 crash and burn
From: coreyfro @ 2004-10-31 17:19 UTC (permalink / raw)
To: linux-raid

Hey everyone, I'll just get these all in one e-mail.

First off, each of my disks had the following config:

  a 2GB partition for the mirrors (boot, backup, and upgrade mirrors)
  a 256MB swap partition
  a 198GB RAID5 partition

So I had six swap partitions for a total of 1.5GB of swap, on top of 0.5GB
of RAM. This is way more than I need, and now that I think about it, a bad
idea. Swap was MAJORLY fast, but who cares, I rarely used it. Because I
used them all, I will be facing a slowdown, even with RAID1's distributed
read, but I don't really care... I plan on mirroring it three ways, so I
have two 256MB swap mirrors. Overcompensating, I know. What I was more
concerned about was whether people did it and whether it could be done.

> What other reasons are these? (Genuine question - I'm curious.)

The other reasons for swap are these:

1. Linux caches all HDD activity. Allowing Linux to use RAM for cache and
   HDD for swap can actually speed up a system with lots of RAM. Now,
   Linux may never actually swap anything if you have WAY too much (is
   there such a thing?) RAM, but the point is still valid.

2. Linux isn't very good at memory cleanup. Unused pages in RAM may never
   get flushed, or if they do, it requires actual processing (I forget
   which), while unused pages in swap just get forgotten. This may lead to
   Guy's 100% CPU utilization; I don't know if that person's swap was full
   or buggy or what, but it sounds like the argument for using swap. So,
   in short, swap can be used as "/dev/null" for unused pages.

3. Disks are cheap.

There were more, but I forget.

There was a big debate on the Linux kernel forum between the "use" and the
"use not" camps, and the "use" people made more good arguments, of which
the key one was "it can't hurt" (which isn't 100% true, but it's around
97.523423333% true).

========================================================================
For your reading pleasure:

Linux: Is Swap Necessary?
http://kerneltrap.org/node/view/3202
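
For illustration, a minimal sketch of what one of those 256MB mirrored
swap devices could look like with mdadm. The partition names (/dev/hda2,
/dev/hdc2) and the md number are hypothetical and would have to match the
real layout:

# hypothetical member partitions - substitute the real 256MB partitions
mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/hda2 /dev/hdc2
mkswap /dev/md4
swapon /dev/md4

# /etc/fstab entry so it comes back after a reboot
/dev/md4   none   swap   sw   0 0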

* RE: RAID5 crash and burn
From: Guy @ 2004-10-31 13:15 UTC (permalink / raw)
To: coreyfro, linux-raid

How often do you swap? Maybe never. If never, what performance problem?
Most people have more memory than needed these days, so little or no
swapping.

Performance problem... no idea. Writing may be slower; reading would be
faster, since you can read from 2 disks at once.

I don't want my system to crash, so I mirror swap.

If you are really worried, create little swap partitions on every disk,
and mirror them. You have 6 disks (or more), so you could have 3 mirrored
swap partitions. This is what I do on large Unix systems (HP-UX). That
way, if it does swap, it has 10 or more swap partitions to use, which
allows it to swap ten times faster.

With HP-UX you must have swap space. One reason: any time shared memory is
allocated, the swap space is reserved, even if it is never used. Seems
silly; I had an 8Gig system which never used even 4Gig, and I needed about
2Gig of swap space that was never written to.

As far as I know, Linux does not require swap space unless you want to
exceed available memory. But I never risk it; I have swap.

A swap story... I once had a system that the users said was so slow they
almost could not type. I knew they were overreacting. It took me about 10
minutes to log in. It was so slow the login was timing out before it asked
for my passwd. I saw it was using only 10-20% of the CPU, but the boot
disk was at 100% usage, swapping. It could not use more CPU because every
process was waiting to swap in some code. I created little 128Meg
partitions on every disk I could use, maybe 6 to 10 of them. Each time I
added one of them to swap, the system got faster. I gave the new swap
partitions priority 0 so they would be favored over the default one. By
the time I was done the CPU load was at 90% or more, and the users were
happy. We did add RAM soon after that.

My emergency swap partitions were not mirrored; with HP-UX you must buy
the mirror software. That sucks!

Guy

-----Original Message-----
From: coreyfro@coreyfro.com
Subject: RE: RAID5 crash and burn
[earlier messages quoted in full - snipped]
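
On Linux, the multiple-small-swap-partitions approach Guy describes is
usually expressed with swap priorities: higher pri wins, and devices with
equal pri are used round-robin, which effectively stripes swap across
them. A sketch with hypothetical mirrored swap devices:

# /etc/fstab - equal priorities, so the kernel interleaves across all three
# (md4/md5/md6 are hypothetical device names)
/dev/md4   none   swap   sw,pri=1   0 0
/dev/md5   none   swap   sw,pri=1   0 0
/dev/md6   none   swap   sw,pri=1   0 0

# or at run time, one device at a time:
swapon -p 1 /dev/md4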

* Re: RAID5 crash and burn
From: Nathan Dietsch @ 2004-10-31 13:54 UTC (permalink / raw)
To: coreyfro, linux-raid

Hello Corey,

coreyfro@coreyfro.com wrote:
> Ahhhhhh... it doesn't use the raidtab... nothing needs raidtab
> anymore... I guess it's time I got with the program...
>
> About swap failing, would there be much of a performance hit if I
> mirrored swap?

From what I read earlier in the thread, you already have a swap partition
on software RAID5, so how would going to RAID1 hurt performance? With swap
on RAID5 -- unless you are paging very heavily (enough for a full-stripe
write) -- it seems you would go into read-modify-write anyway, and this
will hurt... badly.

Mirroring swap is a good idea to avoid crashes, as Guy suggests. If you
are seriously considering the performance implications of RAID1 vs RAID5
for swap, you are already done for, performance-wise.

Personally I would not consider software RAID5 suitable for tasks which
require performance. An exception would be heavy I/O that causes
full-stripe writes, in which case you would probably get a win out of the
extra disks.

I hope this helps.

Kind Regards,

Nathan Dietsch
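
To put rough numbers on the read-modify-write point: for a single
swapped-out page (one small write, well under a full stripe), RAID5
typically costs

  read old data block  + read old parity block   = 2 reads
  write new data block + write new parity block  = 2 writes
  (new parity = old parity XOR old data XOR new data)

i.e. about 4 disk I/Os, two of which must complete before the writes can
even be issued, versus 2 independent writes (one per mirror) for RAID1 and
1 write for a plain swap partition.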

* Re: RAID5 crash and burn
From: Mark Hahn @ 2004-10-31 19:41 UTC (permalink / raw)
To: linux-raid

> If you are seriously considering the performance implications of RAID1
> vs RAID5 for swap, you are already done for, performance-wise.

I disagree. the whole point of Linux's approach to swap is that it's
fairly cheap to write a page to swap. whether you ever need it again
depends on how overcommitted you are. this is assuming that the kernel
successfully chooses not-likely-to-be-reused pages to write to swap, and
that your swap partitions are reasonably fast to write to. this approach
works very well, excepting heuristic problems in certain kernel versions.

in other words, reading from swap is permitted to be expensive, since any
disk read is always assumed to be slow. but writing to swap really should
be fairly cheap, and this is a premise of the kernel's swap policies.

I would not use raid5 swap for this reason; raid1 is not insane, since you
don't need *that* much swap space, so the 50% overhead is not crippling.
(don't even think about things like raid 10 for swap - the kernel already
has zero-overhead swap striping.)

regards, mark hahn.

* Re: RAID5 crash and burn
From: Nathan Dietsch @ 2004-11-01 4:26 UTC (permalink / raw)
To: Mark Hahn; +Cc: linux-raid

Mark Hahn wrote:
>> If you are seriously considering the performance implications of RAID1
>> vs RAID5 for swap, you are already done for, performance-wise.
>
> I disagree. the whole point of Linux's approach to swap is that it's
> fairly cheap to write a page to swap.

Writing to any form of secondary storage is not "cheap" when compared to
memory. Writing to a swap RAID5 volume, where you will probably incur a
read-modify-write operation, is not considered "cheap" either.

> whether you ever need it again depends on how overcommitted you are.
> this is assuming that the kernel successfully chooses
> not-likely-to-be-reused pages to write to swap, and that your swap
> partitions are reasonably fast to write to.

This is what I am getting at: no partition can be considered reasonably
fast to write to when compared to memory.

> this approach works very well, excepting heuristic problems in certain
> kernel versions.
>
> in other words, reading from swap is permitted to be expensive, since
> any disk read is always assumed to be slow.

Agreed.

> but writing to swap really should be fairly cheap, and this is a premise
> of the kernel's swap policies.

Sorry, I still don't buy the premise that writing to disk can be
considered cheap.

> I would not use raid5 swap for this reason; raid1 is not insane, since
> you don't need *that* much swap space, so the 50% overhead is not
> crippling. (don't even think about things like raid 10 for swap - the
> kernel already has zero-overhead swap striping.)

I would agree with that. What I was trying to say was that if you are
forced to go to swap, for a page-out or a page-in, you are going to incur
overhead which is incredibly inefficient compared to memory. Putting swap
on RAID5, where you incur even more overhead, is worse. Putting swap on
RAID1, as Guy pointed out, avoids crashes. So for the one time that you
will go to swap, it is worth the extra bit of overhead to write it to a
second disk to avoid losing the machine in the event of a disk failure.

Simply: looking for performance improvements in swap means that you are
already very, very overloaded, OR it is simply an exercise in theory.

Kind Regards,

Nathan

* Re: RAID5 crash and burn
From: Mark Hahn @ 2004-11-01 5:29 UTC (permalink / raw)
To: linux-raid

>> I disagree. the whole point of Linux's approach to swap is that it's
>> fairly cheap to write a page to swap.
>
> Writing to any form of secondary storage is not "cheap" when compared to
> memory.

nonsense - from the cpu and kernel perspective, firing off a write of a
page really is normally cheap. more expensive than doing nothing, well,
yeah.

> Writing to a swap RAID5 volume, where you will probably incur a
> read-modify-write operation, is not considered "cheap" either.

hence my message.

>> whether you ever need it again depends on how overcommitted you are.
>> this is assuming that the kernel successfully chooses
>> not-likely-to-be-reused pages to write to swap, and that your swap
>> partitions are reasonably fast to write to.
>
> This is what I am getting at: no partition can be considered reasonably
> fast to write to when compared to memory.

perhaps you missed the point because I was sloppy with "fast". what I
meant was "low overhead". that is, forcing a page to swap is largely a
fire-and-forget kind of operation. yes, you sometimes have to handle an
interrupt, but it's not expensive, and doesn't slow you down. you're not
holding up progress until the write completes. the overhead is largely
book-keeping.

> Sorry, I still don't buy the premise that writing to disk can be
> considered cheap.

do some benchmarking.

> Simply: looking for performance improvements in swap means that you are
> already very, very overloaded, OR it is simply an exercise in theory.

no. current linux VM logic uses swap as a way of optimizing page usage;
handling overcommit is useful, even essential, but not a normal function
of swap, and not relevant to most machines.

let me put it a completely other way: swap outs are normal, fairly
frequent, and GOOD for performance. swap *IN*s are unusual, quite
infrequent, and bad for performance. you're in trouble if you see more
than a trivial number of swap-ins, but even a lot of swap-outs is not a
problem. (obviously, you don't want swapouts to starve your
explicit/non-swap IO needs.)
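
For reference, the usual way to see which of the two a running box is
actually doing is the swap columns of vmstat (part of procps); the
interval below is just an example:

vmstat 5
# "so" = memory swapped out to disk per second (normally harmless)
# "si" = memory swapped in from disk per second (the expensive case;
#        sustained non-zero si is the sign of real thrashing)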

* Re: RAID5 crash and burn
From: Nathan Dietsch @ 2004-11-01 6:05 UTC (permalink / raw)
To: Mark Hahn; +Cc: linux-raid

Hello Mark,

Mark Hahn wrote:
>> Writing to any form of secondary storage is not "cheap" when compared
>> to memory.
>
> nonsense - from the cpu and kernel perspective, firing off a write of a
> page really is normally cheap. more expensive than doing nothing, well,
> yeah.

Nonsense??? I said swapping to disk is not cheap when compared to memory.
From Understanding the Linux Kernel: "Every access by a program to a page
that is swapped out increases the process execution time by an order of
magnitude. In short, if performance is of great importance, swapping
should be considered as a last resort." (p. 529)

I agree that the write is cheap compared to the read (where you will
stall), but what is the purpose of writing something to swap if you do not
plan to read it again?

The most important part of my original post was that analysing the
performance characteristics of RAID1 swap vs RAID5 swap is avoiding the
point that if you are looking at swap for performance gains, then you are
missing the many other places that could improve performance. If you plan
on swapping for tasks which require performance [0], then you have badly
sized the system.

>> Writing to a swap RAID5 volume, where you will probably incur a
>> read-modify-write operation, is not considered "cheap" either.
>
> hence my message.

We agree.

> perhaps you missed the point because I was sloppy with "fast". what I
> meant was "low overhead". that is, forcing a page to swap is largely a
> fire-and-forget kind of operation. [...] the overhead is largely
> book-keeping.

Interesting point.

>> Sorry, I still don't buy the premise that writing to disk can be
>> considered cheap.
>
> do some benchmarking.

I will do that.

> no. current linux VM logic uses swap as a way of optimizing page usage;
> handling overcommit is useful, even essential, but not a normal function
> of swap, and not relevant to most machines.
>
> let me put it a completely other way: swap outs are normal, fairly
> frequent, and GOOD for performance.

Please explain how swapping out a page is good for performance. I
understand that unused pages are paged out, but how does this improve
performance beyond making more room for the FS cache?

> swap *IN*s are unusual, quite infrequent, and bad for performance.
> you're in trouble if you see more than a trivial number of swap-ins, but
> even a lot of swap-outs is not a problem. (obviously, you don't want
> swapouts to starve your explicit/non-swap IO needs.)

I agree with you 100% on that.

I am not going to argue with you on Linux swap internals; you obviously
understand them quite well, and my knowledge is more from the Solaris side
of the fence. I am simply saying that comparing RAID5 and RAID1 for the
purposes of swapping is irrelevant, because if you are relying on the
performance of your swap device, you are missing the point that you have
run out of memory.

Regards,

Nathan

[0] Some longer-running tasks (like batch jobs) that do not have
performance limitations can afford to swap.

* Re: RAID5 crash and burn
From: Mark Hahn @ 2004-11-01 6:32 UTC (permalink / raw)
To: Nathan Dietsch; +Cc: linux-raid

> From Understanding the Linux Kernel: "Every access by a program to a
> page that is swapped out increases the process execution time by an
> order of magnitude. In short, if performance is of great importance,
> swapping should be considered as a last resort." (p. 529)

not relevant: we're talking about what it costs to perform a swapout, not
what happens if the wrong page is chosen for swapout, and later
necessitates a swapin.

> I agree that the write is cheap compared to the read (where you will
> stall), but what is the purpose of writing something to swap if you do
> not plan to read it again?

to free the underlying physical page for other use.

>> let me put it a completely other way: swap outs are normal, fairly
>> frequent, and GOOD for performance.
>
> Please explain how swapping out a page is good for performance.

swapouts mean that the kernel doesn't need to dedicate a page of physical
memory to keep track of some stale page. swapping out active pages would
be a mistake, obviously. the kernel tries to avoid that mistake in any OS.

> I understand that unused pages are paged out, but how does this improve
> performance beyond making more room for the FS cache?

a swapout frees a physical page, period. whether it's used for the FS
cache or not is irrelevant. perhaps your model for how the system works
involves uniform references to all pages? if that were true, then indeed,
swapouts would be dubious. since references are highly skewed, there are
some pages which are not effective investments of physical ram, so to
speak, but which can't be thrown away (because they're dirty and/or
anonymous).

> I am simply saying that comparing RAID5 and RAID1 for the purposes of
> swapping is irrelevant, because if you are relying on the performance of
> your swap device, you are missing the point that you have run out of
> memory.

which is either vapid or wrong. yes, it's silly to worry about how fast
your system thrashes. but no one is talking about non-fatal error
conditions like thrashing. and there *is* a good reason to choose r1 over
r5 for swap, since writes are so much lower-overhead. worrying about the
efficiency of r1 vs r5 reads verges on the "optimizing for thrashing" that
you're so worried about.

> [0] Some longer-running tasks (like batch jobs) that do not have
> performance limitations can afford to swap.

interesting way to think of it, since you're assuming that systems will
always have as much memory as they can possibly use. all real systems
have "memory pressure", which leads the VM to attempt to optimize which
needs consume the limited, inadequate number of physical pages. swapouts
are nothing more than the kernel trying to find a better (hotter) use for
a page. pure capacity swapouts aren't really interesting, since there's
really nothing you can do about them.

* RE: RAID5 crash and burn
From: Guy @ 2004-11-01 7:01 UTC (permalink / raw)
To: 'Mark Hahn', 'Nathan Dietsch'; +Cc: linux-raid

You said:
"swapouts mean that the kernel doesn't need to dedicate a page of physical
memory to keep track of some stale page. swapping out active pages would
be a mistake, obviously. the kernel tries to avoid that mistake in any OS."

Any OS! MS NT, 2000 and XP favor disk cache over all running programs. If
you leave your computer during a virus scan or disk backup, when you come
back everything has been swapped and all windows are very slow to respond.
My system has 1 Gig of RAM. With 300-400Meg used, the disk cache can grow
to 600-700Meg. But it goes well beyond that! Well, maybe you don't
consider NT, 2000 or XP as operating systems! :)

Back to swapping. I can't think how swapping could be better than having
enough memory. In my example, I did not have enough memory. The system
was very, very slow, but CPU usage was only 10-20% since everything
runnable was sitting out on disk. Once I added enough swap spindles, the
system was able to swap fast enough to utilize most of the CPU, which was
about 90% or more. This made the users happy; it was like a new computer.
But if I had been able to add more memory, the system would have been even
faster. The memory was added at some point, and swapping stopped, or at
least most of it stopped.

Guy

-----Original Message-----
From: Mark Hahn
Sent: Monday, November 01, 2004 1:33 AM
Subject: Re: RAID5 crash and burn
[Mark's message quoted in full - snipped]

* Re: RAID5 crash and burn
From: Nathan Dietsch @ 2004-11-01 9:01 UTC (permalink / raw)
To: Guy; +Cc: linux-raid

Hello Guy,

Guy wrote:
> Back to swapping. I can't think how swapping could be better than having
> enough memory.

Now here is a guy [0] who knows what I am talking about.

Mark: This is exactly what I was getting at.

Regards,

Nathan

[0] Pun intended

* Re: RAID5 crash and burn
From: Nathan Dietsch @ 2004-11-01 8:58 UTC (permalink / raw)
To: Mark Hahn, linux-raid

Mark,

Mark Hahn wrote:
>> From Understanding the Linux Kernel: "Every access by a program to a
>> page that is swapped out increases the process execution time by an
>> order of magnitude. In short, if performance is of great importance,
>> swapping should be considered as a last resort." (p. 529)
>
> not relevant: we're talking about what it costs to perform a swapout,
> not what happens if the wrong page is chosen for swapout, and later
> necessitates a swapin.

You have convinced me that swapping out is low-overhead in Linux.

>> I agree that the write is cheap compared to the read (where you will
>> stall), but what is the purpose of writing something to swap if you do
>> not plan to read it again?
>
> to free the underlying physical page for other use.

Unused memory is wasted memory, hence the reason for FS caches. If you
have such a squeeze on memory that you are swapping for a more active page
using LRU algorithms, you do not have enough memory for the task at hand.

>> Please explain how swapping out a page is good for performance.
>
> swapouts mean that the kernel doesn't need to dedicate a page of
> physical memory to keep track of some stale page. swapping out active
> pages would be a mistake, obviously. the kernel tries to avoid that
> mistake in any OS.

Agreed, but yet again it comes back to not having enough memory for
*performance*.

>> I am simply saying that comparing RAID5 and RAID1 for the purposes of
>> swapping is irrelevant, because if you are relying on the performance
>> of your swap device, you are missing the point that you have run out of
>> memory.
>
> which is either vapid or wrong. yes, it's silly to worry about how fast
> your system thrashes. but no one is talking about non-fatal error
> conditions like thrashing. and there *is* a good reason to choose r1
> over r5 for swap, since writes are so much lower-overhead. worrying
> about the efficiency of r1 vs r5 reads verges on the "optimizing for
> thrashing" that you're so worried about.

That is what I was getting at.

I think this is going in circles, but it has made me see swapping in
another light.

Regards,

Nathan

* Re: RAID5 crash and burn
From: Michael Robinton @ 2004-10-31 19:05 UTC (permalink / raw)
To: coreyfro; +Cc: linux-raid

On Sat, 30 Oct 2004 coreyfro@coreyfro.com wrote:
> I had a drive die yesterday, and while RAID5 can handle that, the kernel
> couldn't handle the swap on that drive going poof. My system crashed, so
> I rebooted, thinking that the system would figure out that the swap was
> dead and not start it.
>
> [rest of original report snipped]

You need to add an additional RAID1 partition the size of your swap and
use that, so you don't have the same situation arise again. My RAID5 disks
(3 drives) all have an extra "work" partition the size of swap; the spares
are allocated as:

  2 - mdN for swap
  1 - maintenance

If you have more drives in your raid, you can allocate additional pairs
for swap and cut down on the partition size overall.

Michael
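
Read concretely (and purely as a hypothetical sketch, since Michael does
not give device names), that 3-drive layout would come out to something
like:

  hda2 + hdc2  ->  /dev/mdN  (RAID1; then mkswap + swapon)
  hde2         ->  spare "work"/maintenance partition of the same size

so a swap-disk failure degrades the mirror instead of killing the kernel's
swap.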