* Help: very slow software RAID 5. @ 2007-09-18 23:09 Dean S. Messing 2007-09-19 0:05 ` Justin Piszcz 0 siblings, 1 reply; 44+ messages in thread From: Dean S. Messing @ 2007-09-18 23:09 UTC (permalink / raw) To: linux-raid I'm not getting nearly the read speed I expected from a newly defined software RAID 5 array across three disk partitions (on the 3 drives, of course!). Would someone kindly point me straight? After defining the RAID 5 I did `hdparm -t /dev/md0' and got the abysmal read speed of ~65MB/sec. The individual device speeds are ~55, ~70, and ~75 MB/sec. Shouldn't this array be running (at the slowest) at about 55+70 = 125 MB/sec minus some overhead? I defined a RAID0 on the ~55 and ~70 partitions and got about 110 MB/sec. Shouldn't adding a 3rd (faster!) drive into the array make the RAID 5 speed at least this fast? Here are the details of my setup: Linux Fedora 7, kernel 2.6.22. # fdisk -l /dev/sda Disk /dev/sda: 160.0 GB, 160000000000 bytes 255 heads, 63 sectors/track, 19452 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sda1 1 127 1020096 82 Linux swap / Solaris /dev/sda2 * 128 143 128520 83 Linux /dev/sda3 144 19452 155099542+ fd Linux raid autodetect # fdisk -l /dev/sdb Disk /dev/sdb: 160.0 GB, 160000000000 bytes 255 heads, 63 sectors/track, 19452 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 * 1 127 1020096 82 Linux swap / Solaris /dev/sdb2 128 143 128520 83 Linux /dev/sdb3 144 19452 155099542+ fd Linux raid autodetect # fdisk -l /dev/sdc Disk /dev/sdc: 500.1 GB, 500107862016 bytes 255 heads, 63 sectors/track, 60801 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdc1 * 1 127 1020096 82 Linux swap / Solaris /dev/sdc2 128 19436 155099542+ fd Linux raid autodetect /dev/sdc3 19437 60801 332264362+ 8e Linux LVM The RAID 5 consists of sda3, sdb3, and sdc2. These partitions have these individual read speeds: # hdparm -t /dev/sda3 /dev/sdb3 /dev/sdc2 /dev/sda3: Timing buffered disk reads: 168 MB in 3.03 seconds = 55.39 MB/sec /dev/sdb3: Timing buffered disk reads: 216 MB in 3.03 seconds = 71.35 MB/sec /dev/sdc2: Timing buffered disk reads: 228 MB in 3.02 seconds = 75.49 MB/sec After defining RAID 5 with: mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sda3 /dev/sdb3 /dev/sdc2 and waiting the 50 minutes for /proc/mdstat to show it was finished, I did `hdparm -t /dev/md0' and got ~65MB/sec. Dean ^ permalink raw reply [flat|nested] 44+ messages in thread
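hdparm -t is only a coarse sequential-read probe. A common cross-check is a large sequential read with dd after dropping the page cache, plus the same read with O_DIRECT; a minimal sketch, assuming root, a GNU coreutils dd new enough for iflag=direct, and that /dev/md0 is the array under test (the sizes are illustrative):

    # flush dirty data and drop the page cache so the read really hits the disks
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # 1 GiB sequential read through the page cache
    dd if=/dev/md0 of=/dev/null bs=1M count=1024

    # the same read bypassing the page cache (roughly what hdparm's --direct does)
    dd if=/dev/md0 of=/dev/null bs=1M count=1024 iflag=direct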
* Re: Help: very slow software RAID 5. 2007-09-18 23:09 Help: very slow software RAID 5 Dean S. Messing @ 2007-09-19 0:05 ` Justin Piszcz 2007-09-19 1:49 ` Dean S. Messing 0 siblings, 1 reply; 44+ messages in thread From: Justin Piszcz @ 2007-09-19 0:05 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid On Tue, 18 Sep 2007, Dean S. Messing wrote: > > > I'm not getting nearly the read speed I expected > from a newly defined software RAID 5 array > across three disk partitions (on the 3 drives, > of course!). > > Would someone kindly point me straight? > > After defining the RAID 5 I did `hdparm -t /dev/md0' > and got the abysmal read speed of ~65MB/sec. > The individual device speeds are ~55, ~70, > and ~75 MB/sec. > > Shouldn't this array be running (at the slowest) > at about 55+70 = 125 MB/sec minus some overhead? > I defined a RAID0 on the ~55 and ~70 partitions > and got about 110 MB/sec. > > Shouldn't adding a 3rd (faster!) drive into the > array make the RAID 5 speed at least this fast? > > > Here are the details of my setup: > > Linux Fedora 7, kernel 2.6.22. > > # fdisk -l /dev/sda > > Disk /dev/sda: 160.0 GB, 160000000000 bytes > 255 heads, 63 sectors/track, 19452 cylinders > Units = cylinders of 16065 * 512 = 8225280 bytes > > Device Boot Start End Blocks Id System > /dev/sda1 1 127 1020096 82 Linux swap / Solaris > /dev/sda2 * 128 143 128520 83 Linux > /dev/sda3 144 19452 155099542+ fd Linux raid autodetect > > > # fdisk -l /dev/sdb > > Disk /dev/sdb: 160.0 GB, 160000000000 bytes > 255 heads, 63 sectors/track, 19452 cylinders > Units = cylinders of 16065 * 512 = 8225280 bytes > > Device Boot Start End Blocks Id System > /dev/sdb1 * 1 127 1020096 82 Linux swap / Solaris > /dev/sdb2 128 143 128520 83 Linux > /dev/sdb3 144 19452 155099542+ fd Linux raid autodetect > > > > # fdisk -l /dev/sdc > > Disk /dev/sdc: 500.1 GB, 500107862016 bytes > 255 heads, 63 sectors/track, 60801 cylinders > Units = cylinders of 16065 * 512 = 8225280 bytes > > Device Boot Start End Blocks Id System > /dev/sdc1 * 1 127 1020096 82 Linux swap / Solaris > /dev/sdc2 128 19436 155099542+ fd Linux raid autodetect > /dev/sdc3 19437 60801 332264362+ 8e Linux LVM > > > The RAID 5 consists of sda3, sdb3, and sdc2. > These partitions have these individual read speeds: > > # hdparm -t /dev/sda3 /dev/sdb3 /dev/sdc2 > > /dev/sda3: > Timing buffered disk reads: 168 MB in 3.03 seconds = 55.39 MB/sec > > /dev/sdb3: > Timing buffered disk reads: 216 MB in 3.03 seconds = 71.35 MB/sec > > /dev/sdc2: > Timing buffered disk reads: 228 MB in 3.02 seconds = 75.49 MB/sec > > > After defining RAID 5 with: > > mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sda3 /dev/sdb3 /dev/sdc2 > > and waiting the 50 minutes for /proc/mdstat to show it was finished, > I did `hdparm -t /dev/md0' and got ~65MB/sec. > > Dean > > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Without tuning you will get very slow read speed. Read the mailing list, there are about 5-10 tunable options, for me, I get 250 MiB/s no tuning (read/write), after tuning 464 MiB/s write and 622 MiB/s read. Justin. ^ permalink raw reply [flat|nested] 44+ messages in thread
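Justin does not enumerate the tunables here; the knobs most often cited on this list for md RAID 5 on 2.6-era kernels look roughly like the following. This is a sketch rather than a recommendation -- the values are only examples and the right ones depend on the workload; the paths assume the array is /dev/md0 with members sda, sdb, and sdc:

    # array read-ahead, in 512-byte sectors
    blockdev --setra 16384 /dev/md0

    # RAID5 stripe cache (in pages, per member device); mainly helps writes
    echo 8192 > /sys/block/md0/md/stripe_cache_size

    # per-disk request size and queue depth
    for d in sda sdb sdc; do
        echo 512 > /sys/block/$d/queue/max_sectors_kb
        echo 256 > /sys/block/$d/queue/nr_requests
    done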
* Re: Help: very slow software RAID 5. 2007-09-19 0:05 ` Justin Piszcz @ 2007-09-19 1:49 ` Dean S. Messing 2007-09-19 8:38 ` Justin Piszcz 0 siblings, 1 reply; 44+ messages in thread From: Dean S. Messing @ 2007-09-19 1:49 UTC (permalink / raw) To: linux-raid Justin Piszcz wrote: On Tue, 18 Sep 2007, Dean S. Messing wrote: : : > : > : > I'm not getting nearly the read speed I expected : > from a newly defined software RAID 5 array : > across three disk partitions (on the 3 drives, : > of course!). : > : > Would someone kindly point me straight? : > : > After defining the RAID 5 I did `hdparm -t /dev/md0' : > and got the abysmal read speed of ~65MB/sec. : > The individual device speeds are ~55, ~70, : > and ~75 MB/sec. : > : > Shouldn't this array be running (at the slowest) : > at about 55+70 = 125 MB/sec minus some overhead? : > I defined a RAID0 on the ~55 and ~70 partitions : > and got about 110 MB/sec. : > : > Shouldn't adding a 3rd (faster!) drive into the : > array make the RAID 5 speed at least this fast? : > : > : > Here are the details of my setup: : > <snip> : Without tuning you will get very slow read speed. : : Read the mailing list, there are about 5-10 tunable options, for me, I get : 250 MiB/s no tuning (read/write), after tuning 464 MiB/s write and 622 : MiB/s read. : : Justin. Thanks Justin. 5-10 tunable options! Good grief. This sounds worse than regulating my 80 year old Grandfather Clock. (I'm quite a n00bie at this.) Are there any nasty system side effects to tuning these parameters? This is not a server I'm working with. It's my research desktop machine. I do lots and lots of different things on it. I started out with RAID 0, but after reading a lot I learned that this is Dangerous. So I bought a 3rd disk to do RAID 5. I intend to put LVM on top of the RAID 5 once I get it running at the speed it is supposed to, and then copy my entire linux system onto it. Are these tuned parameters going to mess other things up? Is there official documentation on this relative to RAID 5? I don't see much online. One thing I did learn is that if I use the "--direct" switch with `hdparm' I get much greater read speed. Goes from ~65 MB/s to ~120 MB/s. I have no idea what --direct does. Yes, I've read the man page. It says that it causes bypassing of "the page cache". Ooookay. Alas using "dd" I find that the "real read speed" is still around ~65 MB/s. Sorry for these musings. I'm just uncomfortable trying to diddle with 5-10 system parameters without knowing what I'm doing. Any help or pointers to documentaion on tuning these for RAID-5 and what the tradeoffs are would be appreciated. Thanks. Dean ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-19 1:49 ` Dean S. Messing @ 2007-09-19 8:38 ` Justin Piszcz 2007-09-19 17:49 ` Dean S. Messing 0 siblings, 1 reply; 44+ messages in thread From: Justin Piszcz @ 2007-09-19 8:38 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid One of the 5-10 tuning settings: blockdev --getra /dev/md0 Try setting it to 4096,8192,16384,32768,65536 blockdev --setra 4096 /dev/md0 Also, with a 3-disk raid5 that is the worst performance you can get using only 3 disks, while with a 10 disk raid5 it'd be closer to 90%. Reads you should get good speeds, but for writes-- probably not. Then re-benchmark. Justin. On Tue, 18 Sep 2007, Dean S. Messing wrote: > > > Justin Piszcz wrote: > On Tue, 18 Sep 2007, Dean S. Messing wrote: > : > : > > : > > : > I'm not getting nearly the read speed I expected > : > from a newly defined software RAID 5 array > : > across three disk partitions (on the 3 drives, > : > of course!). > : > > : > Would someone kindly point me straight? > : > > : > After defining the RAID 5 I did `hdparm -t /dev/md0' > : > and got the abysmal read speed of ~65MB/sec. > : > The individual device speeds are ~55, ~70, > : > and ~75 MB/sec. > : > > : > Shouldn't this array be running (at the slowest) > : > at about 55+70 = 125 MB/sec minus some overhead? > : > I defined a RAID0 on the ~55 and ~70 partitions > : > and got about 110 MB/sec. > : > > : > Shouldn't adding a 3rd (faster!) drive into the > : > array make the RAID 5 speed at least this fast? > : > > : > > : > Here are the details of my setup: > : > > > <snip> > > : Without tuning you will get very slow read speed. > : > : Read the mailing list, there are about 5-10 tunable options, for me, I get > : 250 MiB/s no tuning (read/write), after tuning 464 MiB/s write and 622 > : MiB/s read. > : > : Justin. > > Thanks Justin. > > 5-10 tunable options! Good grief. This sounds worse than regulating > my 80 year old Grandfather Clock. (I'm quite a n00bie at this.) Are > there any nasty system side effects to tuning these parameters? This > is not a server I'm working with. It's my research desktop machine. > I do lots and lots of different things on it. > > I started out with RAID 0, but after reading a lot I learned that this > is Dangerous. So I bought a 3rd disk to do RAID 5. I intend to put > LVM on top of the RAID 5 once I get it running at the speed it is > supposed to, and then copy my entire linux system onto it. Are these > tuned parameters going to mess other things up? > > Is there official documentation on this relative to RAID 5? I don't > see much online. > > One thing I did learn is that if I use the "--direct" switch with > `hdparm' I get much greater read speed. Goes from ~65 MB/s to ~120 MB/s. > > I have no idea what --direct does. Yes, I've read the man page. > It says that it causes bypassing of "the page cache". Ooookay. > > Alas using "dd" I find that the "real read speed" is still around ~65 MB/s. > > Sorry for these musings. I'm just uncomfortable trying to diddle > with 5-10 system parameters without knowing what I'm doing. > > Any help or pointers to documentaion on tuning these for RAID-5 and what > the tradeoffs are would be appreciated. > > Thanks. > Dean > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 44+ messages in thread
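One way to script the sweep Justin suggests is a small loop that sets each read-ahead value, drops the cache, and times a fixed sequential read; a sketch assuming bash, GNU dd, and root, with the read size purely illustrative:

    for ra in 4096 8192 16384 32768 65536; do
        blockdev --setra $ra /dev/md0
        sync; echo 3 > /proc/sys/vm/drop_caches
        echo -n "readahead=$ra: "
        # dd prints its throughput summary on stderr; keep only that line
        dd if=/dev/md0 of=/dev/null bs=1M count=2048 2>&1 | tail -n 1
    done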
* Re: Help: very slow software RAID 5. 2007-09-19 8:38 ` Justin Piszcz @ 2007-09-19 17:49 ` Dean S. Messing 2007-09-19 18:25 ` Justin Piszcz 2007-09-20 15:33 ` Bill Davidsen 0 siblings, 2 replies; 44+ messages in thread From: Dean S. Messing @ 2007-09-19 17:49 UTC (permalink / raw) To: linux-raid

Justin Piszcz wrote:
: One of the 5-10 tuning settings:
:
: blockdev --getra /dev/md0
:
: Try setting it to 4096,8192,16384,32768,65536
:
: blockdev --setra 4096 /dev/md0
:
:

I discovered your January correspondence to the list about this. Yes, the read-ahead length makes a dramatic difference---for sequential data reading. However, in doing some further study on this parameter, I see that random access is going to suffer. Since I had intended to build an LVM on this RAID 5 array and put a full Linux system on it, I'm not sure that large read-ahead values are a good idea.

: Also, with a 3-disk raid5 that is the worst performance you can get using
: only 3 disks

I don't know why (for reads) I should suffer such bad performance. According to all I've read, the system is not even supposed to read the parity data on reads. So why do I not get near RAID 0 speeds without having to increase the read-ahead value?

: , while with a 10 disk raid5 it'd be closer to 90%. Reads you
: should get good speeds, but for writes-- probably not.

When you say "reads you should get good speeds", are you referring to the aforementioned 10-disk RAID 5 or my 3-disk one?

: Then re-benchmark.

A large read-ahead nearly doubles the speed reported by 'hdparm -t'. (So does simply using the "--direct" flag. Can you explain this?) Also, in your opinion, is it wise to use such large read-aheads for a RAID 5 intended for the use to which I plan to put it?

Aside: I have found RAID quite frustrating. With the original two disks I was getting 120-130 MB/s in RAID 0. I would think that for the investment of a 3rd drive I ought to get the modicum of redundancy I expect and keep the speed (at least on reads) without sacrificing anything. But it appears I actually lost something for my investment. I'm back to the speed of single drives with the modicum of redundancy that RAID 5 gives. Not a very good deal.

Dean

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-19 17:49 ` Dean S. Messing @ 2007-09-19 18:25 ` Justin Piszcz 2007-09-19 23:31 ` Dean S. Messing 2007-09-20 15:33 ` Bill Davidsen 1 sibling, 1 reply; 44+ messages in thread From: Justin Piszcz @ 2007-09-19 18:25 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid On Wed, 19 Sep 2007, Dean S. Messing wrote: > Justin Piszcz wrote: > : One of the 5-10 tuning settings: > : > : blockdev --getra /dev/md0 > : > : Try setting it to 4096,8192,16384,32768,65536 > : > : blockdev --setra 4096 /dev/md0 > : > : > > I discovered your January correspondence to the list about this. Yes, > the read-ahead length makes a dramtic difference---for sequential data > reading. However, in doing some further study on this parameter, I > see that random access is going to suffer. Since I had intended > to build an LVM on this RAID 5 array and put a full linux system on > it, I'm not sure that large read-ahead values are a good idea. > > : Also, with a 3-disk raid5 that is the worst performance you can get using > : only 3 disks > > I don't know why (for reads) I should suffer such bad performance. > According to all I've read, the system is not even supposed to read > the parity data on reads. So why do I not get near RAID 0 speeds w/o > having to increase the Read-ahead value? > > : , while with a 10 disk raid5 it'd be closer to 90%. Reads you > : should get good speeds, but for writes-- probably not. > > When you say "reads you should get good speeds" are you referring > to the aforementioned 10 disk RAID 5 or my 3 disk one? > > : Then re-benchmark. > > Large Read-ahead nearly double the speed of 'hdparm -t'. (So does > simply using the "--direct" flag. Can you explain this?) Also, in > your opinion, is it wise to use such large read-aheads for a RAID 5 > intended for the use to which I plan to put it? > > Aside: I have found RAID quite frustrating. With the original two > disks I was getting 120-130 MB/s in RAID 0. I would think that for > the investment of a 3rd drive I ought to get the modicum of > redundancey I expect and keep the speed (at least on reads) w/o > sacrificing anything. But it appears I actually lost something for my > investment. I'm back to the speed of single drives with the modicum > of redundancey that RAID 5 gives. Not a very good deal. > > Dean > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Did you try increasing the readahead and benchmarking? You can set it back to what it was when you are done. Justin. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-19 18:25 ` Justin Piszcz @ 2007-09-19 23:31 ` Dean S. Messing 2007-09-20 8:25 ` Justin Piszcz 2007-09-20 18:16 ` Michal Soltys 0 siblings, 2 replies; 44+ messages in thread From: Dean S. Messing @ 2007-09-19 23:31 UTC (permalink / raw) To: jpiszcz; +Cc: linux-raid

Justin Piszcz wrote:
: Dean Messing wrote:
: > Justin Piszcz wrote:
: > :

<snip>

: > I discovered your January correspondence to the list about this. Yes,
: > the read-ahead length makes a dramatic difference---for sequential data
: > reading. However, ...

<snip>

: >
: > : Then re-benchmark.
: >
: > A large read-ahead nearly doubles the speed of 'hdparm -t'. (So does
: > simply using the "--direct" flag. Can you explain this?) Also, in
: > your opinion, is it wise to use such large read-aheads for a RAID 5
: > intended for the use to which I plan to put it?
:
: > Did you try increasing the readahead and benchmarking? You can set it
: > back to what it was when you are done.

Sorry if my earlier reply was not clear.

Yes, I did increase the read-ahead (to 16384 and 32768) and re-ran hdparm each time. Result:

  r.a. =   512:  hdparm -t          -->  ~67 MB/s
                 hdparm -t --direct --> ~121 MB/s
  r.a. =  4096:  hdparm -t          --> ~106 MB/s
  r.a. =  8192:  hdparm -t          --> ~113 MB/s
  r.a. = 16384:  hdparm -t          --> ~120 MB/s
  r.a. = 32768:  hdparm -t          --> ~121 MB/s

As I said: it nearly doubled the speed given by 'hdparm -t'. But (as I said) so did using the "--direct" flag, which bypasses the "page cache", whatever that is.

Also (as I asked): what is the downside? From what I have read, random-access reads will take a hit. Is this correct?

Thanks very much for your help!

Dean

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-19 23:31 ` Dean S. Messing @ 2007-09-20 8:25 ` Justin Piszcz 2007-09-20 18:16 ` Michal Soltys 1 sibling, 0 replies; 44+ messages in thread From: Justin Piszcz @ 2007-09-20 8:25 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid On Wed, 19 Sep 2007, Dean S. Messing wrote: > > Justin Piszcz wrote: > : Dean Messing wrote: > : > Jusin Piszcz wrote: > : > : > <snip> > > : > I discovered your January correspondence to the list about this. Yes, > : > the read-ahead length makes a dramtic difference---for sequential data > : > reading. However, ... > > <snip> > > : > : Then re-benchmark. > : > > : > Large Read-ahead nearly double the speed of 'hdparm -t'. (So does > : > simply using the "--direct" flag. Can you explain this?) Also, in > : > your opinion, is it wise to use such large read-aheads for a RAID 5 > : > intended for the use to which I plan to put it? > : > > : > Did you try increasing the readahead and benchmarking? You can set it > : > back to what it was when you are done. > > Sorry if my earlier reply was not clear. > > Yes, I did increase r.a. (to 16384 and 32768) and re-ran hdparm each time. > Result: > > r.a. = 512: hdparm -t --> ~67 MB/s > " " " hdparm -t --direct --> ~121 MB/s > > r.a. = 4096: hdparm -t --> ~106 MB/s > > r.a. = 8192: hdparm -t --> ~113 MB/s > > r.a. = 16384: hdparm -t --> ~120 MB/s > > r.a. = 32768: hdparm -t --> ~121 MB/s > > > As I said: it nearly doubled the speed give by 'hdparm -t' > > But (as I said) so did using the "--direct" flag, which inhibits > "page cache" whatever that is. > > Also (as I asked) what is the downside? From what I have read, > random access reads will take a hit. Is this correct? > > Thanks very much for your help! > > Dean > > Hmm good question, in all of my benchmarking I made sure the majority of the benchmarks increased in overall speed with bonnie++ I did not notice any degradation myself-- although perhaps I was not benchmarking for that workload. Justin. ^ permalink raw reply [flat|nested] 44+ messages in thread
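For reference, a typical bonnie++ run for this kind of question uses a file set well beyond RAM so the page cache cannot hide the disks; the mount point and size below are only examples:

    # -d: test directory, -s: file size in MB (use roughly 2x RAM),
    # -n 0: skip the small-file creation tests, -u: user to run as when root
    bonnie++ -d /mnt/raid/bench -s 8192 -n 0 -u root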
* Re: Help: very slow software RAID 5. 2007-09-19 23:31 ` Dean S. Messing 2007-09-20 8:25 ` Justin Piszcz @ 2007-09-20 18:16 ` Michal Soltys 2007-09-20 19:06 ` Dean S. Messing 1 sibling, 1 reply; 44+ messages in thread From: Michal Soltys @ 2007-09-20 18:16 UTC (permalink / raw) To: linux-raid Dean S. Messing wrote: > > Also (as I asked) what is the downside? From what I have read, random > access reads will take a hit. Is this correct? > > Thanks very much for your help! > > Dean > Besides bonnie++ you should probably check iozone. It will allow you to test very specific settings quite thoroughly. Although with current multi-gigabyte memory systems the test runs may take a bit time. http://www.iozone.org/ There's nice introduction to the progam there, along with some example graph results. ^ permalink raw reply [flat|nested] 44+ messages in thread
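An iozone run covering the cases discussed in this thread (sequential and random, read and write) might look like the following; the file path and the 4g ceiling (which should exceed RAM) are only examples:

    # -a: automatic mode over record and file sizes, -g: maximum file size,
    # -i 0/1/2: write/rewrite, read/reread, and random read/write tests,
    # -R -b: also produce an Excel-style report
    iozone -a -g 4g -i 0 -i 1 -i 2 -f /mnt/raid/iozone.tmp -R -b iozone.xls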
* Re: Help: very slow software RAID 5. 2007-09-20 18:16 ` Michal Soltys @ 2007-09-20 19:06 ` Dean S. Messing 0 siblings, 0 replies; 44+ messages in thread From: Dean S. Messing @ 2007-09-20 19:06 UTC (permalink / raw) To: linux-raid Michal Soltys writes: : Dean S. Messing wrote: : > : > Also (as I asked) what is the downside? From what I have read, random : > access reads will take a hit. Is this correct? : > : > Thanks very much for your help! : > : > Dean : > : : Besides bonnie++ you should probably check iozone. It will allow you to test : very specific settings quite thoroughly. Although with current : multi-gigabyte memory systems the test runs may take a bit time. : : http://www.iozone.org/ : : There's nice introduction to the progam there, along with some example graph : results. Thanks very much, Michal. I'll have a look. Dean ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-19 17:49 ` Dean S. Messing 2007-09-19 18:25 ` Justin Piszcz @ 2007-09-20 15:33 ` Bill Davidsen 2007-09-20 18:47 ` Dean S. Messing 1 sibling, 1 reply; 44+ messages in thread From: Bill Davidsen @ 2007-09-20 15:33 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid Dean S. Messing wrote: > Justin Piszcz wrote: > : One of the 5-10 tuning settings: > : > : blockdev --getra /dev/md0 > : > : Try setting it to 4096,8192,16384,32768,65536 > : > : blockdev --setra 4096 /dev/md0 > : > : > > I discovered your January correspondence to the list about this. Yes, > the read-ahead length makes a dramtic difference---for sequential data > reading. However, in doing some further study on this parameter, I > see that random access is going to suffer. Since I had intended > to build an LVM on this RAID 5 array and put a full linux system on > it, I'm not sure that large read-ahead values are a good idea. > > : Also, with a 3-disk raid5 that is the worst performance you can get using > : only 3 disks > > I don't know why (for reads) I should suffer such bad performance. > According to all I've read, the system is not even supposed to read > the parity data on reads. So why do I not get near RAID 0 speeds w/o > having to increase the Read-ahead value? > > : , while with a 10 disk raid5 it'd be closer to 90%. Reads you > : should get good speeds, but for writes-- probably not. > > When you say "reads you should get good speeds" are you referring > to the aforementioned 10 disk RAID 5 or my 3 disk one? > > : Then re-benchmark. > > Large Read-ahead nearly double the speed of 'hdparm -t'. (So does > simply using the "--direct" flag. Can you explain this?) Also, in > your opinion, is it wise to use such large read-aheads for a RAID 5 > intended for the use to which I plan to put it? > Do you want to tune it to work well now or work well in the final configuration? There is no magic tuning which is best for every use, if there was it would be locked in and you couldn't change it. > Aside: I have found RAID quite frustrating. With the original two > disks I was getting 120-130 MB/s in RAID 0. I would think that for > the investment of a 3rd drive I ought to get the modicum of > redundancey I expect and keep the speed (at least on reads) w/o > sacrificing anything. But it appears I actually lost something for my > investment. I'm back to the speed of single drives with the modicum > of redundancey that RAID 5 gives. Not a very good deal. RAID-5 and RAID-1 performance are popular topic, reading the archives may shed more light on that. After you get to LVM you can do read ahead tuning on individual areas, which will allow you to do faster random access on one part and faster sequential on another. *But* when you run both types of access on the same physical device one or the other will suffer, and with careful tuning both can be slow. When you get to the point where you know exactly what you are going to do and how you are going to do it (layout) you can ask a better question about tuning. PS: adding another drive and going to RAID-10 with "far" configuration will give you speed and reliability, at the cost of capacity. Aren't shoices fun? -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 44+ messages in thread
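The per-area read-ahead tuning Bill describes can be done on the LV device nodes independently of the underlying array; a small sketch with hypothetical volume names:

    # large read-ahead for a volume holding big sequential video files
    blockdev --setra 16384 /dev/vg0/video

    # modest read-ahead for a volume dominated by small random accesses
    blockdev --setra 256 /dev/vg0/home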
* Re: Help: very slow software RAID 5. 2007-09-20 15:33 ` Bill Davidsen @ 2007-09-20 18:47 ` Dean S. Messing 2007-09-20 21:08 ` Michael Tokarev 0 siblings, 1 reply; 44+ messages in thread From: Dean S. Messing @ 2007-09-20 18:47 UTC (permalink / raw) To: linux-raid Bill Davidsen wrote: : Dean S. Messing wrote: > snip : Do you want to tune it to work well now or work well in the final : configuration? There is no magic tuning which is best for every use, if : there was it would be locked in and you couldn't change it. I want it to work well in the final config. I'm just now learning about benchmarking with tools like bonnie++. In my naivety, I thought that `hdparm -t' "told all", at least for reads. : > Aside: I have found RAID quite frustrating. With the original two : > disks I was getting 120-130 MB/s in RAID 0. I would think that for : > the investment of a 3rd drive I ought to get the modicum of : > redundancey I expect and keep the speed (at least on reads) w/o : > sacrificing anything. But it appears I actually lost something for my : > investment. I'm back to the speed of single drives with the modicum : > of redundancey that RAID 5 gives. Not a very good deal. : RAID-5 and RAID-1 performance are popular topic, reading the archives : may shed more light on that. So I'm seeing. I just finished wading through a long April 07 discussion on "write-though" vs. "write-back" for RAID 5. : After you get to LVM you can do read ahead : tuning on individual areas, which will allow you to do faster random : access on one part and faster sequential on another. *But* when you run : both types of access on the same physical device one or the other will : suffer, and with careful tuning both can be slow. This is why simply bumping up the read ahead parameter as has been suggested to me seems suspect. If this was the right fix, it seems that it would be getting set automatically by the default installation of mdadm. : When you get to the point where you know exactly what you are going to : do and how you are going to do it (layout) you can ask a better question : about tuning. Well (in my extreme naivete) I had hoped that I could (just) -- buy the extra SATA drive, -- configure RAID 5 on all three drives, -- have it present itself as a single device with the speed of RAID 0 (on two drives), and the safety net of RAID 5, -- install Fedora 7 on the array, -- use LVM to partition as I liked, -- and forget about it. Instead, this has turned into a many hour exercise in futility. This is my research machine (for signal/image processing) and the machine I "live on". It does many different things. What I really need is more disk speed (but I can't afford very high speed drives). That's what attracted me to RAID 0 --- which seems to have no downside EXCEPT safety :-). So I'm not sure I'll ever figure out "the right" tuning. I'm at the point of abandoning RAID entirely and just putting the three disks together as a big LV and being done with it. (I don't have quite the moxy to define a RAID 0 array underneath it. :-) : PS: adding another drive and going to RAID-10 with "far" configuration : will give you speed and reliability, at the cost of capacity. Aren't : shoices fun? I don't know hat "far" configuration is, though I understand basically what RAID-10 is. Having that much "wasted" space is too costly, and besides the machine can't take but three drives internally. If I wished to add a 4th I'd need to buy a SATA controller. I had thought RAID 5 did exactly what I wanted. Unfortunately ... 
Which suggests a question for you, Bill. If I were to invest in a "true hardware" RAID SATA controller (is there such a thing?), would RAID 5 across the three drives behave just like RAID 0 on two drives plus one disk of redundancy? In other words, should I just abandon Linux software RAID? At this point I would be willing to spring for such a card if it were not too expensive, and if I could find a slot to put it in. (My system is slot-challenged.)

Thanks for your remarks, Bill. I wish I had the time to learn how to do all this properly with multiple LVs, different read-aheads, and write-through/write-back settings on different logical devices, but my head is swimming and my time is short.

Dean

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-20 18:47 ` Dean S. Messing @ 2007-09-20 21:08 ` Michael Tokarev 2007-09-21 0:58 ` Dean S. Messing 0 siblings, 1 reply; 44+ messages in thread From: Michael Tokarev @ 2007-09-20 21:08 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid Dean S. Messing wrote: [] > [] That's what > attracted me to RAID 0 --- which seems to have no downside EXCEPT > safety :-). > > So I'm not sure I'll ever figure out "the right" tuning. I'm at the > point of abandoning RAID entirely and just putting the three disks > together as a big LV and being done with it. (I don't have quite the > moxy to define a RAID 0 array underneath it. :-) "Putting three disks together as a big LV" - that's exactly what "linear" md module. It's almost as unsafe as raid0, but with linear read/write speed equal to speed of single drive... Note also that the more drives you add to raid0-like config, the more chances of failure you'll have - because raid0 fails when ANY drive fails. Ditto - for certain extent - for linear md module and for "one big LV" which is basically the same thing. By the way, before abandoming "R" in "RAID", I'd check whenever the resulting speed with raid5 (after at least read-ahead tuning) is acceptable, and use that if yes. If no, maybe raid10 over the same 3 drives will give better results. /mjt ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-20 21:08 ` Michael Tokarev @ 2007-09-21 0:58 ` Dean S. Messing 2007-09-21 13:00 ` Bill Davidsen 2007-09-25 9:31 ` Goswin von Brederlow 0 siblings, 2 replies; 44+ messages in thread From: Dean S. Messing @ 2007-09-21 0:58 UTC (permalink / raw) To: linux-raid Michael Tokarev writes: : Dean S. Messing wrote: : [] : > [] That's what : > attracted me to RAID 0 --- which seems to have no downside EXCEPT : > safety :-). : > : > So I'm not sure I'll ever figure out "the right" tuning. I'm at the : > point of abandoning RAID entirely and just putting the three disks : > together as a big LV and being done with it. (I don't have quite the : > moxy to define a RAID 0 array underneath it. :-) : : "Putting three disks together as a big LV" - that's exactly what : "linear" md module. : It's almost as unsafe as raid0, but with : linear read/write speed equal to speed of single drive... I understand I only get the speed of a single drive was I was not aware of the safety factor. I had intended to use snapshotting off to a cheap USB drive each evening. Will that not keep me safe within a day's worth of data change? I only learned about "snapshots" yesterday. I'm utterly new to the disk array/LVM game. For that matter why not run a RAID-0 + LVM across two of the three drives and snapshot to the third? : Note also that the more drives you add to raid0-like config, : the more chances of failure you'll have - because raid0 fails : when ANY drive fails. Ditto - for certain extent - for linear : md module and for "one big LV" which is basically the same thing. I understand the probability increases for additional drives. : By the way, before abandoming "R" in "RAID", I'd check whenever : the resulting speed with raid5 (after at least read-ahead tuning) : is acceptable, and use that if yes. My problem is not quite knowing what "acceptable" is. I bought a Dell Precision 490 with two relatively fast SATA II drives. With RAID 0 I attain speeds of nearly 140 MB/s (using 2 drives) for reads and writes and the system is very snappy for everything, from processing 4Kx2K video to building a 'locate' datebase, to searching my very large mail archives for technical info. When I see the speed loss of software RAID 5 (writes are at 55MB/s and random reads are at 54 MB/s) for everything but seq. reads (and that only if I increase read-ahead from 512 to 16384 to get read speeds of about 110 MB/s I lose heart, esp. since I don't know the other consequences of increasing read-ahead by so much. : If no, maybe raid10 over : the same 3 drives will give better results. Does RAID10 work on three drives? I though one needed 4 drives, with striping across a pair of mirrored pairs. Dean ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-21 0:58 ` Dean S. Messing @ 2007-09-21 13:00 ` Bill Davidsen 2007-09-21 20:01 ` Dean S. Messing 2007-09-21 20:21 ` Dean S. Messing 2007-09-25 9:31 ` Goswin von Brederlow 1 sibling, 2 replies; 44+ messages in thread From: Bill Davidsen @ 2007-09-21 13:00 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid Dean S. Messing wrote: > Michael Tokarev writes: > : Dean S. Messing wrote: > : [] > : > [] That's what > : > attracted me to RAID 0 --- which seems to have no downside EXCEPT > : > safety :-). > : > > : > So I'm not sure I'll ever figure out "the right" tuning. I'm at the > : > point of abandoning RAID entirely and just putting the three disks > : > together as a big LV and being done with it. (I don't have quite the > : > moxy to define a RAID 0 array underneath it. :-) > : > : "Putting three disks together as a big LV" - that's exactly what > : "linear" md module. > : It's almost as unsafe as raid0, but with > : linear read/write speed equal to speed of single drive... > > I understand I only get the speed of a single drive was I was not > aware of the safety factor. I had intended to use snapshotting off > to a cheap USB drive each evening. Will that not keep me safe within a > day's worth of data change? I only learned about "snapshots" yesterday. > I'm utterly new to the disk array/LVM game. > But your read speed need not be limited if you tune the readahead. There's also the question of how much transfer speed you actually *need*. If your application is CPU-bound faster will not be the same as "runs in less time," and random access is limited by the seek speed of your drives, although some RAID tuning does apply to random writes. > For that matter why not run a RAID-0 + LVM across two of the three drives > and snapshot to the third? > > : Note also that the more drives you add to raid0-like config, > : the more chances of failure you'll have - because raid0 fails > : when ANY drive fails. Ditto - for certain extent - for linear > : md module and for "one big LV" which is basically the same thing. > > I understand the probability increases for additional drives. > > : By the way, before abandoming "R" in "RAID", I'd check whenever > : the resulting speed with raid5 (after at least read-ahead tuning) > : is acceptable, and use that if yes. > > My problem is not quite knowing what "acceptable" is. I bought a Dell > Precision 490 with two relatively fast SATA II drives. With RAID 0 I > attain speeds of nearly 140 MB/s (using 2 drives) for reads and writes > and the system is very snappy for everything, from processing 4Kx2K > video to building a 'locate' datebase, to searching my very large mail > archives for technical info. > When you process video and monitor the system with vmstat, do you see significant iowait time? No, neither do I, with a modest readahead I am totally CPU limited. If you are searching your mail database, if you just use a text tool which reads everything, that's pure sequential access. And unless you actually *use* the locate command, building that database is just a way to beat your disks (and it's more sequential than you would expect). You can turn it off and do your bit to avoid global warming. > When I see the speed loss of software RAID 5 (writes are at 55MB/s and > random reads are at 54 MB/s) for everything but seq. reads (and that > only if I increase read-ahead from 512 to 16384 to get read speeds of > about 110 MB/s I lose heart, esp. 
since I don't know the other > consequences of increasing read-ahead by so much. > Assuming that your have enough memory, there would be a small slowdown in random reading a lot of small records. You should know what your application would do, but that access is typical of looking things up in a database or processing small records, like a DNS or mail server. Numbers from bonnie or similar benchmarks are nice, but they show details of various performance area, and if you don't match "what you do" to "what works best" you make bad choices. In other words if your application can only read 10MB/s the benchmark is telling you your disk is fast enough to keep up with the CPU. > : If no, maybe raid10 over > : the same 3 drives will give better results. > > Does RAID10 work on three drives? I though one needed 4 drives, > with striping across a pair of mirrored pairs. No, that's 0+1, RAID-10 works across any number of drives. Have you actually take 10-15 minutes to read "man md" and get the overview of how RAID works, or are you reading bits and pieces about individual features? -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-21 13:00 ` Bill Davidsen @ 2007-09-21 20:01 ` Dean S. Messing 2007-09-21 20:21 ` Dean S. Messing 1 sibling, 0 replies; 44+ messages in thread From: Dean S. Messing @ 2007-09-21 20:01 UTC (permalink / raw) To: linux-raid Bill Davidsen wrote: Dean Messing wrote: : > I understand I only get the speed of a single drive was I was not : > aware of the safety factor. I had intended to use snapshotting off : > to a cheap USB drive each evening. Will that not keep me safe within a : > day's worth of data change? I only learned about "snapshots" yesterday. : > I'm utterly new to the disk array/LVM game. : > : : But your read speed need not be limited if you tune the readahead. : There's also the question of how much transfer speed you actually : *need*. If your application is CPU-bound faster will not be the same as : "runs in less time," and random access is limited by the seek speed of : your drives, although some RAID tuning does apply to random writes. This is my main workstation. I do many things on it, not just "an application". I live on the machine. It will run XP on top of VMware. It will transcode video. It will run map software, emacs, mail, mplayer, re-compile large applications like ImageMagick, Transcode, and Scilab, and much more. The video/image processing (whch can be random access) is just an example. Increasing readahead does increase the sequential read speed for RAID5. But the random read speed suffers and the write speed suffers a loss of 20% over single disk writes (according to the bonnie++ numbers). RAID0 on the other machine "spoiled me", I'm afraid. Question: From what I've read, when reading in RAID5, parity is not read. Why then does read speed suffer so badly with the default Readahead parameter? : When you process video and monitor the system with vmstat, do you see : significant iowait time? No, neither do I, with a modest readahead I am : totally CPU limited. If you are searching your mail database, if you : just use a text tool which reads everything, that's pure sequential : access. And unless you actually *use* the locate command, building that : database is just a way to beat your disks (and it's more sequential than : you would expect). You can turn it off and do your bit to avoid global : warming. I have not used vmstat, but I am dealing with 4Kx2K 24bit uncompressed video frames. mplayer, for one, is quite disk i/o limited. So are several spatio-temporal adaptive interpolation algorithms we work on for upscaling video (part of my research). I can see this from the gkrellm disk meter both during video i/o and when swap space is getting used. Funny you should mention "locate". I use it heavily to find stuff. I typically generate results on any of 4 different machines, and then `scp' the results around between the machines. So I rebuild the database relatively often. On another machine that I bought from Dell, configured with RAID0, everything is very snappy. Rebuilding the "locate" db simply flew. It is that uniform snappiness I was hoping (against hope?) to duplicate on this current workstation with a third drive and RAID5. : > : If no, maybe raid10 over : > : the same 3 drives will give better results. : > : > Does RAID10 work on three drives? I though one needed 4 drives, : > with striping across a pair of mirrored pairs. : : No, that's 0+1, RAID-10 works across any number of drives. 
: : Have you actually take 10-15 minutes to read "man md" and get the : overview of how RAID works, or are you reading bits and pieces about : individual features? I confess I have not read the md man page. (I shall, right after this.) I have read the mdadm page pretty thoroughly. And I've read parts of lots of other stuff in the last few days. It has all uniformly said that RAID-10 is "striped mirrors" and requires 4 drives. One such example (which I just googled) is: <http://www.pcguide.com/ref/hdd/perf/raid/levels/multLevel01-c.html> <http://www.pcguide.com/ref/hdd/perf/raid/levels/multXY-c.html> I suppose Linux Software Raid is more general and allows 3 drives in a RAID-10 config. I'll find out in a few minutes. In my last note I asked: > : > : "Putting three disks together as a big LV" - that's exactly what > : "linear" md module. > : It's almost as unsafe as raid0, but with > : linear read/write speed equal to speed of single drive... > > I understand I only get the speed of a single drive but I was not > aware of the safety factor. I had intended to use snapshotting off > to a cheap USB drive each evening. Will that not keep me safe within a > day's worth of data change? I only learned about "snapshots" yesterday. > I'm utterly new to the disk array/LVM game. <snip> > For that matter why not run a RAID-0 + LVM across two of the three drives > and snapshot to the third? What do you think about the RAID-0 + LVM plus daily snapshots? I am not running a server. In the (fairly remote) chance that I do have a RAID 0 failure, I can tolerate the couple of hours it will take to rebuild the file system and be back up and running (in ordinary non-RAID mode). Dean ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-21 13:00 ` Bill Davidsen 2007-09-21 20:01 ` Dean S. Messing @ 2007-09-21 20:21 ` Dean S. Messing 1 sibling, 0 replies; 44+ messages in thread From: Dean S. Messing @ 2007-09-21 20:21 UTC (permalink / raw) To: linux-raid This is to both Bill Davidsen and Michael Tokarev. I just realised in re-reading previous messages that I badly screwed up my attributions in my just-sent message. I attributed to Bill some technical remarks by Michael. I apologise to both of you! Dean ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-21 0:58 ` Dean S. Messing 2007-09-21 13:00 ` Bill Davidsen @ 2007-09-25 9:31 ` Goswin von Brederlow 2007-09-25 18:16 ` Dean S. Messing 1 sibling, 1 reply; 44+ messages in thread From: Goswin von Brederlow @ 2007-09-25 9:31 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid "Dean S. Messing" <deanm@sharplabs.com> writes: > Michael Tokarev writes: > : Dean S. Messing wrote: > : [] > : > [] That's what > : > attracted me to RAID 0 --- which seems to have no downside EXCEPT > : > safety :-). > : > > : > So I'm not sure I'll ever figure out "the right" tuning. I'm at the > : > point of abandoning RAID entirely and just putting the three disks > : > together as a big LV and being done with it. (I don't have quite the > : > moxy to define a RAID 0 array underneath it. :-) > : > : "Putting three disks together as a big LV" - that's exactly what > : "linear" md module. > : It's almost as unsafe as raid0, but with > : linear read/write speed equal to speed of single drive... > > I understand I only get the speed of a single drive was I was not > aware of the safety factor. I had intended to use snapshotting off > to a cheap USB drive each evening. Will that not keep me safe within a > day's worth of data change? I only learned about "snapshots" yesterday. > I'm utterly new to the disk array/LVM game. > > For that matter why not run a RAID-0 + LVM across two of the three drives > and snapshot to the third? LVM is not the same as LVM. What I mean is that you still have choices left. One thing you have to think about though. An lvm volume group will not start cleanly with a disk missing but you can force it to start anyway. So a lost disk does not mean all data is lost. But it does mean that any logical volume with data on the missing disk will have serious data corruption. Also lvm can do raid0 itself. For each logical volume you create you can specify the number of stripes to use. So I would abandon all thoughts of raid0 and replace them with using lvm. Run one LV with 2 stripes on the first two disks and snapshot on the third. > : Note also that the more drives you add to raid0-like config, > : the more chances of failure you'll have - because raid0 fails > : when ANY drive fails. Ditto - for certain extent - for linear > : md module and for "one big LV" which is basically the same thing. > > I understand the probability increases for additional drives. > > : By the way, before abandoming "R" in "RAID", I'd check whenever > : the resulting speed with raid5 (after at least read-ahead tuning) > : is acceptable, and use that if yes. > > My problem is not quite knowing what "acceptable" is. I bought a Dell > Precision 490 with two relatively fast SATA II drives. With RAID 0 I > attain speeds of nearly 140 MB/s (using 2 drives) for reads and writes > and the system is very snappy for everything, from processing 4Kx2K > video to building a 'locate' datebase, to searching my very large mail > archives for technical info. > > When I see the speed loss of software RAID 5 (writes are at 55MB/s and > random reads are at 54 MB/s) for everything but seq. reads (and that > only if I increase read-ahead from 512 to 16384 to get read speeds of > about 110 MB/s I lose heart, esp. since I don't know the other > consequences of increasing read-ahead by so much. > > : If no, maybe raid10 over > : the same 3 drives will give better results. > > Does RAID10 work on three drives? I though one needed 4 drives, > with striping across a pair of mirrored pairs. 
I tested Raid10 and with far copies I got the full speed of all disks combined just like a raid0 would for reading and half speed for writing (as it has to write everything twice). I got pretty damn close to the theoretical limit it could get, which was surprising. > Dean MfG Goswin ^ permalink raw reply [flat|nested] 44+ messages in thread
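The layout Goswin describes above (a two-way striped LV on the first two disks, snapshot space on the third) could be set up roughly as follows; the volume group and LV names are hypothetical and the sizes illustrative:

    pvcreate /dev/sda3 /dev/sdb3 /dev/sdc2
    vgcreate vg0 /dev/sda3 /dev/sdb3 /dev/sdc2

    # two-way striped volume restricted to the first two disks, 64 KiB stripes
    lvcreate -n work -L 200G -i 2 -I 64 vg0 /dev/sda3 /dev/sdb3

    # snapshot of that volume, with its copy-on-write space on the third disk
    lvcreate -s -n work-snap -L 20G /dev/vg0/work /dev/sdc2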
* Re: Help: very slow software RAID 5. 2007-09-25 9:31 ` Goswin von Brederlow @ 2007-09-25 18:16 ` Dean S. Messing 2007-09-25 21:46 ` Goswin von Brederlow 2007-09-27 22:17 ` Bill Davidsen 0 siblings, 2 replies; 44+ messages in thread From: Dean S. Messing @ 2007-09-25 18:16 UTC (permalink / raw) To: linux-raid; +Cc: brederlo Goswin von Brederlow writes: : Dean S. Messing writes: : > Michael Tokarev writes: : > : Dean S. Messing wrote: : > : [] : > : > [] That's what : > : > attracted me to RAID 0 --- which seems to have no downside EXCEPT : > : > safety :-). : > : > : > : > So I'm not sure I'll ever figure out "the right" tuning. I'm at the : > : > point of abandoning RAID entirely and just putting the three disks : > : > together as a big LV and being done with it. (I don't have quite the : > : > moxy to define a RAID 0 array underneath it. :-) : > : : > : "Putting three disks together as a big LV" - that's exactly what : > : "linear" md module. : > : It's almost as unsafe as raid0, but with : > : linear read/write speed equal to speed of single drive... : > : > I understand I only get the speed of a single drive but I was not : > aware of the safety factor. I had intended to use snapshotting off : > to a cheap USB drive each evening. Will that not keep me safe within a : > day's worth of data change? I only learned about "snapshots" yesterday. : > I'm utterly new to the disk array/LVM game. : > : > For that matter why not run a RAID-0 + LVM across two of the three drives : > and snapshot to the third? : : LVM is not the same as LVM. What I mean is that you still have choices : left. Sorry, Goswin. Even though you gave your meaning, I still don't understand you here. (I must be dense this morning.) What does "LVM is not the same as LVM" mean? : : One thing you have to think about though. An lvm volume group will not : start cleanly with a disk missing but you can force it to start : anyway. So a lost disk does not mean all data is lost. But it does : mean that any logical volume with data on the missing disk will have : serious data corruption. If I am taking daily LVM snapshots will I not be able to reconstruct the file system as of the last snapshot? That's all I require. I have also discovered "smartctl" and have read that if the short smartctl tests are run daily and the long test weekly that the chances of being caught "with my pants down" are quite low, even in a two disk RAID-0 config. What is your opinion? : Also lvm can do raid0 itself. For each logical volume you create you : can specify the number of stripes to use. So I would abandon all : thoughts of raid0 and replace them with using lvm. : : Run one LV with 2 stripes on the first two disks and snapshot on the : third. Good idea. I waw aware of striped LV but did not think it would run nearly as fast as RAID-0. Do you think two LV stripes will equal RAID-0 for all kinds of read/write disk use? There would seem to be lots more than two RAID=0 stripes in the default case. (I do know enough to not run Striped LV with RAID-0 :-) <snip> : : I tested Raid10 and with far copies I got the full speed of all disks : combined just like a raid0 would for reading and half speed for : writing (as it has to write everything twice). I got pretty damn : close to the theoretical limit it could get, which was surprising. Very interesting! On three drives? When you said "half speed for writes", did you mean "half the RAID-0 read speed" or "half the physical device read speed"? I hate the thought of half speed writes. 
Some of what I do requires more writing than reading---up-conversion of Full HD video to 4Kx2K video, for example. Given your test, I'll run some tests with a three device RAID-10 with "far" copies. But I would really like to know if I'm playing with fire putting my whole system on a RAID-0/non-striped LVM device (or striped LVM device w/o RAID) with daily snapshots, and good smartctl monitoring. Dean ^ permalink raw reply [flat|nested] 44+ messages in thread
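The smartctl monitoring Dean mentions amounts to scheduling the drives' self-tests and checking the results, typically from cron; a sketch for one disk (repeat per drive, and smartd can automate the schedule):

    smartctl -t short /dev/sda      # daily short self-test (a minute or two)
    smartctl -t long  /dev/sda      # weekly long surface scan
    smartctl -H /dev/sda            # overall health verdict
    smartctl -l selftest /dev/sda   # log of past self-test results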
* Re: Help: very slow software RAID 5. 2007-09-25 18:16 ` Dean S. Messing @ 2007-09-25 21:46 ` Goswin von Brederlow 2007-09-25 23:50 ` Dean S. Messing 2007-09-27 22:17 ` Bill Davidsen 1 sibling, 1 reply; 44+ messages in thread From: Goswin von Brederlow @ 2007-09-25 21:46 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid, brederlo "Dean S. Messing" <deanm@sharplabs.com> writes: > Goswin von Brederlow writes: > : Dean S. Messing writes: > : > Michael Tokarev writes: > : > : Dean S. Messing wrote: > : > : [] > : > : > [] That's what > : > : > attracted me to RAID 0 --- which seems to have no downside EXCEPT > : > : > safety :-). > : > : > > : > : > So I'm not sure I'll ever figure out "the right" tuning. I'm at the > : > : > point of abandoning RAID entirely and just putting the three disks > : > : > together as a big LV and being done with it. (I don't have quite the > : > : > moxy to define a RAID 0 array underneath it. :-) > : > : > : > : "Putting three disks together as a big LV" - that's exactly what > : > : "linear" md module. > : > : It's almost as unsafe as raid0, but with > : > : linear read/write speed equal to speed of single drive... > : > > : > I understand I only get the speed of a single drive but I was not > : > aware of the safety factor. I had intended to use snapshotting off > : > to a cheap USB drive each evening. Will that not keep me safe within a > : > day's worth of data change? I only learned about "snapshots" yesterday. > : > I'm utterly new to the disk array/LVM game. > : > > : > For that matter why not run a RAID-0 + LVM across two of the three drives > : > and snapshot to the third? > : > : LVM is not the same as LVM. What I mean is that you still have choices > : left. > > Sorry, Goswin. Even though you gave your meaning, I still don't > understand you here. (I must be dense this morning.) > What does "LVM is not the same as LVM" mean? The ultimate risk and speed of lvm depends on the striping and distribution of LVs accross the disks. > : > : One thing you have to think about though. An lvm volume group will not > : start cleanly with a disk missing but you can force it to start > : anyway. So a lost disk does not mean all data is lost. But it does > : mean that any logical volume with data on the missing disk will have > : serious data corruption. > > If I am taking daily LVM snapshots will I not be able to reconstruct > the file system as of the last snapshot? That's all I require. A snapshot will only hold the differences between creation and now. So Not at all. What you would have to do is have the original on USB and work in a snapshot. But then there is no lvm command to commit a snapshot back to the original device to store the changes. I'm afraid you need to rsync the data to another disk or volume to make a backup. > I have also discovered "smartctl" and have read that if the short smartctl > tests are run daily and the long test weekly that the chances of being > caught "with my pants down" are quite low, even in a two disk RAID-0 > config. What is your opinion? Smart is a big fat lier. I have a disk with failure iminent that's been running for 3 years now. I have disks with 345638756348756 ECC errors. A disk that runs at 130° and gets warmer when I put a fan in front of it. and on and on and on. Smart certainly is no replacement of a backup. Raid5 is also no replacement for a backup. Imagine a faulty cable or a bug in your kernel that writes wrong data to the disks and corrupts your filesystem. 
The raid will be perfectly fine and will faithfully reconstruct your data on a disk failure -- the broken data. Big help. Raid basically only protects you from the downtime of having to restore the backup RIGHT NOW when a disk fails. It buys you the time to wait for the weekend to fix things, or whatever.

> : Also lvm can do raid0 itself. For each logical volume you create you
> : can specify the number of stripes to use. So I would abandon all
> : thoughts of raid0 and replace them with using lvm.
> :
> : Run one LV with 2 stripes on the first two disks and snapshot on the
> : third.
>
> Good idea. I was aware of striped LV but did not think it would run
> nearly as fast as RAID-0. Do you think two LV stripes will equal
> RAID-0 for all kinds of read/write disk use? There would seem to be
> lots more than two RAID-0 stripes in the default case. (I do know
> enough to not run Striped LV with RAID-0 :-)

They are conceptually identical, and all else being equal they should behave the same. Beware though that lvm does not set the read-ahead correctly (it only uses the default size), while md raid sets the read-ahead to the sum of the disks' read-aheads (minus 1 or 2 disks for raid4/5/6). So by default all is not the same. Set the read-ahead to the same value if you want to compare the two.

> <snip>
> :
> : I tested Raid10 and with far copies I got the full speed of all disks
> : combined just like a raid0 would for reading and half speed for
> : writing (as it has to write everything twice). I got pretty damn
> : close to the theoretical limit it could get, which was surprising.
>
> Very interesting! On three drives? When you said "half speed for
> writes", did you mean "half the RAID-0 read speed" or "half the
> physical device read speed"?

Both. The raid10 has to physically write every data block twice, so the throughput you see as a user is half of what the hardware actually does. So naturally you only get 50% of the disk/raid0 speed. With raid10, writing large blocks of data at a time, I got about 48% of the combined disk speed on write, meaning the drives managed 96% of their combined speed.

> I hate the thought of half speed writes. Some
> of what I do requires more writing than reading---up-conversion of
> Full HD video to 4Kx2K video, for example. Given your test, I'll run
> some tests with a three device RAID-10 with "far" copies.

For writes you get 50% of 300% of the (slowest) disk speed, so you still have 150% total with raid10. With raid5 you have 200% of the (slowest) disk speed for continuous writes and maybe 50% for fragmented ones. And for reads you get 300% of the (slowest) disk speed with raid10 but only 200% with raid5. Theoretically, that is. But as I said, the raid10 tests I did came damn close to the theoretical speed.

But all such tests are highly dependent on the usage pattern. So don't believe the bonnie output, or worse, hdparm. Run the application you will actually be using and see whether it gets faster or slower.

> But I would really like to know if I'm playing with fire putting my
> whole system on a RAID-0/non-striped LVM device (or striped LVM device
> w/o RAID) with daily snapshots, and good smartctl monitoring.

You are. A disk might fail at any time, and the snapshot only protects you from filesystem corruption and/or accidental deletions, not disk failure. Also don't forget that snapshots will slow you down: the first time a block is written after the snapshot is taken, the old block first has to be read and written to the snapshot before the new data can be written.
> Dean MfG Goswin - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
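For readers who want to try the comparison Goswin suggests, a minimal sketch follows. The volume-group and LV names are placeholders, and the 512-sector figure is simply the untuned default Dean later reports for his 3-disk array; the commands themselves (lvcreate, blockdev) are standard LVM2/util-linux tools.

    # a 2-stripe LV across the first two disks (-i stripes, -I stripe size in KiB)
    lvcreate -L 100G -i 2 -I 64 -n fastlv vg0

    # read-ahead is reported in 512-byte sectors; make the LV match the md array
    blockdev --getra /dev/md0              # e.g. 512 for an untuned 3-disk raid5
    blockdev --getra /dev/vg0/fastlv       # typically stays at the 256-sector default
    blockdev --setra 512 /dev/vg0/fastlv   # equalise before benchmarking

With the read-ahead equalised, any remaining difference between the striped LV and a raid0 of the same partitions should come from the workload rather than from the defaults.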
* Re: Help: very slow software RAID 5. 2007-09-25 21:46 ` Goswin von Brederlow @ 2007-09-25 23:50 ` Dean S. Messing 2007-09-26 1:45 ` Goswin von Brederlow 2007-09-27 22:40 ` Help: very slow software RAID 5 Bill Davidsen 0 siblings, 2 replies; 44+ messages in thread From: Dean S. Messing @ 2007-09-25 23:50 UTC (permalink / raw) Cc: linux-raid, brederlo Goswin von Brederlow writes: : Dean S. Messing writes: : > Goswin von Brederlow writes: : > : LVM is not the same as LVM. What I mean is that you still have choices : > : left. : > : > Sorry, Goswin. Even though you gave your meaning, I still don't : > understand you here. (I must be dense this morning.) : > What does "LVM is not the same as LVM" mean? : : The ultimate risk and speed of lvm depend on the striping and : distribution of LVs across the disks. Even w/o LV striping, I don't know enough about LV organisation to try to recover from a serious disk crash. : > : : > : One thing you have to think about though. An lvm volume group will not : > : start cleanly with a disk missing but you can force it to start : > : anyway. So a lost disk does not mean all data is lost. But it does : > : mean that any logical volume with data on the missing disk will have : > : serious data corruption. : > : > If I am taking daily LVM snapshots will I not be able to reconstruct : > the file system as of the last snapshot? That's all I require. : : A snapshot will only hold the differences between creation and now. So : no, not at all. : : What you would have to do is have the original on USB and work in a : snapshot. But then there is no lvm command to commit a snapshot back : to the original device to store the changes. : : I'm afraid you need to rsync the data to another disk or volume to : make a backup. Ok. I think I understand. The snapshot could not be used to restore to a master backup of the original since that backup is not in LV format. If I'm using an ext3 filesystem (which I plan to do) would Full and Incremental dumps to a cheap 'n big USB drive (using the dump/restore suite) not work? : > I have also discovered "smartctl" and have read that if the short smartctl : > tests are run daily and the long test weekly that the chances of being : > caught "with my pants down" are quite low, even in a two disk RAID-0 : > config. What is your opinion? : : Smart is a big fat liar. I have a disk with failure imminent that's : been running for 3 years now. I have disks with 345638756348756 ECC : errors. A disk that runs at 130° and gets warmer when I put a fan in : front of it. And on and on and on. Ouch. Ok, I get the picture. : Smart certainly is no replacement for a backup. Raid5 is also no : replacement for a backup. I did not mean to imply I would forego backups. I've been using Unix for too long (26 years) to be that foolish. I simply thought that Smart would allow me to run RAID-0 or striped LV (and do backups!) with reduced risk of having an actual disk failure since I would be able to deal with a weak drive before it failed. Thanks for disabusing me of my fantasy. : Imagine a faulty cable or a bug in your : kernel that writes wrong data to the disks and corrupts your : filesystem. The raid will be perfectly fine and could faithfully : reconstruct your data after a disk failure: the broken data, that is. Big help. Agreed, though RAID-0 or striped LV won't be able to reconstruct anything upon disk failure. : Raid basically only protects you from the downtime of having to restore : the backup RIGHT NOW when a disk fails.
It buys you the time to wait for : the weekend to fix things or whatever. Indeed. : > : Also lvm can do raid0 itself. For each logical volume you create you : > : can specify the number of stripes to use. So I would abandon all : > : thoughts of raid0 and replace them with using lvm. : > : : > : Run one LV with 2 stripes on the first two disks and snapshot on the : > : third. : > : > Good idea. I was aware of striped LV but did not think it would run : > nearly as fast as RAID-0. Do you think two LV stripes will equal : > RAID-0 for all kinds of read/write disk use? There would seem to be : > lots more than two RAID-0 stripes in the default case. (I do know : > enough to not run Striped LV with RAID-0 :-) : : They are conceptually identical and, all else being the same, they : should behave the same. I will surely test it. : Beware though that lvm does not set the read-ahead correctly (only the : default size) while raid will set the read-ahead to the sum of the : disks' read-ahead (-1 or -2 disks for raid4/5/6). So by default all is : not the same. So set the read-ahead to the same value if you want to compare : the two. Ok, good to know. However I'm not so sure RAID-4,5,6 actually sets the read-ahead "correctly". My whole dilemma started when I saw how slowly RAID-5 was running on three drives---slower than the physical device speed of two of the three drives. Justin Piszcz suggested tweaking the parameters (in particular, read-ahead). Indeed, increasing read-ahead did increase seq. read speeds, but at a cost to random reads. And writes were still slow. For RAID-0, everything is faster, which makes the whole system snappy. : > <snip> : > : : > : I tested Raid10 and with far copies I got the full speed of all disks : > : combined just like a raid0 would for reading and half speed for : > : writing (as it has to write everything twice). I got pretty damn : > : close to the theoretical limit it could get, which was surprising. : > : > Very interesting! On three drives? When you said "half speed for > writes", did you mean "half the RAID-0 read speed" or "half the : > physical device read speed"? : : Both. The raid10 has to physically write every data block twice, so the : throughput you get as a user is half of what the hardware has to : do. So naturally you only get 50% of the disk/raid0 speed. : : With raid10 and writing large blocks of data at a time I got about 48% : of the combined disks' speed on write, meaning they managed 96% of the : combined speed. : : : > I hate the thought of half speed writes. Some : > of what I do requires more writing than reading---up-conversion of : > Full HD video to 4Kx2K video, for example. Given your test, I'll run : > some tests with a three device RAID-10 with "far" copies. : : For writes you get 50% of 300% of the (slowest) disk speed. So you still have : 150% speed total with raid10. With raid5 you have 200% of the (slowest) disk : speed for continuous writes and maybe 50% of the (slowest) disk speed for : fragments. I don't get anywhere near 200% disk speed for writes for 3 disk sequential writes in RAID-5. I barely get 100% of the slowest drive in the array. Here are my numbers (again): Three drives: one is 55 MB/s, one is 71 MB/s and one is 75 MB/s. In RAID-5 with no parameter tweaking: Seq. reads: 63767 KB/s random reads: 61410 KB/s Seq. writes: 56970 KB/s random writes: 53688 KB/s Increasing the read-ahead from 512 to 32768: Seq. reads: 113852 KB/s random reads: 54175 KB/s Seq. writes: 56337 KB/s random writes: 53743 KB/s For 2 disk RAID-0 on the two faster disks: Seq. reads: 139973 KB/s random reads: 63233 KB/s Seq. writes: 114407 KB/s random writes: 59745 KB/s Plus random file creation and reading double with RAID-0 vis-a-vis RAID-5. : And for reads you get 300% of the (slowest) disk speed with raid10 but only : 200% for raid5. Again, I don't get these speeds. Seq. reads are about 170% of the average of my three physical drives if I turn up the look-ahead. Then random access reads drop to slightly less than my slowest drive. : Theoretically, that is. But as I said, the raid10 tests I did showed it : being damn close to theoretical speed. But all such tests are highly : dependent on the usage pattern. So don't believe the bonnie output or, : worse, hdparm. Run the application you will be using and see if that : gets faster or slower. The "application" is all my daily work---which is quite varied. I'm trying to build the snappiest system I can given my limitations. : > But I would really like to know if I'm playing with fire putting my : > whole system on a RAID-0/non-striped LVM device (or striped LVM device : > w/o RAID) with daily snapshots, and good smartctl monitoring. : : You are. A disk might fail at any time, and the snapshot only protects : you from filesystem corruption and/or accidental deletions, not disk : failure. I got it. : Also don't forget that snapshots will slow you down. The first time each : block gets written after the snapshot, it first has to read the old : block, write it to the snapshot and only then can it write the new data. I did not understand this. So what you are saying is that a snapshot is "living". That is, you don't just make it in an instant of time. Every time something not included in the snapshot changes, the original gets written to the snapshot? That's quite different than what I thought. So I won't be using snapshots for backing up, I see. Dump and Restore? Dean - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
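The two experiments Dean mentions can be written out concretely. The read-ahead value is the one he quotes; the raid10 command is only a sketch and would require tearing down the existing raid5 on those partitions first.

    # raise the array's read-ahead from the default 512 to 32768 sectors
    blockdev --setra 32768 /dev/md0

    # a three-device RAID-10 with "far" copies (layout f2) on the same partitions
    mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=3 \
          /dev/sda3 /dev/sdb3 /dev/sdc2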
* Re: Help: very slow software RAID 5. 2007-09-25 23:50 ` Dean S. Messing @ 2007-09-26 1:45 ` Goswin von Brederlow 2007-09-27 6:23 ` Dean S. Messing 2007-09-27 22:40 ` Help: very slow software RAID 5 Bill Davidsen 1 sibling, 1 reply; 44+ messages in thread From: Goswin von Brederlow @ 2007-09-26 1:45 UTC (permalink / raw) To: Dean S. Messing; +Cc: brederlo, linux-raid "Dean S. Messing" <deanm@sharplabs.com> writes: > Goswin von Brederlow writes: > : Dean S. Messing writes: > : > Goswin von Brederlow writes: > : > : LVM is not the same as LVM. What I mean is that you still have choices > : > : left. > : > > : > Sorry, Goswin. Even though you gave your meaning, I still don't > : > understand you here. (I must be dense this morning.) > : > What does "LVM is not the same as LVM" mean? > : > : The ultimate risk and speed of lvm depend on the striping and > : distribution of LVs across the disks. > > Even w/o LV striping, I don't know enough about LV organisation to > try to recover from a serious disk crash. Even without striping you can have 1GB of an LV's data on disk1, then 1GB on disk2, then 1GB on disk3. When disk2 dies you lose 1GB in the middle of the fs and that is rather damaging. Now why would anyone split up an LV like that? Sounds seriously stupid, right? Well. Think about what happens over a longer time as you resize LVs a few times. It will just use the next free space unless the LV is flagged contiguous. Allocations will fragment unless you take care not to (for example pvmove other LVs out of the way). > : > : > : > : One thing you have to think about though. An lvm volume group will not > : > : start cleanly with a disk missing but you can force it to start > : > : anyway. So a lost disk does not mean all data is lost. But it does > : > : mean that any logical volume with data on the missing disk will have > : > : serious data corruption. > : > > : > If I am taking daily LVM snapshots will I not be able to reconstruct > : > the file system as of the last snapshot? That's all I require. > : > : A snapshot will only hold the differences between creation and now. So > : no, not at all. > : > : What you would have to do is have the original on USB and work in a > : snapshot. But then there is no lvm command to commit a snapshot back > : to the original device to store the changes. > : > : I'm afraid you need to rsync the data to another disk or volume to > : make a backup. > > Ok. I think I understand. The snapshot could not be used to restore > to a master backup of the original since that backup is not in LV format. > > If I'm using an ext3 filesystem (which I plan to do) would Full and > Incremental dumps to a cheap 'n big USB drive (using the dump/restore > suite) not work? Probably. But why not rsync? It will copy all changes and the data on the USB disk will be accessible directly without restore. Very handy if you only need one file. > : Smart certainly is no replacement for a backup. Raid5 is also no > : replacement for a backup. > > I did not mean to imply I would forego backups. I've been using Unix > for too long (26 years) to be that foolish. I simply thought that > Smart would allow me to run RAID-0 or striped LV (and do backups!) > with reduced risk of having an actual disk failure since I would be > able to deal with a weak drive before it failed. Thanks for > disabusing me of my fantasy. If it works right (and the numbers will probably be obviously wrong if not), you can see the number of bad blocks. If that starts rising then you know the disk won't last long anymore.
But when was the last time one of your disks died by bad blocks appearing? Mine always seize up and won't spin up anymore or the heads won't seek anymore or the electronics die. I've never had a disk where the magnetization failed and more and more bad blocks appeared. > : Beware though that lvm does not set the read-ahead correctly (only the > : default size) while raid will set the read-ahead to the sum of the > : disks' read-ahead (-1 or -2 disks for raid4/5/6). So by default all is > : not the same. So set the read-ahead to the same value if you want to compare > : the two. > > Ok, good to know. However I'm not so sure RAID-4,5,6 actually sets > the read-ahead "correctly". My whole dilemma started when I saw > how slowly RAID-5 was running on three drives---slower than the physical > device speed of two of the three drives. > > Justin Piszcz suggested tweaking the parameters (in particular, > read-ahead). Indeed, increasing read-ahead did increase seq. read > speeds, but at a cost to random reads. And writes were still slow. > > For RAID-0, everything is faster, which makes the whole system snappy. Untuned I have this: # cat /proc/mdstat Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] md1 : active raid5 sdd2[3] sdc2[2] sdb2[1] sda2[0] 583062912 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] # blockdev --getra /dev/sda 256 # blockdev --getra /dev/md1 768 # blockdev --getra /dev/r/home 256 You see that the disk and LV are at the default of 256 blocks read-ahead. But the raid is at (4-1)*256 == 768 blocks. You usually can still raise those numbers a good bit. Especially if you are working with large files and streaming access, like movies. :) > : For writes you get 50% of 300% of the (slowest) disk speed. So you still have > : 150% speed total with raid10. With raid5 you have 200% of the (slowest) disk > : speed for continuous writes and maybe 50% of the (slowest) disk speed for > : fragments. > > I don't get anywhere near 200% disk speed for writes for > 3 disk sequential writes in RAID-5. I barely get 100% > of the slowest drive in the array. As I said below: theoretically. > : > But I would really like to know if I'm playing with fire putting my > : > whole system on a RAID-0/non-striped LVM device (or striped LVM device > : > w/o RAID) with daily snapshots, and good smartctl monitoring. > : > : You are. A disk might fail at any time, and the snapshot only protects > : you from filesystem corruption and/or accidental deletions, not disk > : failure. > > I got it. I hope you are sufficiently scared now to consider all the consequences. You seem to plan on doing regular backups. That is good. That means what you actually risk with raid0 (or imho preferably striped lv) is losing yesterday's work and today's time to restore the backup. Now you can gamble that you won't have a disk failure too often, maybe not for years, and the speedup of plain raid0 will save you more time cumulatively than you lose in those 2 days. I probably will. But due to Murphy's law the failure will happen at the worst time and obviously you will be mad as hell at that time. For a single person and a single raid it all comes down to luck in the end. At work we just got a job of building a storage cluster with ~1000 disks. At that size the luck becomes statistics. A "the disk will probably not fail for years" becomes "10 disks will die". So my outlook on raid safety might be a bit bleak. > : Also don't forget that snapshots will slow you down.
The first time each > : block gets written after the snapshot, it first has to read the old > : block, write it to the snapshot and only then can it write the new data. > > I did not understand this. So what you are saying is that a snapshot > is "living". That is, you don't just make it in an instant of time. > Every time something not included in the snapshot changes, the original > gets written to the snapshot? That's quite different than what I thought. Yes. That is actually the beauty of the snapshot. You only need enough space to save changes. You don't make a full copy. A snapshot is like an incremental backup. Without the full backup it depends on, it is worthless. But it is incremental with the time reversed. It is all the changes from the fluid now back to a fixed point in time. > So I won't be using snapshots for backing up, I see. Dump and Restore? Actually you should, but not in the way you imagined. Make a snapshot of the filesystem to freeze the contents at a fixed point in time. Then you can run your dump, rsync, tar, whatever software you use for backups, on the snapshot. The normal FS can be used and changed meanwhile without risking races with the backup process. The backup will be from exactly the point in time when you made the snapshot, even if it takes hours to do. > Dean MfG Goswin ^ permalink raw reply [flat|nested] 44+ messages in thread
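The snapshot-as-backup-source workflow Goswin describes, as a minimal sketch; the volume group, LV name, mount point and 2G snapshot size are assumptions, not taken from the thread.

    lvcreate -s -L 2G -n homesnap /dev/vg0/home        # freeze a view of /home at this instant
    mount -o ro /dev/vg0/homesnap /mnt/snap
    rsync -aH --delete /mnt/snap/ /mnt/USBbkup/home/   # or run dump/tar against the snapshot instead
    umount /mnt/snap
    lvremove -f /dev/vg0/homesnap                      # drop the snapshot once the backup is done

The live filesystem stays writable the whole time; only the snapshot has to be large enough to absorb the blocks changed while the backup runs.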
* Re: Help: very slow software RAID 5. 2007-09-26 1:45 ` Goswin von Brederlow @ 2007-09-27 6:23 ` Dean S. Messing 2007-09-27 9:51 ` Michal Soltys 0 siblings, 1 reply; 44+ messages in thread From: Dean S. Messing @ 2007-09-27 6:23 UTC (permalink / raw) To: brederlo; +Cc: linux-raid > Goswin von Brederlow writes: : > Dean S. Messing writes: : > If I'm using an ext3 filesystem (which I plan to do) would Full and : > Incremental dumps to a cheap 'n big USB drive (using the dump/restore : > suite) not work? : : Probably. But why not rsync? It will copy all changes and the data on : the USB disk will be accessible directly without restore. Very handy : if you only need one file. I don't see how one would do incrementals. My backup system currently does a monthly full backup, a weekly level 3 (which saves everything that has changed since the last level 3 a week ago) and daily level 5's (which save everything that changed today). I keep 3 months worth of these. So basically if a file existed for more than 24 hours w/in the last three months I've got it somewhere in my backup partition. If I accidentally delete a file and don't notice it for 10 days, no problem. I'm not sure rsync can do this. (I already use rsync to keep various directories on my 5 machines in sync). : If it works right (and the numbers will probably be obviously wrong if : not), you can see the number of bad blocks. If that starts rising then : you know the disk won't last long anymore. But when was the last time : one of your disks died by bad blocks appearing? Mine always seize up : and won't spin up anymore or the heads won't seek anymore or the : electronics die. I've never had a disk where the magnetization failed and : more and more bad blocks appeared. Actually I've never had a disk stop spinning. It's always other stuff where it stops doing I/O or gives corrupt data. : Untuned I have this: : : # cat /proc/mdstat : Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] : md1 : active raid5 sdd2[3] sdc2[2] sdb2[1] sda2[0] : 583062912 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] : # blockdev --getra /dev/sda : 256 : # blockdev --getra /dev/md1 : 768 : # blockdev --getra /dev/r/home : 256 : : You see that the disk and LV are at the default of 256 blocks read-ahead. : But the raid is at (4-1)*256 == 768 blocks. : : You usually can still raise those numbers a good bit. Especially if you : are working with large files and streaming access, like movies. :) Someone on the Fedora list who is running four 50 MB/s drives in a RAID 5 array was getting read speeds of 120 MB/s or so. Not 300%, but not too bad. He also had an untuned md device read-ahead of 768. With 3 devices I have an un-tuned one of 512, but going to 768 makes little difference. I must go up to 16384 to see any decent read improvement. I wonder why four drives work so much better than three. <snip> : I hope you are sufficiently scared now to consider all the : consequences. You seem to plan on doing regular backups. That is : good. That means what you actually risk with raid0 (or imho preferably : striped lv) is losing yesterday's work and today's time to restore the : backup. Now you can gamble that you won't have a disk failure too : often, maybe not for years, and the speedup of plain raid0 will save : you more time cumulatively than you lose in those 2 days. I'm not sure if I could quantify the time savings quite so pragmatically. But using a very snappy machine is simply a pleasure. That counts for something. I'm not afraid of restoring if I need to.
: I probably will. But due to Murphy's law the failure will happen at the : worst time and obviously you will be mad as hell at that time. For a : single person and a single raid it all comes down to luck in the end. Agreed. The other option, if I can swing it with my boss, is to purchase a 3ware true hardware RAID-5 card that presents the disks as one device. They are about $450 and the RAID-5 runs (from what I hear) quite fast for both reads and writes (uses write-back with battery backup to get write speeds up). But you've given me some things to explore regarding RAID-10 and LV striping. Thanks. : At work we just got a job of building a storage cluster with ~1000 : disks. At that size the luck becomes statistics. A "the disk will : probably not fail for years" becomes "10 disks will die". So my : outlook on raid safety might be a bit bleak. With that many disks, one is sure to fail every month or so unless they are top quality drives. Thanks again. Dean ^ permalink raw reply [flat|nested] 44+ messages in thread
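For reference, the level scheme Dean describes in the previous message maps onto dump(8) roughly as follows; the device and file names are placeholders. A level-N dump saves everything changed since the most recent dump of a lower level, so the weekly level 3 captures changes since the monthly full, and the daily level 5 captures changes since the last level 3 (or full).

    dump -0u -f /mnt/USBbkup/full-$(date +%Y%m%d) /dev/vg0/home   # monthly full; -u updates /etc/dumpdates
    dump -3u -f /mnt/USBbkup/week-$(date +%Y%m%d) /dev/vg0/home   # weekly level 3
    dump -5u -f /mnt/USBbkup/day-$(date +%Y%m%d)  /dev/vg0/home   # daily level 5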
* Re: Help: very slow software RAID 5. 2007-09-27 6:23 ` Dean S. Messing @ 2007-09-27 9:51 ` Michal Soltys 2007-09-27 22:10 ` Backups w/ rsync (was: Help: very slow software RAID 5.) Dean S. Messing 0 siblings, 1 reply; 44+ messages in thread From: Michal Soltys @ 2007-09-27 9:51 UTC (permalink / raw) To: linux-raid; +Cc: Dean S. Messing Dean S. Messing wrote: > > I don't see how one would do incrementals. My backup system > currently does a monthly full backup, a weekly level 3 (which > saves everything that has changed since the last level 3 a week ago) and > daily level 5's (which save everything that changed today). > Rsync is a fantastic tool for incremental backups. Everything that didn't change can be hardlinked to the previous entry. And the time to perform the backup is pretty much negligible. Essentially, you have the equivalent of full backups at close to the minimal time and space cost possible. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Backups w/ rsync (was: Help: very slow software RAID 5.) 2007-09-27 9:51 ` Michal Soltys @ 2007-09-27 22:10 ` Dean S. Messing 2007-09-28 7:57 ` Backups w/ rsync Michael Tokarev 2007-09-28 14:48 ` Bill Davidsen 0 siblings, 2 replies; 44+ messages in thread From: Dean S. Messing @ 2007-09-27 22:10 UTC (permalink / raw) To: linux-raid Michal Soltys writes: : Dean S. Messing wrote: : > : > I don't see how one would do incrementals. My backup system uses : > currently does a monthly full backup, a weekly level 3 (which : > saves everything that has changed since the last level 3 a week ago) and : > daily level 5's (which save everything that changed today). : > : : Rsync is fantastic tool for incremental backups. Everything that didn't : change can be hardlinked to previous entry. And time of performing the : backup is pretty much neglible. Essentially - you have equivalent of : full backups at almost minimal time and space cost possible. It has been some time since I read the rsync man page. I see that there is (among the bazillion and one switches) a "--link-dest=DIR" switch which I suppose does what you describe. I'll have to experiment with this and think things through. Thanks, Michal. Dean P.S. I changed the Subject: to reflect the new subject. Not sure if that starts a new thread or not. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-27 22:10 ` Backups w/ rsync (was: Help: very slow software RAID 5.) Dean S. Messing @ 2007-09-28 7:57 ` Michael Tokarev 2007-09-28 10:23 ` Goswin von Brederlow 2007-09-29 0:11 ` Dean S. Messing 2007-09-28 14:48 ` Bill Davidsen 1 sibling, 2 replies; 44+ messages in thread From: Michael Tokarev @ 2007-09-28 7:57 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid Dean S. Messing wrote: > Michal Soltys writes: [] > : Rsync is fantastic tool for incremental backups. Everything that didn't > : change can be hardlinked to previous entry. And time of performing the > : backup is pretty much neglible. Essentially - you have equivalent of > : full backups at almost minimal time and space cost possible. > > It has been some time since I read the rsync man page. I see that > there is (among the bazillion and one switches) a "--link-dest=DIR" > switch which I suppose does what you describe. I'll have to > experiment with this and think things through. Thanks, Michal. I haven't actually read the rsync manpage to this detail, but I do use rsync for backups this way, but a bit differently - yet more understandable without referring to manpages... ;) the procedure is something like this: cd /backups rm -rf tmp/ cp -al $yesterday tmp/ rsync -r --delete -t ... /filesystem tmp mv tmp $today That is, link the previous backup to temp (which takes no space except directories), rsync current files to there (rsync will break links for changed files), and rename temp to $today. /mjt ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 7:57 ` Backups w/ rsync Michael Tokarev @ 2007-09-28 10:23 ` Goswin von Brederlow 2007-09-28 11:18 ` Michal Soltys 2007-09-29 0:11 ` Dean S. Messing 1 sibling, 1 reply; 44+ messages in thread From: Goswin von Brederlow @ 2007-09-28 10:23 UTC (permalink / raw) To: Michael Tokarev; +Cc: Dean S. Messing, linux-raid Michael Tokarev <mjt@tls.msk.ru> writes: > Dean S. Messing wrote: >> Michal Soltys writes: > [] >> : Rsync is a fantastic tool for incremental backups. Everything that didn't >> : change can be hardlinked to the previous entry. And the time to perform the >> : backup is pretty much negligible. Essentially, you have the equivalent of >> : full backups at close to the minimal time and space cost possible. >> >> It has been some time since I read the rsync man page. I see that >> there is (among the bazillion and one switches) a "--link-dest=DIR" >> switch which I suppose does what you describe. I'll have to >> experiment with this and think things through. Thanks, Michal. > > I haven't actually read the rsync manpage to this detail, but I > do use rsync for backups this way, but a bit differently - yet > more understandable without referring to manpages... ;) > > the procedure is something like this: > > cd /backups > rm -rf tmp/ > cp -al $yesterday tmp/ > rsync -r --delete -t ... /filesystem tmp > mv tmp $today > > That is, link the previous backup to temp (which takes no space > except directories), rsync current files to there (rsync will > break links for changed files), and rename temp to $today. I was thinking Michal Soltys meant it this way. You can probably replace the cp invocation with an rsync one but that hardly changes things. I don't think you can do this in a single rsync call. Please correct me if I'm wrong. MfG Goswin ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 10:23 ` Goswin von Brederlow @ 2007-09-28 11:18 ` Michal Soltys 2007-09-28 12:47 ` Goswin von Brederlow 0 siblings, 1 reply; 44+ messages in thread From: Michal Soltys @ 2007-09-28 11:18 UTC (permalink / raw) To: linux-raid Goswin von Brederlow wrote: > > I was thinking Michal Soltys meant it this way. You can probably > replace the cp invocation with an rsync one but that hardly changes > things. > > I don't think you can do this in a single rsync call. Please correct > me if I'm wrong. > something along these lines: rsync <other options> --link-dest /backup/2007-01-01/ \ rsync://user@server/module /backup/2007-01-02/ It will create a backup of .../module in ...-02, hardlinking to ...-01 (if possible). So, no need for cp -l. There's a similar example in the rsync man page. Also, multiple --link-dest directories are supported. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 11:18 ` Michal Soltys @ 2007-09-28 12:47 ` Goswin von Brederlow 2007-09-28 14:17 ` Michal Soltys 0 siblings, 1 reply; 44+ messages in thread From: Goswin von Brederlow @ 2007-09-28 12:47 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid Michal Soltys <nozo@ziu.info> writes: > Goswin von Brederlow wrote: >> >> I was thinking Michal Soltys ment it this way. You can probably >> replace the cp invocation with an rsync one but that hardly changes >> things. >> >> I don't think you can do this in a single rsync call. Please correct >> me if I'm wrong. >> > > something along this way: > > rsync <other options> --link-dest /backup/2007-01-01/ \ > rsync://user@server/module /backup/2007-01-02/ > > It will create backup of .../module in ...-02 hardlinking to ...-01 > (if possible). > > So, no need for cp -l. There's similar example in rsync man. Also - > multiple --link-dest are supported too. Thanks, should have looked at --link-dest before replying. I wonder how long rsync had that option. I wrote my own rsync script years ago. Maybe it predates this. MfG Goswin ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 12:47 ` Goswin von Brederlow @ 2007-09-28 14:17 ` Michal Soltys 0 siblings, 0 replies; 44+ messages in thread From: Michal Soltys @ 2007-09-28 14:17 UTC (permalink / raw) To: linux-raid Goswin von Brederlow wrote: > > Thanks, should have looked at --link-dest before replying. I wonder > how long rsync had that option. I wrote my own rsync script years > ago. Maybe it predates this. > According to the NEWS file, since roughly 2002-09, so quite a bit of time. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 7:57 ` Backups w/ rsync Michael Tokarev 2007-09-28 10:23 ` Goswin von Brederlow @ 2007-09-29 0:11 ` Dean S. Messing 2007-09-29 8:43 ` Michael Tokarev 1 sibling, 1 reply; 44+ messages in thread From: Dean S. Messing @ 2007-09-29 0:11 UTC (permalink / raw) To: linux-raid Michael Tokarev writes: : Dean S. Messing wrote: : > Michal Soltys writes: : [] : > : Rsync is a fantastic tool for incremental backups. Everything that didn't : > : change can be hardlinked to the previous entry. And the time to perform the : > : backup is pretty much negligible. Essentially, you have the equivalent of : > : full backups at close to the minimal time and space cost possible. : > : > It has been some time since I read the rsync man page. I see that : > there is (among the bazillion and one switches) a "--link-dest=DIR" : > switch which I suppose does what you describe. I'll have to : > experiment with this and think things through. Thanks, Michal. : : I haven't actually read the rsync manpage to this detail, but I : do use rsync for backups this way, but a bit differently - yet : more understandable without referring to manpages... ;) : : the procedure is something like this: : : cd /backups : rm -rf tmp/ : cp -al $yesterday tmp/ : rsync -r --delete -t ... /filesystem tmp : mv tmp $today : : That is, link the previous backup to temp (which takes no space : except directories), rsync current files to there (rsync will : break links for changed files), and rename temp to $today. Very nice. The breaking of the hardlink is the key. I wondered about this when Michal mentioned using rsync yesterday. I just tested the idea. It does indeed work. One question: why do you not use "-a" instead of "-r -t"? It would seem that one would want to preserve permissions, and group and user ownerships. Also, is there a reason to _not_ preserve sym-links in the backup? Your script appears to copy the referent. Dean P.S. I think this thread has wandered from the topic of "linux-raid". I'm happy to cease and desist if this off-topic discussion offends. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-29 0:11 ` Dean S. Messing @ 2007-09-29 8:43 ` Michael Tokarev 0 siblings, 0 replies; 44+ messages in thread From: Michael Tokarev @ 2007-09-29 8:43 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid Dean S. Messing wrote: > Michael Tokarev writes: [] > : the procedure is something like this: > : > : cd /backups > : rm -rf tmp/ > : cp -al $yesterday tmp/ > : rsync -r --delete -t ... /filesystem tmp > : mv tmp $today > : > : That is, link the previous backup to temp (which takes no space > : except directories), rsync current files to there (rsync will > : break links for changed files), and rename temp to $today. > > Very nice. The breaking of the hardlink is the key. I wondered about > this when Michal mentioned using rsync yesterday. I just tested the idea. It > does indeed work. Well, others in this thread already presented other, simpler ways, namely using the --link-dest rsync option. I was just too lazy to read the man page, but I already knew other tools can do the work ;) > One question: why do you not use "-a" instead of "-r -t"? It would > seem that one would want to preserve permissions, and group and user > ownerships. Also, is there a reason to _not_ preserve sym-links > in the backup? Your script appears to copy the referent. Note the above -- "SOMETHING like this". I was typing from memory, it's not an actual script, just to show an idea. Sure, the real script does more than that, including error checking too. /mjt ^ permalink raw reply [flat|nested] 44+ messages in thread
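Putting the pieces of this subthread together, a sketch of a fuller invocation: the flags Dean asked about (-a for ownership, permissions and symlinks, -H for hard links) plus --link-dest doing the hard-linking in a single call. The paths and the "latest" symlink convention are illustrative, not from the thread.

    today=$(date +%Y-%m-%d)
    rsync -aH --delete --link-dest=/backups/latest /filesystem/ /backups/$today/
    ln -sfn /backups/$today /backups/latest    # point "latest" at the newest tree for the next run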
* Re: Backups w/ rsync 2007-09-27 22:10 ` Backups w/ rsync (was: Help: very slow software RAID 5.) Dean S. Messing 2007-09-28 7:57 ` Backups w/ rsync Michael Tokarev @ 2007-09-28 14:48 ` Bill Davidsen 2007-09-28 14:57 ` Wolfgang Denk 2007-09-28 15:11 ` Jon Nelson 1 sibling, 2 replies; 44+ messages in thread From: Bill Davidsen @ 2007-09-28 14:48 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid Dean S. Messing wrote: > It has been some time since I read the rsync man page. I see that > there is (among the bazillion and one switches) a "--link-dest=DIR" > switch which I suppose does what you describe. I'll have to > experiment with this and think things through. Thanks, Michal. > Be aware that rsync is useful for making a *copy* of your files, which isn't always the best backup. If the goal is to preserve data and be able to recover in time of disaster, it's probably not optimal, while if you need frequent access to old or deleted files it's fine. For example, full and incremental backup methods such as dump and restore are usually faster to take and restore than a copy, and allow easy incremental backups. Consider: touch bkup_full_new timestamp=$(date +%Y%m%d-%T) find /home -depth | cpio -o -Hcrc | gzip -3 >/mnt/USBbkup/full-$timestamp && mv -f bkup_full_new bkup_full && touch bkup_incr Now you can do an incremental (since last full or incremental) or partial (since last full): touch bkup_incr_new timestamp=$(date +%Y%m%d-%T) find /home -cnewer bkup_incr | cpio -o -Hcrc | gzip -3 >/mnt/USBbkup/incr-$timestamp && mv -f bkup_incr_new bkup_incr timestamp=$(date +%Y%m%d-%T) find /home -cnewer bkup_full | cpio -o -Hcrc | gzip -3 >/mnt/USBbkup/part-$timestamp The advantage of the incr is that files are smaller, the advantage of partial is that you only restore full+part (two total), and the advantage of rsync is that deleted files will really be deleted (that's why I say it a copy, not a backup). Hope this is useful. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 44+ messages in thread
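Recovery from the cpio archives above would go roughly like this, restoring the newest full archive first and then each incremental in date order (a sketch, not part of the original post, and it assumes only the incrementals taken since that full are still on the drive).

    newest_full=$(ls -t /mnt/USBbkup/full-* | head -1)
    cd /                                    # member names carry their /home/... paths
    gzip -dc "$newest_full" | cpio -idm     # restore the most recent full backup first
    for f in /mnt/USBbkup/incr-*; do        # then the incrementals, oldest first
        gzip -dc "$f" | cpio -idm           # -i extract, -d create dirs, -m keep mtimes
    done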
* Re: Backups w/ rsync 2007-09-28 14:48 ` Bill Davidsen @ 2007-09-28 14:57 ` Wolfgang Denk 2007-09-28 16:50 ` Bill Davidsen 2007-10-01 4:45 ` Michal Soltys 2007-09-28 15:11 ` Jon Nelson 1 sibling, 2 replies; 44+ messages in thread From: Wolfgang Denk @ 2007-09-28 14:57 UTC (permalink / raw) To: Bill Davidsen; +Cc: Dean S. Messing, linux-raid Dear Bill, in message <46FD1442.70707@tmr.com> you wrote: > > Be aware that rsync is useful for making a *copy* of your files, which > isn't always the best backup. If the goal is to preserve data and be > able to recover in time of disaster, it's probably not optimal, while if > you need frequent access to old or deleted files it's fine. If you want to do real backups you should use real tools, like bacula etc. > Now you can do an incremental (since last full or incremental) or > partial (since last full): > > touch bkup_incr_new > timestamp=$(date +%Y%m%d-%T) > find /home -cnewer bkup_incr | cpio -o -Hcrc | > gzip -3 >/mnt/USBbkup/incr-$timestamp && > mv -f bkup_incr_new bkup_incr > > timestamp=$(date +%Y%m%d-%T) > find /home -cnewer bkup_full | cpio -o -Hcrc | > gzip -3 >/mnt/USBbkup/part-$timestamp Now have Johnny Loser downloading some stuff, say: $ wget -N ftp://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.12.tar.gz Are you aware that this file will never be backed up by your script? Also, what about permission / owner changes etc.? A backup tool should never work based on timestamps alone. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de All he had was nothing, but that was something, and now it had been taken away. - Terry Pratchett, _Sourcery_ ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 14:57 ` Wolfgang Denk @ 2007-09-28 16:50 ` Bill Davidsen 0 siblings, 0 replies; 44+ messages in thread From: Bill Davidsen @ 2007-09-28 16:50 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Dean S. Messing, linux-raid Wolfgang Denk wrote: > Dear Bill, > > in message <46FD1442.70707@tmr.com> you wrote: > >> Be aware that rsync is useful for making a *copy* of your files, which >> isn't always the best backup. If the goal is to preserve data and be >> able to recover in time of disaster, it's probably not optimal, while if >> you need frequent access to old or deleted files it's fine. >> > > If you want to do real backups you should use real tools, like bacula > etc. > > >> Now you can do an incremental (since last full or incremental) or >> partial (since last full): >> >> touch bkup_incr_new >> timestamp=$(date +%Y%m%d-%T) >> find /home -cnewer bkup_incr | cpio -o -Hcrc | >> gzip -3 >/mnt/USBbkup/incr-$timestamp && >> mv -f bkup_incr_new bkup_incr >> >> timestamp=$(date +%Y%m%d-%T) >> find /home -cnewer bkup_full | cpio -o -Hcrc | >> gzip -3 >/mnt/USBbkup/part-$timestamp >> > > Now have Johnny Loser downloading some stuff, say: > > $ wget -N ftp://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.12.tar.gz > > Are you aware that this file will never be backed up by your script? > > Also, what about permission / owner changes etc.? > Do note the use of -cnewer, which is *not* the same as the modified time, and which exactly addresses your points. Ownership changes, etc., will cause a backup of the contents as well, but they will be preserved. > A backup tool should never work based on timestamps alone. > Feel free to use a tool which does a checksum of every file if you feel it's needed on your system. I'm not trying to defend against people playing with ctime or system time of day, just protect normal day-to-day data. > Best regards, > > Wolfgang Denk > > -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 14:57 ` Wolfgang Denk 2007-09-28 16:50 ` Bill Davidsen @ 2007-10-01 4:45 ` Michal Soltys 1 sibling, 0 replies; 44+ messages in thread From: Michal Soltys @ 2007-10-01 4:45 UTC (permalink / raw) To: linux-raid Wolfgang Denk wrote: > Dear Bill, > > in message <46FD1442.70707@tmr.com> you wrote: >> >> Be aware that rsync is useful for making a *copy* of your files, which >> isn't always the best backup. If the goal is to preserve data and be >> able to recover in time of disaster, it's probably not optimal, while if >> you need frequent access to old or deleted files it's fine. > > If you want to do real backups you should use real tools, like bacula > etc. > I wouldn't agree here. It all depends on how you organize your things, write scripts, and so on. It isn't any less of a "real" solution than amanda or bacula. It's much more of a DIY solution though, so not everyone will be inclined to use it. P.S. Sorry for the off-topic; last from me on this subject. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 14:48 ` Bill Davidsen 2007-09-28 14:57 ` Wolfgang Denk @ 2007-09-28 15:11 ` Jon Nelson 2007-09-28 16:25 ` Bill Davidsen 1 sibling, 1 reply; 44+ messages in thread From: Jon Nelson @ 2007-09-28 15:11 UTC (permalink / raw) Cc: linux-raid Please note: I'm having trouble w/gmail's formatting... so please forgive this if it looks horrible. :-| On 9/28/07, Bill Davidsen <davidsen@tmr.com> wrote: > > Dean S. Messing wrote: > > It has been some time since I read the rsync man page. I see that > > there is (among the bazillion and one switches) a "--link-dest=DIR" > > switch which I suppose does what you describe. I'll have to > > experiment with this and think things through. Thanks, Michal. > > > > Be aware that rsync is useful for making a *copy* of your files, which > isn't always the best backup. If the goal is to preserve data and be > able to recover in time of disaster, it's probably not optimal, while if > you need frequent access to old or deleted files it's fine. You are absolutely right when you say it isn't always the best backup. There IS no 'best' backup. > For example, full and incremental backup methods such as dump and > restore are usually faster to take and restore than a copy, and allow > easy incremental backups. If "copy" meant "full data copy" and not "hard link where possible", I'd agree with you. However... I use a nightly rsync (with --link-dest) to back up more than 40 GiB to a drbd-backed drive. I'll explain why I use drbd in just a moment. Technically, I have a 3 disk raid5 (Linux Software Raid) which is the primary store for the data. Then I have a second drive (non-raid) that is used as a drbd backing store, which I rsync *to* from filesystems built off of the raid. I keep *30 days* of nightly backups on the drbd volume. The average difference between nightly backups is about 45MB, or a bit less than 10%. The total disk usage is (on average) about 10% more than a single backup. On an AMD x86-64 dual core (3600 de-clocked to run at 1GHz) the entire process takes between 1 and 2 minutes, from start to finish. Using hard links means I can snapshot ~175,000 files, about 40GiB, in under 2 minutes - something I'd have a hard time doing with dump+restore. I could easily make incremental or differential copies, and maybe even in that time frame, but I'm not sure I see much advantage in that. Furthermore, as you state, dump+restore does *not* include the removal of files which for some scenarios is a huge deal. The long and short of it is this: using hard links (via rsync or cp or whatever) to do snapshot backups can be really, really fast and have significant advantages but there are, as with all things, some downsides. Those downsides are fairly easily mitigated, however. In my case, I can lose 1 drive of the raid and I'm OK. If I lose 2, then the other drive (not part of the raid) has the data I care about. If I lose the entire machine, the *other* machine (the other end of the drbd, only woken up every other day or so) has the data. Going back 30 days. And a bare-metal "restore" is as fast as your I/O is. I back my /really/ important stuff up on DLT. Thanks again to drbd, when the secondary comes up it communicates with the primary and is able to figure out only which blocks have changed and only copies those. On a nightly basis that is usually a couple of hundred megabytes, and at 12MiB/s that doesn't take terribly long to take care of. -- Jon ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 15:11 ` Jon Nelson @ 2007-09-28 16:25 ` Bill Davidsen 2007-09-28 16:52 ` Jon Nelson 0 siblings, 1 reply; 44+ messages in thread From: Bill Davidsen @ 2007-09-28 16:25 UTC (permalink / raw) To: Jon Nelson; +Cc: linux-raid Jon Nelson wrote: > Please note: I'm having trouble w/gmail's formatting... so please > forgive this if it looks horrible. :-| > > On 9/28/07, Bill Davidsen <davidsen@tmr.com> wrote: > >> Dean S. Messing wrote: >> >>> It has been some time since I read the rsync man page. I see that >>> there is (among the bazillion and one switches) a "--link-dest=DIR" >>> switch which I suppose does what you describe. I'll have to >>> experiment with this and think things through. Thanks, Michal. >>> >>> >> Be aware that rsync is useful for making a *copy* of your files, which >> isn't always the best backup. If the goal is to preserve data and be >> able to recover in time of disaster, it's probably not optimal, while if >> you need frequent access to old or deleted files it's fine. >> > > > You are absolutely right when you say it isn't always the best backup. There > IS no 'best' backup. > > For example, full and incremental backup methods such as dump and > >> restore are usually faster to take and restore than a copy, and allow >> easy incremental backups. >> > > > If "copy" meant "full data copy" and not "hard link where possible", I'd > agree with you. However... > > I use a nightly rsync (with --link-dest) to backup more than 40 GiB to a > drbd-backed drive. I'll explain why I use drbd in just a moment. > > Technically, I have a 3 disk raid5 (Linux Software Raid) which is the > primary store for the data. Then I have a second drive (non-raid) that is > used as a drbd backing store, which I rsync *to* from filesystems built off > of the raid. I keep *30 days* of nightly backups on the drbd volume. The > average difference between nightly backups is about 45MB, or a bit less than > 10%. The total disk usage is (on average) about 10% more than a single > backup. On an AMD x86-64 dual core (3600 de-clocked to run at 1GHz) the > entire process takes between 1 and 2 minutes, from start to finish. > > Using hard links means I can snapshot ~175,000 files, about 40GiB, in under > 2 minutes - something I'd have a hard time doing with dump+restore. I could > easily make incremental or differential copies, and maybe even in that time > frame, but I'm not sure I much advantage in that. Furthermore, as you state, > dump+restore does *not* include the removal of files which for some > scenarios is a huge deal. > What I don't understand is how you use hard links... because a hard link needs to be in the same filesystem, and because a hard link is just another pointer to the inode and doesn't make a physical copy of the data to another device or to anywhere, really. > The long and short of it is this: using hard links (via rsync or cp or > whatever) to do snapshot backups can be really, really fast and have > significant advantages but there are, as with all things, some downsides. > Those downsides are fairly easily mitigated, however. In my case, I can lose > 1 drive of the raid and I'm OK. If I lose 2, then the other drive (not part > of the raid) has the data I care about. If I lose the entire machine, the > *other* machine (the other end of the drbd, only woken up every other day or > so) has the data. Going back 30 days. And a bare-metal "restore" is as fast > as your I/O is. I back my /really/ important stuff up on DLT. 
> > Thanks again to drbd, when the secondary comes up it communicates with the > primary and is able to figure out only which blocks have changed and only > copies those. On a nightly basis that is usually a couple of hundred > megabytes, and at 12MiB/s that doesn't take terribly long to take care of. > -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Backups w/ rsync 2007-09-28 16:25 ` Bill Davidsen @ 2007-09-28 16:52 ` Jon Nelson 0 siblings, 0 replies; 44+ messages in thread From: Jon Nelson @ 2007-09-28 16:52 UTC (permalink / raw) Cc: linux-raid On 9/28/07, Bill Davidsen <davidsen@tmr.com> wrote: > What I don't understand is how you use hard links... because a hard link > needs to be in the same filesystem, and because a hard link is just > another pointer to the inode and doesn't make a physical copy of the > data to another device or to anywhere, really. Yes, I know how hard links work. There is (one) physical copy of the data when it goes from the filesystem on the raid to the filesystem on the drbd. Subsequent "copies" of the same file, assuming the file has not changed, are all hard links on the drbd-backed filesystem. Thus, I have one *physical* copy of the data and a whole bunch of hard links. Now, since I'm using drbd I actually have *two* physical copies (for a total of three if you include the original) because the *other* machine has a block-for-block copy of the drbd device (or it did, as of a few days ago). link-dest basically works like this: Assuming we are going to "copy" (using that word loosely here) file "A" from "/source" to "/dest/backup.tmp/", and we've told rsync that "/dest/backup.1/A" might exist: If "/dest/backup.1/A" does not exist: make a physical copy from "/source/A" to "/dest/backup.tmp/A". If it does exist, and the two files are considered identical, simply hardlink "/dest/backup.tmp/A" to "/dest/backup.1/A". When all files are copied, move every "/dest/backup.N" (N is a number) to "/dest/backup.N+1" If /dest/backup.31 exists, delete it. Move /dest/backup.tmp to /dest/backup.1 (which was just renamed /dest/backup.2) I can do all of this, for 175K files (40G), in under 2 minutes on modest hardware. I end up with: 1+1 physical copies of the data (local drbd copy and remote drbd copy) There is more but if I may suggest: if you want more details contact me off-line, I'm pretty sure the linux-raid folks couldn't care less about rsync and drbd. -- Jon ^ permalink raw reply [flat|nested] 44+ messages in thread
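Jon's rotation, written out as a small script so the hard-link mechanics are explicit; the directory names follow his description, and the 31st slot is simply where the oldest copy falls off his 30-day window. On the very first run rsync will warn that the --link-dest directory does not exist and just make a full copy.

    rsync -aH --delete --link-dest=/dest/backup.1 /source/ /dest/backup.tmp/
    for n in $(seq 30 -1 1); do                   # shift backup.N to backup.N+1, highest first
        [ -d /dest/backup.$n ] && mv /dest/backup.$n /dest/backup.$((n+1))
    done
    rm -rf /dest/backup.31                        # whatever fell off the 30-day window
    mv /dest/backup.tmp /dest/backup.1            # the fresh snapshot becomes backup.1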
* Re: Help: very slow software RAID 5. 2007-09-25 23:50 ` Dean S. Messing 2007-09-26 1:45 ` Goswin von Brederlow @ 2007-09-27 22:40 ` Bill Davidsen 2007-09-28 23:38 ` Dean S. Messing 1 sibling, 1 reply; 44+ messages in thread From: Bill Davidsen @ 2007-09-27 22:40 UTC (permalink / raw) To: Dean S. Messing; +Cc: brederlo, linux-raid Dean S. Messing wrote: > Again, I don't get these speeds. Seq. reads are about > 170% of the average of my three physical drives if I turn up > the look-ahead. Then random access reads drops to slightly less > than my slowest drive. > As nearly as I can tell, Dean was talking about RAID-10 at that point (I also suggested that) which you haven't tried. For small numbers of drives, assume the read speed will be (N - 1) * S for large sequential read, using RAID-10. Where S is the speed of a single drive. Random read depends on so many things I can't begin to quantify them in anything less than a full white paper, but for a single thread assume somewhere around S and aggregate (N - 1) * S again. Writes depend a lot on system tuning, stripe size, stripe_cache_size, chunk size, etc. Fortunately the best way to boost write speed is to have lots of memory and let the kernel buffer. Finally, when you create your ext filesystem, think of: - ext2 - no journal - noatime mounts to avoid journal writes - manually make the journal file *large* to spread head motion over drives - consider moving journal file to a dedicated device (that old 20GB PATA drive?) - use the ext3 "stride" tuning stuff (I'm quantifying that in the next ten days). Or just make a RAID-10 "far" array and stop agonizing over this stuff, there is no config which is best for everything, you must realize "fast, cheap, reliable - pick two" is the design paradigm of RAID, and the more you optimize for one usage pattern the more you impact some other. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 44+ messages in thread
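Bill's filesystem-side suggestions, sketched as commands. The stride of 16 assumes the md default 64k chunk and a 4k filesystem block; the mount point, journal size and the spare-disk device for the external journal are likewise only illustrative.

    # stride = chunk size / filesystem block size = 64k / 4k = 16
    mkfs.ext3 -b 4096 -E stride=16 -J size=128 /dev/md0
    mount -o noatime /dev/md0 /data

    # or skip the journal entirely, as suggested:
    mkfs.ext2 -b 4096 -E stride=16 /dev/md0

    # or move the journal off the array onto a spare drive:
    mke2fs -O journal_dev /dev/hdc1
    mkfs.ext3 -b 4096 -E stride=16 -J device=/dev/hdc1 /dev/md0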
* Re: Help: very slow software RAID 5. 2007-09-27 22:40 ` Help: very slow software RAID 5 Bill Davidsen @ 2007-09-28 23:38 ` Dean S. Messing 2007-09-29 14:52 ` Bill Davidsen 0 siblings, 1 reply; 44+ messages in thread From: Dean S. Messing @ 2007-09-28 23:38 UTC (permalink / raw) To: linux-raid Bill Davidsen wrote: : Dean S. Messing wrote: : > Again, I don't get these speeds. Seq. reads are about : > 170% of the average of my three physical drives if I turn up : > the look-ahead. Then random access reads drops to slightly less : > than my slowest drive. : > : As nearly as I can tell, Dean was talking about RAID-10 at that point (I : also suggested that) which you haven't tried. I was talking about the three drive RAID-5 on which I ran bonnie++ measurements. I have not (yet) tried RAID-10. : For small numbers of : drives, assume the read speed will be (N - 1) * S for large sequential : read, using RAID-10. Where S is the speed of a single drive. Random read : depends on so many things I can't begin to quantify them in anything : less than a full white paper, but for a single thread assume somewhere : around S and aggregate (N - 1) * S again. Writes depend a lot on system : tuning, stripe size, stripe_cache_size, chunk size, etc. Fortunately the : best way to boost write speed is to have lots of memory and let the : kernel buffer. How does one "let the kernel buffer"? (I have plenty of memory for most things.) I know about "write-back" vs. "write-through" to reduce the write asymmetry of RAID-5. Is this what you mean by a kernel buffer? : Finally, when you create your ext filesystem, think of: : - ext2 - no journal : - noatime mounts to avoid journal writes : - manually make the journal file *large* to spread head motion over drives : - consider moving journal file to a dedicated device (that old 20GB : PATA drive?) : - use the ext3 "stride" tuning stuff (I'm quantifying that in the next : ten days). : : Or just make a RAID-10 "far" array and stop agonizing over this stuff, : there is no config which is best for everything, you must realize "fast, : cheap, reliable - pick two" is the design paradigm of RAID, and the more : you optimize for one usage pattern the more you impact some other. Dean ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5. 2007-09-28 23:38 ` Dean S. Messing @ 2007-09-29 14:52 ` Bill Davidsen 0 siblings, 0 replies; 44+ messages in thread From: Bill Davidsen @ 2007-09-29 14:52 UTC (permalink / raw) To: Dean S. Messing; +Cc: linux-raid Dean S. Messing wrote: > Bill Davidsen wrote: > : Dean S. Messing wrote: > : > Again, I don't get these speeds. Seq. reads are about > : > 170% of the average of my three physical drives if I turn up > : > the look-ahead. Then random access reads drops to slightly less > : > than my slowest drive. > : > > : As nearly as I can tell, Dean was talking about RAID-10 at that point (I > : also suggested that) which you haven't tried. > > I was talking about the three drive RAID-5 on which I ran bonnie++ measurements. > I have not (yet) tried RAID-10. > > : For small numbers of > : drives, assume the read speed will be (N - 1) * S for large sequential > : read, using RAID-10. Where S is the speed of a single drive. Random read > : depends on so many things I can't begin to quantify them in anything > : less than a full white paper, but for a single thread assume somewhere > : around S and aggregate (N - 1) * S again. Writes depend a lot on system > : tuning, stripe size, stripe_cache_size, chunk size, etc. Fortunately the > : best way to boost write speed is to have lots of memory and let the > : kernel buffer. > > How does one "let the kernel buffer"? (I have plenty of memory for > most things.) I know about "write-back" vs. "write-through" to reduce > the write asymmetry of RAID-5. Is this what you mean by a kernel > buffer? > Just by having adequate memory you will get kernel buffering (unless you use fsync or similar), and performance goes up if you increase your stripe_cache_size, although you hit diminishing returns on that somewhere between 8-32MB. > : Finally, when you create your ext filesystem, think of: > : - ext2 - no journal > : - noatime mounts to avoid journal writes > Please try this before you reach any conclusions. Doing measurements on a filesystem instead of raw raid arrays adds bottlenecks. > : - manually make the journal file *large* to spread head motion over drives > : - consider moving journal file to a dedicated device (that old 20GB > : PATA drive?) > : - use the ext3 "stride" tuning stuff (I'm quantifying that in the next > : ten days). > : > : Or just make a RAID-10 "far" array and stop agonizing over this stuff, > : there is no config which is best for everything, you must realize "fast, > : cheap, reliable - pick two" is the design paradigm of RAID, and the more > : you optimize for one usage pattern the more you impact some other. > > Dean > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 44+ messages in thread
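The stripe cache Bill refers to is a raid5/6 tunable exposed through sysfs; the value counts pages per member device, so memory use is roughly value x 4 KiB x number of disks. The 2048 below (about 24 MiB for a 3-disk array) is just an example inside the 8-32 MB range he mentions.

    cat /sys/block/md0/md/stripe_cache_size        # default is 256
    echo 2048 > /sys/block/md0/md/stripe_cache_size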
* Re: Help: very slow software RAID 5.
  2007-09-25 18:16 ` Dean S. Messing
  2007-09-25 21:46   ` Goswin von Brederlow
@ 2007-09-27 22:17   ` Bill Davidsen
  2007-09-28 23:21     ` Dean S. Messing
  1 sibling, 1 reply; 44+ messages in thread
From: Bill Davidsen @ 2007-09-27 22:17 UTC (permalink / raw)
  To: Dean S. Messing; +Cc: linux-raid, brederlo

Dean S. Messing wrote:
> I have also discovered "smartctl" and have read that if the short smartctl
> tests are run daily and the long test weekly, the chances of being
> caught "with my pants down" are quite low, even in a two-disk RAID-0
> config.  What is your opinion?
>
There's a good paper on using smartctl to predict the health of disks,
and if you can't find it I probably have a copy somewhere, since I gave
a presentation on RAID issues which included it.  But the basic premise
was that if you see errors of certain types, the drives are likely to
fail soon.  It did *not* say that absent these warnings the drives were
unlikely to fail; in fact most drives which did fail did so without
warning.  So for about 90% of the failures there is no warning.

I had servers a few years ago, running 6TB/server on lots of small, fast
drives, and I concluded that the predictive value of SMART was so small
that it didn't justify looking at the reports.  Take that as my opinion:
assume that drives fail without warning.

I'm getting around to replying to several things you have said in
various posts, so that people who are threading answers will be happy...

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: Help: very slow software RAID 5.
  2007-09-27 22:17   ` Bill Davidsen
@ 2007-09-28 23:21     ` Dean S. Messing
  0 siblings, 0 replies; 44+ messages in thread
From: Dean S. Messing @ 2007-09-28 23:21 UTC (permalink / raw)
  To: linux-raid, brederlo

: Dean S. Messing wrote:
: > I have also discovered "smartctl" and have read that if the short smartctl
: > tests are run daily and the long test weekly, the chances of being
: > caught "with my pants down" are quite low, even in a two-disk RAID-0
: > config.  What is your opinion?
: >
:
: There's a good paper on using smartctl to predict the health of disks,
: and if you can't find it I probably have a copy somewhere, since I gave
: a presentation on RAID issues which included it.  But the basic premise
: was that if you see errors of certain types, the drives are likely to
: fail soon.  It did *not* say that absent these warnings the drives were
: unlikely to fail; in fact most drives which did fail did so without
: warning.  So for about 90% of the failures there is no warning.
:
: I had servers a few years ago, running 6TB/server on lots of small, fast
: drives, and I concluded that the predictive value of SMART was so small
: that it didn't justify looking at the reports.  Take that as my opinion:
: assume that drives fail without warning.

From what you and another poster said (about the false-alarm rate of
smartctl) I'll put my trust in backups alone.  I agree: if SMART predicts
so few of the failures, there's no point in wasting time reading the
reports and gaining only a false sense of security.

: I'm getting around to replying to several things you have said in
: various posts, so that people who are threading answers will be happy...

I'll look forward to your comments, especially on my misconceptions.
I've learned a great deal already.

Dean

^ permalink raw reply	[flat|nested] 44+ messages in thread
end of thread, other threads:[~2007-10-01  4:45 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-09-18 23:09 Help: very slow software RAID 5 Dean S. Messing
2007-09-19  0:05 ` Justin Piszcz
2007-09-19  1:49 ` Dean S. Messing
2007-09-19  8:38 ` Justin Piszcz
2007-09-19 17:49 ` Dean S. Messing
2007-09-19 18:25 ` Justin Piszcz
2007-09-19 23:31 ` Dean S. Messing
2007-09-20  8:25 ` Justin Piszcz
2007-09-20 18:16 ` Michal Soltys
2007-09-20 19:06 ` Dean S. Messing
2007-09-20 15:33 ` Bill Davidsen
2007-09-20 18:47 ` Dean S. Messing
2007-09-20 21:08 ` Michael Tokarev
2007-09-21  0:58 ` Dean S. Messing
2007-09-21 13:00 ` Bill Davidsen
2007-09-21 20:01 ` Dean S. Messing
2007-09-21 20:21 ` Dean S. Messing
2007-09-25  9:31 ` Goswin von Brederlow
2007-09-25 18:16 ` Dean S. Messing
2007-09-25 21:46 ` Goswin von Brederlow
2007-09-25 23:50 ` Dean S. Messing
2007-09-26  1:45 ` Goswin von Brederlow
2007-09-27  6:23 ` Dean S. Messing
2007-09-27  9:51 ` Michal Soltys
2007-09-27 22:10 ` Backups w/ rsync (was: Help: very slow software RAID 5.) Dean S. Messing
2007-09-28  7:57 ` Backups w/ rsync Michael Tokarev
2007-09-28 10:23 ` Goswin von Brederlow
2007-09-28 11:18 ` Michal Soltys
2007-09-28 12:47 ` Goswin von Brederlow
2007-09-28 14:17 ` Michal Soltys
2007-09-29  0:11 ` Dean S. Messing
2007-09-29  8:43 ` Michael Tokarev
2007-09-28 14:48 ` Bill Davidsen
2007-09-28 14:57 ` Wolfgang Denk
2007-09-28 16:50 ` Bill Davidsen
2007-10-01  4:45 ` Michal Soltys
2007-09-28 15:11 ` Jon Nelson
2007-09-28 16:25 ` Bill Davidsen
2007-09-28 16:52 ` Jon Nelson
2007-09-27 22:40 ` Help: very slow software RAID 5 Bill Davidsen
2007-09-28 23:38 ` Dean S. Messing
2007-09-29 14:52 ` Bill Davidsen
2007-09-27 22:17 ` Bill Davidsen
2007-09-28 23:21 ` Dean S. Messing