* RAID-0/5/6 performances
From: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Date: 2013-12-05 19:24 UTC
To: linux-raid

Hi all,

I have a system with an LSI 2308 SAS controller and five 2.5" HDDs
attached. Each HDD can do around 100 MB/s read/write. This was tested
with all HDDs in parallel, to make sure the controller can sustain
them; a single disk shows the same performance.

I was testing RAID-0/5/6 performance and I found something I could not
clearly understand.

The test was done with "dd"; I wanted to know the maximum possible
performance. Specifically, for reading:

  dd if=/dev/md127 of=/dev/null bs=4k

For writing:

  dd if=/dev/zero of=/dev/md127 bs=4k conv=fdatasync

Note that a larger block size did not change the results. I guess the
page size is already close to optimal.

I tested each RAID level with 4 and 5 HDDs, with chunk sizes of 512k,
64k and 16k. The "stripe_cache_size" was set to the maximum of 32768.

The results were observed with "iostat -k 5", taking care to account
for variation and ramp-up.

The table below is in MB/s; the numbers in the column headers are the
HDD counts, "r" is read, "w" is write:

  chunk  RAID   4r   4w   5r   5w
  512k     0   400  400  500  500
  512k     5   260  300  360  400
  512k     6    55  180  100  290

   64k     0   400  400  440  500
   64k     5   150  300  160  400
   64k     6   100  180  140  290

   16k     0   380  400  350  500
   16k     5   100  300  130  390
   16k     6    80  180  100  290

Now, RAID-0/5 seem to perform as expected, scaling with the number of
HDDs, especially with a large chunk size. Write performance is not a
problem, even though it is CPU intensive with parity RAID. RAID-0/5 do
not react well to small chunks.

RAID-6, on the other hand, seems to have an idea of its own. First of
all, it does not scale proportionally: I would think a 4-HDD RAID-6
should read more or less as fast as 2 HDDs. I can understand some loss,
due to skipping the parity chunks, but not this much. In fact it
improves with a smaller chunk. With 5 HDDs, I would expect something
better than 100 MB/s.

Any idea on this? Am I doing something wrong? Any suggestion on tuning
something in order to improve RAID-6?

Thanks,

bye,

--
piergiorgio
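A minimal sketch of the benchmark procedure described above, for
reference. The array name /dev/md127 is taken from the post; the count
values and the stripe-cache path are assumptions based on a typical md
setup, and the write test destroys the array's contents.

  #!/bin/bash
  MD=md127

  # Raise the stripe cache (RAID-5/6 only), as done in the post.
  echo 32768 > /sys/block/$MD/md/stripe_cache_size

  # Sequential read test.
  dd if=/dev/$MD of=/dev/null bs=4k count=2M

  # Sequential write test; conv=fdatasync flushes before dd reports.
  dd if=/dev/zero of=/dev/$MD bs=4k count=2M conv=fdatasync

  # In a second terminal, watch per-device throughput.
  iostat -k 5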
* Re: RAID-0/5/6 performances
From: NeilBrown
Date: 2013-12-05 21:57 UTC
To: Piergiorgio Sartor; Cc: linux-raid

On Thu, 5 Dec 2013 20:24:54 +0100 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> [...]
> RAID-6, on the other hand, seems to have an idea of its own. First of
> all, it does not scale proportionally: I would think a 4-HDD RAID-6
> should read more or less as fast as 2 HDDs. I can understand some
> loss, due to skipping the parity chunks, but not this much. In fact
> it improves with a smaller chunk. With 5 HDDs, I would expect
> something better than 100 MB/s.
>
> Any idea on this? Am I doing something wrong? Any suggestion on
> tuning something in order to improve RAID-6?

Does look strange.

The first thing I would check is the read-ahead size. md sets it for
you, but it might be messing it up somehow. Have a look at

  /sys/block/mdX/bdi/read_ahead_kb

for each configuration and see if making it some uniform large number
has any effect.

NeilBrown
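A minimal sketch of the check suggested above. The device name md127
comes from the original post; the 8192 value is just an illustrative
"uniform large number".

  # Current read-ahead for the array, in KiB.
  cat /sys/block/md127/bdi/read_ahead_kb

  # Try a uniform large value and repeat the sequential read test.
  echo 8192 > /sys/block/md127/bdi/read_ahead_kb
  dd if=/dev/md127 of=/dev/null bs=4k count=2M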
* Re: RAID-0/5/6 performances
From: Piergiorgio Sartor
Date: 2013-12-05 22:29 UTC
To: NeilBrown; Cc: linux-raid

On Fri, Dec 06, 2013 at 08:57:12AM +1100, NeilBrown wrote:
> Does look strange.
> The first thing I would check is the read-ahead size. md sets it for
> you, but it might be messing it up somehow. Have a look at
>   /sys/block/mdX/bdi/read_ahead_kb
> for each configuration and see if making it some uniform large number
> has any effect.

Hi Neil,

thanks for the hint, I knew I needed _the_ expert :-)

Using a chunk of 64k (the best in the table above), with a 5-HDD
RAID-6, the default read_ahead_kb is 384. I tried increasing it, with
the following improvement:

  read_ahead_kb      5r
   1024    -->  200 MB/s
   4096    -->  300 MB/s
   8192    -->  310 MB/s
  32768    -->  310 MB/s

So it seems the maximum is reached somewhere between 4096 and 8192,
which is roughly what I would expect from a 5-HDD RAID-6.

I'll try (tomorrow) with different chunk sizes to see what changes.
In any case, 384 seems a bit too little. Maybe 5 HDDs is not a real
RAID-6 use case, I do not know.

Thanks again,

bye,

pg

--
piergiorgio
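A sketch of the kind of sweep reported above. The array name, the value
list and the read size are assumptions; throughput is taken from dd's
summary line.

  #!/bin/bash
  MD=md127
  for ra in 384 1024 4096 8192 32768; do
      echo "$ra" > /sys/block/$MD/bdi/read_ahead_kb
      echo 3 > /proc/sys/vm/drop_caches    # drop the page cache between runs
      echo -n "read_ahead_kb=$ra: "
      # dd prints its throughput summary on stderr; keep only the last line.
      dd if=/dev/$MD of=/dev/null bs=4k count=2M 2>&1 | tail -n 1
  done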
* Re: RAID-0/5/6 performances
From: Piergiorgio Sartor
Date: 2013-12-06 22:47 UTC
To: NeilBrown; Cc: linux-raid

On Fri, Dec 06, 2013 at 08:57:12AM +1100, NeilBrown wrote:
> Does look strange.
> The first thing I would check is the read-ahead size. [...]
> Have a look at
>   /sys/block/mdX/bdi/read_ahead_kb
> for each configuration and see if making it some uniform large number
> has any effect.

Hi again Neil,

I tested some "read_ahead_kb" configurations with RAID-6, 4 and 5 HDDs,
and a 512k chunk size.

Increasing the value to very large numbers, like 65536 or 131072, did
indeed improve read performance. I tested from 4096 to 131072, doubling
the value at each run.

So, for 4 HDDs I got around 150 MB/s and for 5 HDDs around 190 MB/s.
This is better than the 55 and 100 I got before, but still below the
expected 200 and 300 I get with a chunk size of 64k.

Anyhow, I guess the read-ahead tuning did the trick.

Thanks again,

bye,

--
piergiorgio
* Re: RAID-0/5/6 performances
From: Stan Hoeppner
Date: 2013-12-06 9:24 UTC
To: Piergiorgio Sartor, linux-raid

On 12/5/2013 1:24 PM, Piergiorgio Sartor wrote:

> The "stripe_cache_size" was set to the maximum of 32768.

You don't want to set this so high. Doing so will:

1. Usually decrease throughput
2. Eat a huge amount of memory. With 5 drives:

   ((32768 * 4096) * 5) / 1048576 = 640 MB of RAM consumed for the
   stripe cache

For 5 or fewer pieces of spinning rust, a value of 2048 or less should
be sufficient. Test 512, 1024, 2048, 4096, and 8192. You should see
your throughput go up and then back down. Find the sweet spot and use
that value. If two of these yield throughput within 5% of one another,
use the lower value, as it eats less RAM.

--
Stan
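A sketch of the sweep Stan suggests, measured with the same dd write
test used earlier. The array name and sizes are assumptions, and the
write destroys the array's contents.

  #!/bin/bash
  MD=md127
  # Memory used is roughly stripe_cache_size * 4 KiB * number of member disks.
  for sc in 512 1024 2048 4096 8192; do
      echo "$sc" > /sys/block/$MD/md/stripe_cache_size
      echo -n "stripe_cache_size=$sc: "
      dd if=/dev/zero of=/dev/$MD bs=4k count=2M conv=fdatasync 2>&1 | tail -n 1
  done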
* Re: RAID-0/5/6 performances
From: Piergiorgio Sartor
Date: 2013-12-06 18:13 UTC
To: Stan Hoeppner; Cc: linux-raid

On Fri, Dec 06, 2013 at 03:24:18AM -0600, Stan Hoeppner wrote:
> You don't want to set this so high. [...]
> For 5 or fewer pieces of spinning rust, a value of 2048 or less
> should be sufficient. Test 512, 1024, 2048, 4096, and 8192. You
> should see your throughput go up and then back down. Find the sweet
> spot and use that value. If two of these yield throughput within 5%
> of one another, use the lower value, as it eats less RAM.

Hi Stan,

thanks for the reply, I was looking forward to it, since you always
provide useful information.

I checked two systems: a different one with RAID-5, and the actual
RAID-6.

On the first one, 2048 seems to be the best stripe cache size; larger
values result in slower write speed, albeit not by much.

For the RAID-6, 32768 seems to be the best value.

There is one difference: the RAID-5 has a chunk size of 512k (the
default), while the RAID-6 still has 64k.

BTW, why is that? I mean, why does a larger stripe cache result in
lower write speed?

Thanks,

bye,

--
piergiorgio
* Re: RAID-0/5/6 performances
From: Stan Hoeppner
Date: 2013-12-06 23:29 UTC
To: Piergiorgio Sartor; Cc: linux-raid

On 12/6/2013 12:13 PM, Piergiorgio Sartor wrote:
> BTW, why is that? I mean, why does a larger stripe cache result in
> lower write speed?

I don't have the answer to this question. It has been asked before. I
can only speculate that the larger cache table introduces overhead of
some kind. You may want to ask Neil directly.

Note that you're using dd for testing this. dd produces single-stream
serial IO. If you test other IO patterns, such as parallel or
asynchronous IO, with software such as FIO, the results may be a bit
different.

--
Stan
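For comparison with the single-stream dd numbers, a hedged sketch of a
parallel, asynchronous read test with fio; the device name, job count
and queue depth are illustrative, not taken from the thread.

  # Four parallel jobs doing asynchronous sequential reads against the array.
  # --readonly guards against accidentally writing to /dev/md127.
  fio --name=md-seq-read --filename=/dev/md127 --readonly \
      --rw=read --bs=1M --direct=1 \
      --ioengine=libaio --iodepth=32 --numjobs=4 \
      --runtime=60 --time_based --group_reporting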