* RAID 5 write performance advice @ 2005-08-24 8:24 Mirko Benz 2005-08-24 12:46 ` Ming Zhang 0 siblings, 1 reply; 14+ messages in thread From: Mirko Benz @ 2005-08-24 8:24 UTC (permalink / raw) To: linux-raid Hello, We have recently tested Linux 2.6.12 SW RAID versus HW RAID. For SW RAID we used Linux 2.6.12 with 8 Seagate SATA NCQ disks (no spare) on a Dual Xeon platform. For HW RAID we used an ARC-1120 SATA RAID controller and a Fibre Channel RAID system (Dual 2 Gb, Infortrend). READ SW:877 ARC:693 IFT:366 (MB/s @64k BS using disktest with raw device) Read SW RAID performance is better than HW RAID. The FC RAID is limited by the interface. WRITE SW:140 ARC:371 IFT:352 For SW RAID 5 we needed to adjust the scheduling policy. By default we got only 60 MB/s. SW RAID 0 write performance @64k is 522 MB/s. Based on the performance numbers it looks like Linux SW RAID reads every data element of a stripe + parity in parallel, performs xor operations and then writes the data back to disk in parallel. The HW RAID controllers seem to be a bit smarter in this regard. When they encounter a large write with enough data for a full stripe, they seem to skip the read and perform only the xor + write in parallel. Hence no seek is required and it can be closer to RAID0 write performance. We have an application where large amounts of data need to be written sequentially to disk (e.g. 100 MB at once). The storage system has a UPS, so write caching can be utilized. I would like advice on whether write performance similar to HW RAID controllers is possible with Linux, or whether there is something else we could apply. Thanks in advance, Mirko ^ permalink raw reply [flat|nested] 14+ messages in thread
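As a back-of-the-envelope check of the two write paths Mirko describes, the per-stripe I/O counts for the 8-disk, 64 KiB-chunk layout above work out as follows (a sketch in shell; the numbers are illustrative only):

  # 8 disks, 64 KiB chunk: 7 data chunks + 1 parity chunk per stripe
  DISKS=8; CHUNK_KB=64
  DATA_KB=$(( (DISKS - 1) * CHUNK_KB ))
  echo "a full stripe holds ${DATA_KB} KiB of data"              # 448 KiB
  # full-stripe ("reconstruct") write: XOR the new data chunks, write everything
  echo "full-stripe write: 0 chunk reads, ${DISKS} chunk writes"
  # read-modify-write of a single chunk: read old data + old parity,
  # XOR out the old data, XOR in the new, write new data + new parity
  echo "read-modify-write: 2 chunk reads, 2 chunk writes per updated chunk"

Whether md actually takes the full-stripe path depends on whole stripes reaching its cache before it is forced to flush them, which is what the rest of the thread digs into.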
* Re: RAID 5 write performance advice 2005-08-24 8:24 RAID 5 write performance advice Mirko Benz @ 2005-08-24 12:46 ` Ming Zhang 2005-08-24 13:43 ` Mirko Benz 0 siblings, 1 reply; 14+ messages in thread From: Ming Zhang @ 2005-08-24 12:46 UTC (permalink / raw) To: Mirko Benz; +Cc: Linux RAID On Wed, 2005-08-24 at 10:24 +0200, Mirko Benz wrote: > Hello, > > We have recently tested Linux 2.6.12 SW RAID versus HW Raid. For SW Raid > we used Linux 2.6.12 with 8 Seagate SATA NCQ disks no spare on a Dual > Xeon platform. For HW Raid we used a Arc-1120 SATA Raid controller and a > Fibre Channel Raid System (Dual 2 Gb, Infortrend). > > READ SW:877 ARC:693 IFT:366 > (MB/s @64k BS using disktest with raw device) > > Read SW Raid performance is better than HW Raid. The FC RAID is limited > by the interface. > > WRITE SW:140 ARC:371 IFT:352 > > For SW RAID 5 we needed to adjust the scheduling policy. By default we > got only 60 MB/s. SW RAID 0 write performance @64k is 522 MB/s. how u test and get these number? what is u raid5 configuration? chunk size? > > Based on the performance numbers it looks like Linux SW RAID reads every > data element of a stripe + parity in parallel, performs xor operations > and than writes the data back to disk in parallel. > > The HW Raid controllers seem to be a bit smarter in this regard. When > they encounter a large write with enough data for a full stripe they > seem to spare the read and perform only the xor + write in parallel. > Hence no seek is required and in can be closer to RAID0 write performance. this is stripe write and linux MD have this. > > We have an application were large amounts of data need to be > sequentially written to disk (e.g. 100 MB at once). The storage system > has an USV so write caching can be utilized. > > I would like to have an advice if write performance similar to HW Raid > controllers is possible with Linux or if there is something else that we > could apply. > > Thanks in advance, > Mirko > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
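The details Ming asks for can be read straight off the running system; a minimal sketch (the device names and sysfs path are assumptions, and mdadm may or may not be installed on the test box):

  cat /proc/mdstat                          # level, member disks, chunk size
  mdadm --detail /dev/md0                   # chunk size, layout, array state
  for d in sdb sdc sdd sde sdf sdg sdh sdi; do
      echo "$d: $(cat /sys/block/$d/queue/scheduler)"    # active I/O scheduler per disk
  done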
* Re: RAID 5 write performance advice 2005-08-24 12:46 ` Ming Zhang @ 2005-08-24 13:43 ` Mirko Benz 2005-08-24 13:49 ` Ming Zhang 2005-08-24 21:32 ` Neil Brown 0 siblings, 2 replies; 14+ messages in thread From: Mirko Benz @ 2005-08-24 13:43 UTC (permalink / raw) To: mingz; +Cc: Linux RAID Hello, The RAID5 configuration is: 8 SATA disks, 8-port Marvell SATA PCI-X controller chip (SuperMicro board), Dual Xeon, 1 GB RAM, stripe size 64K, no spare disk. Measurements are performed on the raw md device with: disktest -PT -T30 -h1 -K8 -B65536 -ID /dev/md0 using the default stripe size (64K). 128K stripe size does not make a real difference. We have also increased the RAID 5 stripe cache by setting NR_STRIPES to a larger value but without any perceptible effect. If Linux uses "stripe write" why is it so much slower than HW Raid? Is it disabled by default? 8 disks: 7 data disks + parity @ 64k stripe size = 448k data per stripe The request size was smaller (tested up to 256K) than the size of a stripe. We have seen errors for larger request sizes (e.g. 1 MB). Does Linux require the request size to be larger than a stripe to take advantage of "stripe write"? Regards, Mirko Ming Zhang schrieb: >On Wed, 2005-08-24 at 10:24 +0200, Mirko Benz wrote: > > >>Hello, >> >>We have recently tested Linux 2.6.12 SW RAID versus HW Raid. For SW Raid >>we used Linux 2.6.12 with 8 Seagate SATA NCQ disks no spare on a Dual >>Xeon platform. For HW Raid we used a Arc-1120 SATA Raid controller and a >>Fibre Channel Raid System (Dual 2 Gb, Infortrend). >> >>READ SW:877 ARC:693 IFT:366 >>(MB/s @64k BS using disktest with raw device) >> >>Read SW Raid performance is better than HW Raid. The FC RAID is limited >>by the interface. >> >>WRITE SW:140 ARC:371 IFT:352 >> >>For SW RAID 5 we needed to adjust the scheduling policy. By default we >>got only 60 MB/s. SW RAID 0 write performance @64k is 522 MB/s. >> >> >how u test and get these number? > >what is u raid5 configuration? chunk size? > > > >>Based on the performance numbers it looks like Linux SW RAID reads every >>data element of a stripe + parity in parallel, performs xor operations >>and than writes the data back to disk in parallel. >> >>The HW Raid controllers seem to be a bit smarter in this regard. When >>they encounter a large write with enough data for a full stripe they >>seem to spare the read and perform only the xor + write in parallel. >>Hence no seek is required and in can be closer to RAID0 write performance. >> >> >this is stripe write and linux MD have this. > > > > > >>We have an application were large amounts of data need to be >>sequentially written to disk (e.g. 100 MB at once). The storage system >>has an USV so write caching can be utilized. >> >>I would like to have an advice if write performance similar to HW Raid >>controllers is possible with Linux or if there is something else that we >>could apply. >> >>Thanks in advance, >>Mirko >> >>- >>To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
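One way to answer the last question above empirically is to issue writes in exact multiples of the 448 KiB full stripe; a sketch reusing the disktest flags from this thread (the -B values are assumptions, and whether this disktest build accepts them as plain byte counts was not verified):

  # 7 data disks * 64 KiB chunk = 448 KiB of data per full stripe
  disktest -w -PT -T30 -h1 -K8 -B458752 -ID /dev/md0    # 448 KiB = 1 full stripe
  disktest -w -PT -T30 -h1 -K8 -B917504 -ID /dev/md0    # 896 KiB = 2 full stripes

Both sizes stay below the 1 MB requests that produced errors; whether stripe-aligned requests remove the reads is exactly what the test would show.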
* Re: RAID 5 write performance advice 2005-08-24 13:43 ` Mirko Benz @ 2005-08-24 13:49 ` Ming Zhang 2005-08-24 21:32 ` Neil Brown 1 sibling, 0 replies; 14+ messages in thread From: Ming Zhang @ 2005-08-24 13:49 UTC (permalink / raw) To: Mirko Benz; +Cc: Linux RAID On Wed, 2005-08-24 at 15:43 +0200, Mirko Benz wrote: > Hello, > > The RAID5 configuration is: 8 SATA disks, 8 port Marvel SATA PCI-X > controller chip (SuperMicro board), Dual Xeon, 1 GB RAM, stripe size > 64K, no spare disk. u have good luck on this Marvel SATA. When I use it with supermicro, I got very bad performance even with RAID0. > > Measurements are performed to the ram md device with: > disktest -PT -T30 -h1 -K8 -B65536 -ID /dev/md0 > using the default stripe size (64K). 128K stripe size does not make a > real difference. > when u run this command, also run "iostat 1" at another console and see how many read and write. also run "time dd if=/dev/zero of=/dev/mdX bs=1M" and see speed u get and see what iostat tell u. > We have also increased the RAID 5 stripe cache by setting NR_STRIPES to > a larger value but without any perceptible effect. > > If Linux uses "stripe write" why is it so much slower than HW Raid? Is > it disabled by default? > > 8 disks: 7 data disks + parity @ 64k stripe size = 448k data per stripe > The request size was smaller (tested up to 256K) than the size of a stripe. > We have seen errors for larger request sizes (e.g. 1 MB). Does Linux > require the request size to be larger than a stripe to take advantage > of "stripe write"? > > Regards, > Mirko > > Ming Zhang schrieb: > > >On Wed, 2005-08-24 at 10:24 +0200, Mirko Benz wrote: > > > > > >>Hello, > >> > >>We have recently tested Linux 2.6.12 SW RAID versus HW Raid. For SW Raid > >>we used Linux 2.6.12 with 8 Seagate SATA NCQ disks no spare on a Dual > >>Xeon platform. For HW Raid we used a Arc-1120 SATA Raid controller and a > >>Fibre Channel Raid System (Dual 2 Gb, Infortrend). > >> > >>READ SW:877 ARC:693 IFT:366 > >>(MB/s @64k BS using disktest with raw device) > >> > >>Read SW Raid performance is better than HW Raid. The FC RAID is limited > >>by the interface. > >> > >>WRITE SW:140 ARC:371 IFT:352 > >> > >>For SW RAID 5 we needed to adjust the scheduling policy. By default we > >>got only 60 MB/s. SW RAID 0 write performance @64k is 522 MB/s. > >> > >> > >how u test and get these number? > > > >what is u raid5 configuration? chunk size? > > > > > > > >>Based on the performance numbers it looks like Linux SW RAID reads every > >>data element of a stripe + parity in parallel, performs xor operations > >>and than writes the data back to disk in parallel. > >> > >>The HW Raid controllers seem to be a bit smarter in this regard. When > >>they encounter a large write with enough data for a full stripe they > >>seem to spare the read and perform only the xor + write in parallel. > >>Hence no seek is required and in can be closer to RAID0 write performance. > >> > >> > >this is stripe write and linux MD have this. > > > > > > > > > > > >>We have an application were large amounts of data need to be > >>sequentially written to disk (e.g. 100 MB at once). The storage system > >>has an USV so write caching can be utilized. > >> > >>I would like to have an advice if write performance similar to HW Raid > >>controllers is possible with Linux or if there is something else that we > >>could apply. 
> >> > >>Thanks in advance, > >>Mirko > >> > >>- > >>To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >>the body of a message to majordomo@vger.kernel.org > >>More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > >> > > > > > > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
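Ming's two suggestions can be run as one short test; a sketch, assuming /dev/md0 may be overwritten and that iostat comes from the sysstat package:

  iostat 1 > /tmp/iostat.log &                  # per-disk reads/writes, one sample per second
  IOSTAT_PID=$!
  time dd if=/dev/zero of=/dev/md0 bs=1M count=5000
  sync
  kill $IOSTAT_PID
  grep -E 'md0|sd[b-i]' /tmp/iostat.log | tail -45    # look for reads mixed into a pure write load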
* Re: RAID 5 write performance advice 2005-08-24 13:43 ` Mirko Benz 2005-08-24 13:49 ` Ming Zhang @ 2005-08-24 21:32 ` Neil Brown 2005-08-25 16:38 ` Mirko Benz 1 sibling, 1 reply; 14+ messages in thread From: Neil Brown @ 2005-08-24 21:32 UTC (permalink / raw) To: Mirko Benz; +Cc: mingz, Linux RAID On Wednesday August 24, mirko.benz@web.de wrote: > Hello, > > The RAID5 configuration is: 8 SATA disks, 8 port Marvel SATA PCI-X > controller chip (SuperMicro board), Dual Xeon, 1 GB RAM, stripe size > 64K, no spare disk. > > Measurements are performed to the ram md device with: > disktest -PT -T30 -h1 -K8 -B65536 -ID /dev/md0 > using the default stripe size (64K). 128K stripe size does not make a > real difference. May I suggest you try creating a filesystem on the device and doing tests in the filesystem? I have found the raw device slower than filesystem access before, and a quick test shows writing to the filesystem (ext3) is about 4 times as fast as writing to /dev/md1 on a 6 drive raid5 array. > > If Linux uses "stripe write" why is it so much slower than HW Raid? Is > it disabled by default? No, it is never disabled. However it can only work if raid5 gets a full stripe of data before being asked to flush that data. Writing to /dev/md0 directly may cause flushes too often. Does your application actually require writing to the raw device, or will you be using a filesystem? NeilBrown ^ permalink raw reply [flat|nested] 14+ messages in thread
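A sketch of the filesystem-level comparison Neil suggests (the mkfs options, mount point and write size are assumptions; this overwrites /dev/md0, and 5 GB is chosen to be well beyond the 1 GB of RAM so the page cache does not dominate the result):

  mkfs.ext3 /dev/md0
  mkdir -p /mnt/md0test
  mount /dev/md0 /mnt/md0test
  time sh -c 'dd if=/dev/zero of=/mnt/md0test/bigfile bs=1M count=5000; sync'
  umount /mnt/md0test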
* Re: RAID 5 write performance advice 2005-08-24 21:32 ` Neil Brown @ 2005-08-25 16:38 ` Mirko Benz 2005-08-25 16:54 ` Ming Zhang 2005-09-01 19:44 ` djani22 0 siblings, 2 replies; 14+ messages in thread From: Mirko Benz @ 2005-08-25 16:38 UTC (permalink / raw) To: Neil Brown; +Cc: mingz, Linux RAID Hello, We intend to export a lvm/md volume via iSCSI or SRP using InfiniBand to remote clients. There is no local file system processing on the storage platform. The clients may have a variety of file systems including ext3, GFS. Single disk write performance is: 58,5 MB/s. With large sequential write operations I would expect something like 90% of n-1 * single_disk_performance if stripe write can be utilized. So roughly 400 MB/s – which the HW RAID devices achieve. RAID setup: Personalities : [raid0] [raid5] md0 : active raid5 sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] 1094035712 blocks level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU] We have assigned the deadline scheduler to every disk in the RAID. The default scheduler gives much lower results. *** dd TEST *** time dd if=/dev/zero of=/dev/md0 bs=1M 5329911808 bytes transferred in 28,086199 seconds (189769779 bytes/sec) iostat 5 output: avg-cpu: %user %nice %sys %iowait %idle 0,10 0,00 87,80 7,30 4,80 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn hda 0,00 0,00 0,00 0 0 sda 0,00 0,00 0,00 0 0 sdb 1976,10 1576,10 53150,60 7912 266816 sdc 2072,31 1478,88 53150,60 7424 266816 sdd 2034,06 1525,10 53150,60 7656 266816 sde 1988,05 1439,04 53147,41 7224 266800 sdf 1975,10 1499,60 53147,41 7528 266800 sdg 1383,07 1485,26 53145,82 7456 266792 sdh 1562,55 1311,55 53145,82 6584 266792 sdi 1586,85 1295,62 53145,82 6504 266792 sdj 0,00 0,00 0,00 0 0 sdk 0,00 0,00 0,00 0 0 sdl 0,00 0,00 0,00 0 0 sdm 0,00 0,00 0,00 0 0 sdn 0,00 0,00 0,00 0 0 md0 46515,54 0,00 372124,30 0 1868064 Comments: Large write should not see any read operations. But there are some??? *** disktest *** disktest -w -PT -T30 -h1 -K8 -B512k -ID /dev/md0 | 2005/08/25-17:27:04 | STAT | 4072 | v1.1.12 | /dev/md0 | Write throughput: 160152507.7B/s (152.73MB/s), IOPS 305.7/s. | 2005/08/25-17:27:05 | STAT | 4072 | v1.1.12 | /dev/md0 | Write throughput: 160694272.0B/s (153.25MB/s), IOPS 306.6/s. | 2005/08/25-17:27:06 | STAT | 4072 | v1.1.12 | /dev/md0 | Write throughput: 160339606.6B/s (152.91MB/s), IOPS 305.8/s. iostat 5 output: avg-cpu: %user %nice %sys %iowait %idle 38,96 0,00 50,25 5,29 5,49 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn hda 0,00 0,00 0,00 0 0 sda 1,20 0,00 11,18 0 56 sdb 986,43 0,00 39702,99 0 198912 sdc 922,75 0,00 39728,54 0 199040 sdd 895,81 0,00 39728,54 0 199040 sde 880,84 0,00 39728,54 0 199040 sdf 839,92 0,00 39728,54 0 199040 sdg 842,91 0,00 39728,54 0 199040 sdh 1557,49 0,00 79431,54 0 397952 sdi 2246,71 0,00 104411,98 0 523104 sdj 0,00 0,00 0,00 0 0 sdk 0,00 0,00 0,00 0 0 sdl 0,00 0,00 0,00 0 0 sdm 0,00 0,00 0,00 0 0 sdn 0,00 0,00 0,00 0 0 md0 1550,70 0,00 317574,45 0 1591048 Comments: Zero read requests – as it should be. But the write requests are not proportional. sdh and sdi have significantly more requests??? The write requests to the disks of the RAID should be 1/7 higher than to the md device. But there are significantly more write operations. All these operations are to the raw device. Setting up a ext3 fs we get around 127 MB/s with dd. Any idea? 
--Mirko - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
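For reference, the scheduler assignment Mirko mentions ("we have assigned the deadline scheduler to every disk in the RAID") is done per member disk through sysfs on 2.6-era kernels; a sketch, assuming the members are sdb through sdi:

  for d in sdb sdc sdd sde sdf sdg sdh sdi; do
      echo deadline > /sys/block/$d/queue/scheduler
      cat /sys/block/$d/queue/scheduler         # the active scheduler is shown in [brackets]
  done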
* Re: RAID 5 write performance advice 2005-08-25 16:38 ` Mirko Benz @ 2005-08-25 16:54 ` Ming Zhang 2005-08-26 7:51 ` Mirko Benz 2005-09-01 19:44 ` djani22 1 sibling, 1 reply; 14+ messages in thread From: Ming Zhang @ 2005-08-25 16:54 UTC (permalink / raw) To: Mirko Benz; +Cc: Neil Brown, Linux RAID On Thu, 2005-08-25 at 18:38 +0200, Mirko Benz wrote: > Hello, > > We intend to export a lvm/md volume via iSCSI or SRP using InfiniBand to > remote clients. There is no local file system processing on the storage > platform. The clients may have a variety of file systems including ext3, > GFS. > > Single disk write performance is: 58,5 MB/s. With large sequential write > operations I would expect something like 90% of n-1 * > single_disk_performance if stripe write can be utilized. So roughly 400 > MB/s – which the HW RAID devices achieve. change to RAID0 and test to see if u controller will be a bottleneck. > > RAID setup: > Personalities : [raid0] [raid5] > md0 : active raid5 sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] > 1094035712 blocks level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU] > > We have assigned the deadline scheduler to every disk in the RAID. The > default scheduler gives much lower results. > > *** dd TEST *** > > time dd if=/dev/zero of=/dev/md0 bs=1M > 5329911808 bytes transferred in 28,086199 seconds (189769779 bytes/sec) > > iostat 5 output: > avg-cpu: %user %nice %sys %iowait %idle > 0,10 0,00 87,80 7,30 4,80 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > hda 0,00 0,00 0,00 0 0 > sda 0,00 0,00 0,00 0 0 > sdb 1976,10 1576,10 53150,60 7912 266816 > sdc 2072,31 1478,88 53150,60 7424 266816 > sdd 2034,06 1525,10 53150,60 7656 266816 > sde 1988,05 1439,04 53147,41 7224 266800 > sdf 1975,10 1499,60 53147,41 7528 266800 > sdg 1383,07 1485,26 53145,82 7456 266792 > sdh 1562,55 1311,55 53145,82 6584 266792 > sdi 1586,85 1295,62 53145,82 6504 266792 > sdj 0,00 0,00 0,00 0 0 > sdk 0,00 0,00 0,00 0 0 > sdl 0,00 0,00 0,00 0 0 > sdm 0,00 0,00 0,00 0 0 > sdn 0,00 0,00 0,00 0 0 > md0 46515,54 0,00 372124,30 0 1868064 > > Comments: Large write should not see any read operations. But there are > some??? i always saw those small number reads and i feel it is reasonable since u stripe is 7 * 64KB > > > *** disktest *** > > disktest -w -PT -T30 -h1 -K8 -B512k -ID /dev/md0 > > | 2005/08/25-17:27:04 | STAT | 4072 | v1.1.12 | /dev/md0 | Write > throughput: 160152507.7B/s (152.73MB/s), IOPS 305.7/s. > | 2005/08/25-17:27:05 | STAT | 4072 | v1.1.12 | /dev/md0 | Write > throughput: 160694272.0B/s (153.25MB/s), IOPS 306.6/s. > | 2005/08/25-17:27:06 | STAT | 4072 | v1.1.12 | /dev/md0 | Write > throughput: 160339606.6B/s (152.91MB/s), IOPS 305.8/s. so here 152/7 = 21, large than what u sdc sdd got. > > iostat 5 output: > avg-cpu: %user %nice %sys %iowait %idle > 38,96 0,00 50,25 5,29 5,49 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > hda 0,00 0,00 0,00 0 0 > sda 1,20 0,00 11,18 0 56 > sdb 986,43 0,00 39702,99 0 198912 > sdc 922,75 0,00 39728,54 0 199040 > sdd 895,81 0,00 39728,54 0 199040 > sde 880,84 0,00 39728,54 0 199040 > sdf 839,92 0,00 39728,54 0 199040 > sdg 842,91 0,00 39728,54 0 199040 > sdh 1557,49 0,00 79431,54 0 397952 > sdi 2246,71 0,00 104411,98 0 523104 > sdj 0,00 0,00 0,00 0 0 > sdk 0,00 0,00 0,00 0 0 > sdl 0,00 0,00 0,00 0 0 > sdm 0,00 0,00 0,00 0 0 > sdn 0,00 0,00 0,00 0 0 > md0 1550,70 0,00 317574,45 0 1591048 > > Comments: > Zero read requests – as it should be. But the write requests are not > proportional. 
sdh and sdi have significantly more requests??? yes, interesting. > The write requests to the disks of the RAID should be 1/7 higher than to > the md device. > But there are significantly more write operations. > > All these operations are to the raw device. Setting up a ext3 fs we get > around 127 MB/s with dd. > > Any idea? > > --Mirko > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
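Ming's per-disk estimate (152.73 MB/s spread over 7 data disks, roughly 21.8 MB/s each) can be checked against the iostat columns directly; a short conversion sketch, assuming iostat's usual 512-byte blocks:

  # Blk_wrtn/s is in 512-byte blocks; convert to MB/s
  echo "39728.54  * 512 / 1048576" | bc -l      # ~19.4 MB/s on sdc..sdg
  echo "79431.54  * 512 / 1048576" | bc -l      # ~38.8 MB/s on sdh
  echo "104411.98 * 512 / 1048576" | bc -l      # ~51.0 MB/s on sdi

So sdb..sdg see slightly less than the expected share while sdh and sdi see two to three times as much, which is the imbalance being discussed.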
* Re: RAID 5 write performance advice 2005-08-25 16:54 ` Ming Zhang @ 2005-08-26 7:51 ` Mirko Benz 2005-08-26 14:26 ` Ming Zhang 2005-08-26 14:30 ` Ming Zhang 0 siblings, 2 replies; 14+ messages in thread From: Mirko Benz @ 2005-08-26 7:51 UTC (permalink / raw) To: mingz; +Cc: Neil Brown, Linux RAID Hello, We have created a RAID 0 for the same environment: Personalities : [raid0] [raid5] md0 : active raid0 sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] 1250326528 blocks 64k chunks *** dd TEST *** time dd if=/dev/zero of=/dev/md0 bs=1M 14967373824 bytes transferred in 32,060497 seconds (466847843 bytes/sec) iostat 5 output: avg-cpu: %user %nice %sys %iowait %idle 0,00 0,00 89,60 9,50 0,90 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn hda 0,00 0,00 0,00 0 0 sda 0,00 0,00 0,00 0 0 sdb 455,31 0,00 116559,52 0 581632 sdc 455,51 0,00 116540,28 0 581536 sdd 450,10 0,00 116545,09 0 581560 sde 454,11 0,00 116559,52 0 581632 sdf 452,30 0,00 116559,52 0 581632 sdg 454,71 0,00 116553,11 0 581600 sdh 453,31 0,00 116533,87 0 581504 sdi 453,91 0,00 116556,31 0 581616 sdj 0,00 0,00 0,00 0 0 sdk 0,00 0,00 0,00 0 0 sdl 0,00 0,00 0,00 0 0 sdm 0,00 0,00 0,00 0 0 sdn 0,00 0,00 0,00 0 0 md0 116556,11 0,00 932448,90 0 4652920 Comments: 466 MB / 8 = 58,25 MB/s which is about the same as a dd to a single disk (58,5 MB/s). So the controller + I/O subsystem is not the bottleneck. Regards, Mirko ^ permalink raw reply [flat|nested] 14+ messages in thread
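The thread does not show how the arrays were created; a hedged sketch of one way the RAID 0 control experiment above could be set up with mdadm (the exact invocation and device names are assumptions based on the /proc/mdstat output):

  mdadm --stop /dev/md0
  mdadm --create /dev/md0 --level=0 --chunk=64 --raid-devices=8 /dev/sd[b-i]
  time dd if=/dev/zero of=/dev/md0 bs=1M count=10000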
* Re: RAID 5 write performance advice 2005-08-26 7:51 ` Mirko Benz @ 2005-08-26 14:26 ` Ming Zhang 2005-08-26 14:30 ` Ming Zhang 1 sibling, 0 replies; 14+ messages in thread From: Ming Zhang @ 2005-08-26 14:26 UTC (permalink / raw) To: Mirko Benz; +Cc: Neil Brown, Linux RAID seems i need to change the mail thread to "why my RAID0 write speed so slow!" :P i also use a Marvell 8 port PCI-X card. 8 SATA DISK RAID0, each single disk can give me around 55MB/s, but the RAID0 can only give me 203MB/s. I tried different io scheduler, all lead to same write speed at my side. 02:01.0 SCSI storage controller: Marvell MV88SX5081 8-port SATA I PCI-X Controller (rev 03) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32, Cache Line Size 08 Interrupt: pin A routed to IRQ 24 Region 0: Memory at fa000000 (64-bit, non-prefetchable) Capabilities: [40] Power Management version 2 Flags: PMEClk+ DSI- D1- D2- AuxCurrent=0mA PME (D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- Address: 0000000000000000 Data: 0000 Capabilities: [60] PCI-X non-bridge device. Command: DPERE- ERO- RBC=0 OST=3 Status: Bus=2 Dev=1 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=0, DMOST=3, DMCRS=0, RSCEM- On Fri, 2005-08-26 at 09:51 +0200, Mirko Benz wrote: > Hello, > > We have created a RAID 0 for the same environment: > Personalities : [raid0] [raid5] > md0 : active raid0 sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] > 1250326528 blocks 64k chunks > Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10] [faulty] md0 : active raid0 sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda [0] 3125690368 blocks 64k chunks SCSI disks: 400GB SATA host: scsi11 Channel: 00 Id: 00 Lun: 00 Vendor: Hitachi Model: HDS724040KLSA80 Rev: KFAO Type: Direct-Access ANSI SCSI revision: 03 > *** dd TEST *** > > time dd if=/dev/zero of=/dev/md0 bs=1M > 14967373824 bytes transferred in 32,060497 seconds (466847843 bytes/sec) > > iostat 5 output: > avg-cpu: %user %nice %sys %iowait %idle > 0,00 0,00 89,60 9,50 0,90 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > hda 0,00 0,00 0,00 0 0 > sda 0,00 0,00 0,00 0 0 > sdb 455,31 0,00 116559,52 0 581632 > sdc 455,51 0,00 116540,28 0 581536 > sdd 450,10 0,00 116545,09 0 581560 > sde 454,11 0,00 116559,52 0 581632 > sdf 452,30 0,00 116559,52 0 581632 > sdg 454,71 0,00 116553,11 0 581600 > sdh 453,31 0,00 116533,87 0 581504 > sdi 453,91 0,00 116556,31 0 581616 > sdj 0,00 0,00 0,00 0 0 > sdk 0,00 0,00 0,00 0 0 > sdl 0,00 0,00 0,00 0 0 > sdm 0,00 0,00 0,00 0 0 > sdn 0,00 0,00 0,00 0 0 > md0 116556,11 0,00 932448,90 0 4652920 > > Comments: 466 MB / 8 = 58,25 MB/s which is about the same as a dd to a > single disk (58,5 MB/s). So the controller + I/O subsystem is not the > bottleneck. > > Regards, > Mirko > ^ permalink raw reply [flat|nested] 14+ messages in thread
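The lspci dump above shows the Marvell controller negotiating 64-bit/133 MHz PCI-X, so the bus itself should not be what caps Ming at 203 MB/s; a quick sanity-check sketch (the sed range is just one way to pull out that device's record):

  lspci -vv | sed -n '/MV88SX5081/,/^$/p'       # confirm 64bit+ and 133MHz+ in the PCI-X status
  echo "133 * 8" | bc                           # ~1064 MB/s theoretical ceiling for 64-bit/133 MHz PCI-X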
* Re: RAID 5 write performance advice 2005-08-26 7:51 ` Mirko Benz 2005-08-26 14:26 ` Ming Zhang @ 2005-08-26 14:30 ` Ming Zhang 2005-08-26 15:29 ` Mirko Benz 1 sibling, 1 reply; 14+ messages in thread From: Ming Zhang @ 2005-08-26 14:30 UTC (permalink / raw) To: Mirko Benz; +Cc: Neil Brown, Linux RAID i would like to suggest u to do a 4+1 raid5 configuration and see what happen. Ming On Fri, 2005-08-26 at 09:51 +0200, Mirko Benz wrote: > Hello, > > We have created a RAID 0 for the same environment: > Personalities : [raid0] [raid5] > md0 : active raid0 sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] > 1250326528 blocks 64k chunks > > *** dd TEST *** > > time dd if=/dev/zero of=/dev/md0 bs=1M > 14967373824 bytes transferred in 32,060497 seconds (466847843 bytes/sec) > > iostat 5 output: > avg-cpu: %user %nice %sys %iowait %idle > 0,00 0,00 89,60 9,50 0,90 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > hda 0,00 0,00 0,00 0 0 > sda 0,00 0,00 0,00 0 0 > sdb 455,31 0,00 116559,52 0 581632 > sdc 455,51 0,00 116540,28 0 581536 > sdd 450,10 0,00 116545,09 0 581560 > sde 454,11 0,00 116559,52 0 581632 > sdf 452,30 0,00 116559,52 0 581632 > sdg 454,71 0,00 116553,11 0 581600 > sdh 453,31 0,00 116533,87 0 581504 > sdi 453,91 0,00 116556,31 0 581616 > sdj 0,00 0,00 0,00 0 0 > sdk 0,00 0,00 0,00 0 0 > sdl 0,00 0,00 0,00 0 0 > sdm 0,00 0,00 0,00 0 0 > sdn 0,00 0,00 0,00 0 0 > md0 116556,11 0,00 932448,90 0 4652920 > > Comments: 466 MB / 8 = 58,25 MB/s which is about the same as a dd to a > single disk (58,5 MB/s). So the controller + I/O subsystem is not the > bottleneck. > > Regards, > Mirko > ^ permalink raw reply [flat|nested] 14+ messages in thread
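The point of the 4+1 suggestion is stripe alignment: with 4 data disks and a 64 KiB chunk, a full stripe carries exactly 256 KiB of data, so every 1 MiB dd request covers four whole stripes and no read-modify-write should be needed. A sketch of the test (the mdadm invocation is an assumption; the member names match the iostat output in the next message):

  # 4 data disks * 64 KiB chunk = 256 KiB of data per full stripe; 1 MiB = 4 full stripes
  mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=5 /dev/sd[b-f]
  # let the initial parity resync finish first (watch /proc/mdstat), then:
  time dd if=/dev/zero of=/dev/md0 bs=1M count=2000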
* Re: RAID 5 write performance advice 2005-08-26 14:30 ` Ming Zhang @ 2005-08-26 15:29 ` Mirko Benz 2005-08-26 17:05 ` Ming Zhang 2005-08-28 23:28 ` Neil Brown 0 siblings, 2 replies; 14+ messages in thread From: Mirko Benz @ 2005-08-26 15:29 UTC (permalink / raw) To: mingz; +Cc: Neil Brown, Linux RAID Hello, Here are the results using 5 disks for RAID 5 – basically the same results but with lower values. Again, much slower than it could be. *** dd TEST *** time dd if=/dev/zero of=/dev/md0 bs=1M 2819620864 bytes transferred in 23,410720 seconds (120441442 bytes/sec) iostat 5 output: avg-cpu: %user %nice %sys %iowait %idle 0,10 0,00 55,40 36,20 8,30 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn hda 0,00 0,00 0,00 0 0 sda 0,00 0,00 0,00 0 0 sdb 345,09 1999,20 71501,40 9976 356792 sdc 348,10 2412,83 71434,07 12040 356456 sdd 356,31 2460,92 71748,30 12280 358024 sde 351,50 2456,11 71058,92 12256 354584 sdf 348,10 2008,82 70935,47 10024 353968 sdg 0,00 0,00 0,00 0 0 sdh 0,00 0,00 0,00 0 0 sdi 0,00 0,00 0,00 0 0 sdj 0,00 0,00 0,00 0 0 sdk 0,00 0,00 0,00 0 0 sdl 0,00 0,00 0,00 0 0 sdm 0,00 0,00 0,00 0 0 sdn 0,00 0,00 0,00 0 0 md0 35226,65 0,00 281813,23 0 1406248 disktest gives 99 MB/s but shows the same behaviour (unbalanced usage of disks). Regards, Mirko Ming Zhang schrieb: >i would like to suggest u to do a 4+1 raid5 configuration and see what >happen. > >Ming > >On Fri, 2005-08-26 at 09:51 +0200, Mirko Benz wrote: > > >>Hello, >> >>We have created a RAID 0 for the same environment: >>Personalities : [raid0] [raid5] >>md0 : active raid0 sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] >> 1250326528 blocks 64k chunks >> >>*** dd TEST *** >> >>time dd if=/dev/zero of=/dev/md0 bs=1M >>14967373824 bytes transferred in 32,060497 seconds (466847843 bytes/sec) >> >>iostat 5 output: >>avg-cpu: %user %nice %sys %iowait %idle >> 0,00 0,00 89,60 9,50 0,90 >> >>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn >>hda 0,00 0,00 0,00 0 0 >>sda 0,00 0,00 0,00 0 0 >>sdb 455,31 0,00 116559,52 0 581632 >>sdc 455,51 0,00 116540,28 0 581536 >>sdd 450,10 0,00 116545,09 0 581560 >>sde 454,11 0,00 116559,52 0 581632 >>sdf 452,30 0,00 116559,52 0 581632 >>sdg 454,71 0,00 116553,11 0 581600 >>sdh 453,31 0,00 116533,87 0 581504 >>sdi 453,91 0,00 116556,31 0 581616 >>sdj 0,00 0,00 0,00 0 0 >>sdk 0,00 0,00 0,00 0 0 >>sdl 0,00 0,00 0,00 0 0 >>sdm 0,00 0,00 0,00 0 0 >>sdn 0,00 0,00 0,00 0 0 >>md0 116556,11 0,00 932448,90 0 4652920 >> >>Comments: 466 MB / 8 = 58,25 MB/s which is about the same as a dd to a >>single disk (58,5 MB/s). So the controller + I/O subsystem is not the >>bottleneck. >> >>Regards, >>Mirko >> >> >> > > > > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: RAID 5 write performance advice 2005-08-26 15:29 ` Mirko Benz @ 2005-08-26 17:05 ` Ming Zhang 2005-08-28 23:28 ` Neil Brown 1 sibling, 0 replies; 14+ messages in thread From: Ming Zhang @ 2005-08-26 17:05 UTC (permalink / raw) To: Mirko Benz; +Cc: Neil Brown, Linux RAID i think this is out of my mind now. i think 4+1 DISK with 64KB chunk size is 256KB which fit the 1M size well. So there should not have any read happen... ming On Fri, 2005-08-26 at 17:29 +0200, Mirko Benz wrote: > Hello, > > Here are the results using 5 disks for RAID 5 – basically the same > results but with lower values. > Again, much slower than it could be. > > *** dd TEST *** > > time dd if=/dev/zero of=/dev/md0 bs=1M > 2819620864 bytes transferred in 23,410720 seconds (120441442 bytes/sec) > > iostat 5 output: > avg-cpu: %user %nice %sys %iowait %idle > 0,10 0,00 55,40 36,20 8,30 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > hda 0,00 0,00 0,00 0 0 > sda 0,00 0,00 0,00 0 0 > sdb 345,09 1999,20 71501,40 9976 356792 > sdc 348,10 2412,83 71434,07 12040 356456 > sdd 356,31 2460,92 71748,30 12280 358024 > sde 351,50 2456,11 71058,92 12256 354584 > sdf 348,10 2008,82 70935,47 10024 353968 > sdg 0,00 0,00 0,00 0 0 > sdh 0,00 0,00 0,00 0 0 > sdi 0,00 0,00 0,00 0 0 > sdj 0,00 0,00 0,00 0 0 > sdk 0,00 0,00 0,00 0 0 > sdl 0,00 0,00 0,00 0 0 > sdm 0,00 0,00 0,00 0 0 > sdn 0,00 0,00 0,00 0 0 > md0 35226,65 0,00 281813,23 0 1406248 > > disktest gives 99 MB/s but shows the same behaviour (unbalanced usage of > disks). > > Regards, > Mirko > > Ming Zhang schrieb: > > >i would like to suggest u to do a 4+1 raid5 configuration and see what > >happen. > > > >Ming > > > >On Fri, 2005-08-26 at 09:51 +0200, Mirko Benz wrote: > > > > > >>Hello, > >> > >>We have created a RAID 0 for the same environment: > >>Personalities : [raid0] [raid5] > >>md0 : active raid0 sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] > >> 1250326528 blocks 64k chunks > >> > >>*** dd TEST *** > >> > >>time dd if=/dev/zero of=/dev/md0 bs=1M > >>14967373824 bytes transferred in 32,060497 seconds (466847843 bytes/sec) > >> > >>iostat 5 output: > >>avg-cpu: %user %nice %sys %iowait %idle > >> 0,00 0,00 89,60 9,50 0,90 > >> > >>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > >>hda 0,00 0,00 0,00 0 0 > >>sda 0,00 0,00 0,00 0 0 > >>sdb 455,31 0,00 116559,52 0 581632 > >>sdc 455,51 0,00 116540,28 0 581536 > >>sdd 450,10 0,00 116545,09 0 581560 > >>sde 454,11 0,00 116559,52 0 581632 > >>sdf 452,30 0,00 116559,52 0 581632 > >>sdg 454,71 0,00 116553,11 0 581600 > >>sdh 453,31 0,00 116533,87 0 581504 > >>sdi 453,91 0,00 116556,31 0 581616 > >>sdj 0,00 0,00 0,00 0 0 > >>sdk 0,00 0,00 0,00 0 0 > >>sdl 0,00 0,00 0,00 0 0 > >>sdm 0,00 0,00 0,00 0 0 > >>sdn 0,00 0,00 0,00 0 0 > >>md0 116556,11 0,00 932448,90 0 4652920 > >> > >>Comments: 466 MB / 8 = 58,25 MB/s which is about the same as a dd to a > >>single disk (58,5 MB/s). So the controller + I/O subsystem is not the > >>bottleneck. > >> > >>Regards, > >>Mirko > >> > >> > >> > > > > > > > > > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: RAID 5 write performance advice 2005-08-26 15:29 ` Mirko Benz 2005-08-26 17:05 ` Ming Zhang @ 2005-08-28 23:28 ` Neil Brown 1 sibling, 0 replies; 14+ messages in thread From: Neil Brown @ 2005-08-28 23:28 UTC (permalink / raw) To: Mirko Benz; +Cc: mingz, Linux RAID On Friday August 26, mirko.benz@web.de wrote: > Hello, > > Here are the results using 5 disks for RAID 5 – basically the same > results but with lower values. > Again, much slower than it could be. Yes, there definitely does seem to be something wrong with raid5 write speed, doesn't there!! I will try to look into it over the next couple of weeks. NeilBrown ^ permalink raw reply [flat|nested] 14+ messages in thread
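One knob touched earlier in the thread is the raid5 stripe cache, which on 2.6.12 meant recompiling with a larger NR_STRIPES as Mirko already tried. Later kernels expose it at runtime; a hedged sketch for anyone retesting on a newer kernel (the sysfs path and its units do not apply to the 2.6.12 used in this thread):

  cat /sys/block/md0/md/stripe_cache_size       # number of cached stripe heads, not bytes
  echo 4096 > /sys/block/md0/md/stripe_cache_size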
* Re: RAID 5 write performance advice 2005-08-25 16:38 ` Mirko Benz 2005-08-25 16:54 ` Ming Zhang @ 2005-09-01 19:44 ` djani22 1 sibling, 0 replies; 14+ messages in thread From: djani22 @ 2005-09-01 19:44 UTC (permalink / raw) To: Mirko Benz; +Cc: linux-raid ----- Original Message ----- From: "Mirko Benz" <mirko.benz@web.de> To: "Neil Brown" <neilb@cse.unsw.edu.au> Cc: <mingz@ele.uri.edu>; "Linux RAID" <linux-raid@vger.kernel.org> Sent: Thursday, August 25, 2005 6:38 PM Subject: Re: RAID 5 write performance advice > Hello, > > We intend to export a lvm/md volume via iSCSI or SRP using InfiniBand to > remote clients. There is no local file system processing on the storage > platform. The clients may have a variety of file systems including ext3, > GFS. > > Single disk write performance is: 58,5 MB/s. With large sequential write > operations I would expect something like 90% of n-1 * > single_disk_performance if stripe write can be utilized. So roughly 400 > MB/s – which the HW RAID devices achieve. > > RAID setup: > Personalities : [raid0] [raid5] > md0 : active raid5 sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] > 1094035712 blocks level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU] > > We have assigned the deadline scheduler to every disk in the RAID. The > default scheduler gives much lower results. I recommend you to try not just another iosched, try they settings too! For me, this helps... ;-) (the settings is in the kernel's Documentation/block -dir) Additionally try to set sched's, in another layer, eg: lvm.... (if it is possible, i don't know...) I use 8TB disk in one big raid0 array, and my config is this: 4 PC each 11x200GB hdd with RAID5, exports the four 2TB space. In the clients I use the default anticipatory sched with this settings: antic_expire : 6 read_batch_expire: 500 read_expire 125 write_batch_expire 125 write_expire 250 And one dual Xeon system, the TOP-client inside the 8TB raid0 (from 4 disk node). I use GNBD, and for the gnbd devices, the deadline sched is the best, with this settings: fifo_batch: 16 front_merges 0 read_expire 50 write_expire 5000 writes_starved 255 ( - 1024 depends on what I want....) I try the LVM too, but I dropped it, because too low performace... :( I try to grow with raid's linear mode insted. ;-) Thanks to Neilbrown! ;-) ( I didn't test the patch yet) > > *** dd TEST *** > > time dd if=/dev/zero of=/dev/md0 bs=1M > 5329911808 bytes transferred in 28,086199 seconds (189769779 bytes/sec) > > iostat 5 output: > avg-cpu: %user %nice %sys %iowait %idle > 0,10 0,00 87,80 7,30 4,80 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > hda 0,00 0,00 0,00 0 0 > sda 0,00 0,00 0,00 0 0 > sdb 1976,10 1576,10 53150,60 7912 266816 > sdc 2072,31 1478,88 53150,60 7424 266816 > sdd 2034,06 1525,10 53150,60 7656 266816 > sde 1988,05 1439,04 53147,41 7224 266800 > sdf 1975,10 1499,60 53147,41 7528 266800 > sdg 1383,07 1485,26 53145,82 7456 266792 > sdh 1562,55 1311,55 53145,82 6584 266792 > sdi 1586,85 1295,62 53145,82 6504 266792 > sdj 0,00 0,00 0,00 0 0 > sdk 0,00 0,00 0,00 0 0 > sdl 0,00 0,00 0,00 0 0 > sdm 0,00 0,00 0,00 0 0 > sdn 0,00 0,00 0,00 0 0 > md0 46515,54 0,00 372124,30 0 1868064 > > Comments: Large write should not see any read operations. But there are > some??? > > > *** disktest *** > > disktest -w -PT -T30 -h1 -K8 -B512k -ID /dev/md0 > > | 2005/08/25-17:27:04 | STAT | 4072 | v1.1.12 | /dev/md0 | Write > throughput: 160152507.7B/s (152.73MB/s), IOPS 305.7/s. 
> | 2005/08/25-17:27:05 | STAT | 4072 | v1.1.12 | /dev/md0 | Write > throughput: 160694272.0B/s (153.25MB/s), IOPS 306.6/s. > | 2005/08/25-17:27:06 | STAT | 4072 | v1.1.12 | /dev/md0 | Write > throughput: 160339606.6B/s (152.91MB/s), IOPS 305.8/s. > > iostat 5 output: > avg-cpu: %user %nice %sys %iowait %idle > 38,96 0,00 50,25 5,29 5,49 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > hda 0,00 0,00 0,00 0 0 > sda 1,20 0,00 11,18 0 56 > sdb 986,43 0,00 39702,99 0 198912 > sdc 922,75 0,00 39728,54 0 199040 > sdd 895,81 0,00 39728,54 0 199040 > sde 880,84 0,00 39728,54 0 199040 > sdf 839,92 0,00 39728,54 0 199040 > sdg 842,91 0,00 39728,54 0 199040 > sdh 1557,49 0,00 79431,54 0 397952 > sdi 2246,71 0,00 104411,98 0 523104 > sdj 0,00 0,00 0,00 0 0 > sdk 0,00 0,00 0,00 0 0 > sdl 0,00 0,00 0,00 0 0 > sdm 0,00 0,00 0,00 0 0 > sdn 0,00 0,00 0,00 0 0 > md0 1550,70 0,00 317574,45 0 1591048 > > Comments: > Zero read requests – as it should be. But the write requests are not > proportional. sdh and sdi have significantly more requests??? I have get this too in 2.6.13-rc3, but it is gone in rc6, and 2.6.13! What version of kernel do you use? Janos > The write requests to the disks of the RAID should be 1/7 higher than to > the md device. > But there are significantly more write operations. > > All these operations are to the raw device. Setting up a ext3 fs we get > around 127 MB/s with dd. > > Any idea? > > --Mirko > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
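The deadline tunables djani22 lists live under each block device's queue/iosched directory once that scheduler is selected; a sketch applying his values (sdb is a placeholder: he applied them to his gnbd devices, and the anticipatory settings on his disk nodes follow the same pattern under the same directory):

  echo deadline > /sys/block/sdb/queue/scheduler
  D=/sys/block/sdb/queue/iosched
  echo 16   > $D/fifo_batch
  echo 0    > $D/front_merges
  echo 50   > $D/read_expire
  echo 5000 > $D/write_expire
  echo 255  > $D/writes_starved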
Thread overview: 14+ messages 2005-08-24 8:24 RAID 5 write performance advice Mirko Benz 2005-08-24 12:46 ` Ming Zhang 2005-08-24 13:43 ` Mirko Benz 2005-08-24 13:49 ` Ming Zhang 2005-08-24 21:32 ` Neil Brown 2005-08-25 16:38 ` Mirko Benz 2005-08-25 16:54 ` Ming Zhang 2005-08-26 7:51 ` Mirko Benz 2005-08-26 14:26 ` Ming Zhang 2005-08-26 14:30 ` Ming Zhang 2005-08-26 15:29 ` Mirko Benz 2005-08-26 17:05 ` Ming Zhang 2005-08-28 23:28 ` Neil Brown 2005-09-01 19:44 ` djani22