* Re: SATA RAID5 speed drop of 100 MB/s
       [not found] ` <20070622214859.GC6970@alinoe.com>
@ 2007-06-23  7:03   ` Jeff Garzik
  2007-06-23  7:54     ` Tejun Heo
                        ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Jeff Garzik @ 2007-06-23  7:03 UTC (permalink / raw)
  To: Carlo Wood
  Cc: Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

[-- Attachment #1: Type: text/plain, Size: 1122 bytes --]

Carlo Wood wrote:
> The dmesg output of 33480a0ede8dcc7e6483054279008f972bd56fd3 (thus
> "before") is:
[...]
> And the dmesg output of 551c012d7eea3dc5ec063c7ff9c718d39e77634f (thus
> "after") is:
[...]

Your disk configurations are quite radically different between the two
kernels (see attached diff for key highlights).

The new behavior of the more recent kernel (551c012d7...) is that it now
fully drives your hardware :)  The reset problems go away, NCQ is
enabled, and if you had 3.0Gbps drives (you don't) they would be driven
at a faster speed.

Given that some drives might be better tuned for benchmarks in
non-queued mode, and that a major behavior difference is that your
drives are now NCQ-enabled, the first thing I would suggest you try is
disabling NCQ:
http://linux-ata.org/faq.html#ncq

Other indicators are the other changes in the "ahci 0000:00:1f.2:
flags:" line, which do affect other behaviors, though none so important
to RAID5 performance as NCQ, I would think.

Turning on NCQ also potentially affects barrier behavior in RAID, though
I'm guessing that is not a factor here.

	Jeff

[-- Attachment #2: diff.txt --]
[-- Type: text/plain, Size: 1673 bytes --]

-ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 6 ports ? Gbps 0x3f impl SATA mode
-ahci 0000:00:1f.2: flags: 64bit ilck stag led pmp pio
+ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
+ahci 0000:00:1f.2: flags: 64bit ncq ilck stag pm led clo pmp pio slum part
 scsi0 : ahci
-ata1: softreset failed (port busy but CLO unavailable)
-ata1: softreset failed, retrying in 5 secs
 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
-ata1.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 0/32)
+ata1.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 31/32)
 ata1.00: ata1: dev 0 multi count 0
 ata1.00: configured for UDMA/133
 scsi1 : ahci
-ata2: softreset failed (port busy but CLO unavailable)
-ata2: softreset failed, retrying in 5 secs
 ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
-ata2.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 0/32)
+ata2.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 31/32)
 ata2.00: ata2: dev 0 multi count 0
 ata2.00: configured for UDMA/133
 scsi2 : ahci
 ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
-ata3.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 0/32)
+ata3.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 31/32)
 ata3.00: ata3: dev 0 multi count 0
 ata3.00: configured for UDMA/133
 scsi3 : ahci
 ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
-ata4.00: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 0/32)
+ata4.00: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 31/32)
 ata4.00: configured for UDMA/133
 scsi4 : ahci
 ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

^ permalink raw reply	[flat|nested] 17+ messages in thread
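For reference, NCQ can be toggled per device at runtime through sysfs -- the
same knob Carlo uses in his reply below; a minimal sketch (device names are
illustrative, and the accepted depth varies per drive and controller):

  # show the current queue depth (31 or 32 usually means NCQ is active)
  cat /sys/block/sda/device/queue_depth
  # disable NCQ on one drive by forcing the depth to 1; repeat for each RAID member
  echo 1 > /sys/block/sda/device/queue_depth
  # restore a larger depth to re-enable NCQ
  echo 31 > /sys/block/sda/device/queue_depth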
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-23  7:03 ` SATA RAID5 speed drop of 100 MB/s Jeff Garzik
@ 2007-06-23  7:54 ` Tejun Heo
  2007-06-23 12:53 ` Carlo Wood
  2007-06-24  0:54 ` Eyal Lebedinsky
  2 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2007-06-23  7:54 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Carlo Wood, Manoj Kasichainula, linux-kernel, IDE/ATA development list

Jeff Garzik wrote:
> Carlo Wood wrote:
>> The dmesg output of 33480a0ede8dcc7e6483054279008f972bd56fd3 (thus
>> "before") is:
> [...]
>> And the dmesg output of 551c012d7eea3dc5ec063c7ff9c718d39e77634f (thus
>> "after") is:
> [...]
>
> Your disk configurations are quite radically different between the two
> kernels (see attached diff for key highlights).
>
> The new behavior of the more recent kernel (551c012d7...) is that it now
> fully drives your hardware :)  The reset problems go away, NCQ is
> enabled, and if you had 3.0Gbps drives (you don't) they would be driven
> at a faster speed.
>
> Given that some drives might be better tuned for benchmarks in
> non-queued mode, and that a major behavior difference is that your
> drives are now NCQ-enabled, the first thing I would suggest you try is
> disabling NCQ:
> http://linux-ata.org/faq.html#ncq
>
> Other indicators are the other changes in the "ahci 0000:00:1f.2:
> flags:" line, which do affect other behaviors, though none so important
> to RAID5 performance as NCQ, I would think.
>
> Turning on NCQ also potentially affects barrier behavior in RAID, though
> I'm guessing that is not a factor here.

Ah.. right.  That should have enabled NCQ.  Me slow today. :-)

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-23  7:03 ` SATA RAID5 speed drop of 100 MB/s Jeff Garzik
  2007-06-23  7:54 ` Tejun Heo
@ 2007-06-23 12:53 ` Carlo Wood
  2007-06-23 17:30   ` Bartlomiej Zolnierkiewicz
  2007-06-23 22:43   ` Jeff Garzik
  2007-06-24  0:54 ` Eyal Lebedinsky
  2 siblings, 2 replies; 17+ messages in thread
From: Carlo Wood @ 2007-06-23 12:53 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

On Sat, Jun 23, 2007 at 03:03:33AM -0400, Jeff Garzik wrote:
> Your disk configurations are quite radically different between the two
> kernels (see attached diff for key highlights).
>
> The new behavior of the more recent kernel (551c012d7...) is that it now
> fully drives your hardware :)  The reset problems go away, NCQ is
> enabled, and if you had 3.0Gbps drives (you don't) they would be driven
> at a faster speed.
>
> Given that some drives might be better tuned for benchmarks in
> non-queued mode, and that a major behavior difference is that your
> drives are now NCQ-enabled, the first thing I would suggest you try is
> disabling NCQ:
> http://linux-ata.org/faq.html#ncq

Thanks! That is indeed the difference that causes the drop in
"hdparm -tT" that I observed. After setting
/sys/block/sdX/device/queue_depth of all three drives to 1, I again get

/dev/md2:
 Timing cached reads:   8252 MB in  2.00 seconds = 4130.59 MB/sec
 Timing buffered disk reads:  496 MB in  3.01 seconds = 164.88 MB/sec

on 2.6.22-rc5.

> Other indicators are the other changes in the "ahci 0000:00:1f.2:
> flags:" line, which do affect other behaviors, though none so important
> to RAID5 performance as NCQ, I would think.
>
> Turning on NCQ also potentially affects barrier behavior in RAID, though
> I'm guessing that is not a factor here.

Of course, I am not really interested in what "hdparm -tT" gives, but
rather in high performance during real-life use of the disks.

Is it possible that the measurement with "hdparm -tT" returns a higher
value for some setting, but that the overall real-life performance drops?

Also, the effect of this setting is nil for the individual drives.
hdparm -tT /dev/sda still gives me around 65 MB/s. I don't understand
why this setting has such a HUGE effect on RAID5 while the underlying
drives themselves don't seem affected.

PS I'd like to do extensive testing with Bonnie++ to tune everything
there is to tune. But bonnie likes to write/read files twice the size
of the RAM I have, so it takes a LOT of time to run one test. Do you
happen to know how I can limit the amount of RAM that the Linux kernel
sees to, say, 500 MB? That should be enough to run in single user mode
but would allow me to run the tests MUCH faster. (I have dual channel,
four DIMMs of 1 GB each -- 2 GB per Core 2 die. Hopefully the fact that
I have dual channel isn't going to be a problem when limiting the RAM
that the kernel sees.)

-- 
Carlo Wood <carlo@alinoe.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-23 12:53 ` Carlo Wood
@ 2007-06-23 17:30 ` Bartlomiej Zolnierkiewicz
  2007-06-23 22:43 ` Jeff Garzik
  1 sibling, 0 replies; 17+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-06-23 17:30 UTC (permalink / raw)
  To: Carlo Wood
  Cc: Jeff Garzik, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

Hi,

On Saturday 23 June 2007, Carlo Wood wrote:

> PS I'd like to do extensive testing with Bonnie++ to tune everything
> there is to tune. But bonnie likes to write/read files twice the size
> of the RAM I have, so it takes a LOT of time to run one test. Do you
> happen to know how I can limit the amount of RAM that the Linux kernel
> sees to, say, 500 MB? That should be enough to run in single user mode
> but would allow me to run the tests MUCH faster. (I have dual channel,
> four DIMMs of 1 GB each -- 2 GB per Core 2 die. Hopefully the fact that
> I have dual channel isn't going to be a problem when limiting the RAM
> that the kernel sees.)

The "mem=" kernel parameter limits the amount of memory seen by the
kernel (more info in Documentation/kernel-parameters.txt).

You can also limit the amount of RAM detected by bonnie++ by using the
-r parameter, but please remember that this will make bonnie++ benchmark
the combined kernel I/O buffering + filesystem + hard disk performance
instead of just filesystem + hard disk performance (as it can happen
that some or all data won't ever hit the disk).

Bart

^ permalink raw reply	[flat|nested] 17+ messages in thread
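As a concrete illustration of the two approaches (the kernel image path,
mount point and memory figure below are only examples; check the bonnie++
man page for the exact flags of your version):

  # 1) boot with only 500 MB visible to the kernel, e.g. on the GRUB kernel line:
  #      kernel /boot/vmlinuz-2.6.22-rc5 root=/dev/md1 mem=500M
  # 2) or tell bonnie++ to assume 500 MB of RAM so it sizes its test files itself
  bonnie++ -d /mnt/test -r 500 -u root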
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-23 12:53 ` Carlo Wood
  2007-06-23 17:30 ` Bartlomiej Zolnierkiewicz
@ 2007-06-23 22:43 ` Jeff Garzik
  2007-06-24 11:58   ` Michael Tokarev
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Garzik @ 2007-06-23 22:43 UTC (permalink / raw)
  To: Carlo Wood, Jeff Garzik, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

Carlo Wood wrote:
> Is it possible that the measurement with "hdparm -tT" returns a higher
> value for some setting, but that the overall real-life performance
> drops?

IN THEORY, RAID performance should /increase/ due to additional queued
commands available to be sent to the drive.  NCQ == command queueing ==
sending multiple commands to the drive, rather than one-at-a-time like
normal.

But hdparm isn't the best test for that theory, since it does not
simulate the transactions like real-world MD device usage does.

We have seen buggy NCQ firmwares where performance decreases, so it is
possible that NCQ just isn't good on your drives.

	Jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-23 22:43 ` Jeff Garzik
@ 2007-06-24 11:58 ` Michael Tokarev
  2007-06-24 12:59   ` Dr. David Alan Gilbert
  2007-07-05 22:12   ` Phillip Susi
  0 siblings, 2 replies; 17+ messages in thread
From: Michael Tokarev @ 2007-06-24 11:58 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Carlo Wood, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

Jeff Garzik wrote:
> IN THEORY, RAID performance should /increase/ due to additional queued
> commands available to be sent to the drive.  NCQ == command queueing ==
> sending multiple commands to the drive, rather than one-at-a-time like
> normal.
>
> But hdparm isn't the best test for that theory, since it does not
> simulate the transactions like real-world MD device usage does.
>
> We have seen buggy NCQ firmwares where performance decreases, so it is
> possible that NCQ just isn't good on your drives.

By the way, I did some testing of various drives, and NCQ/TCQ indeed
shows some difference -- with multiple I/O processes (like a "server"
workload), IF NCQ/TCQ is implemented properly, especially in the drive.

For example, this is a good one:

Single Seagate 74Gb SCSI drive (10KRPM)

BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W
   4k   1   66.4    0.5    0.6    0.5   0.6/ 0.6    0.4/ 0.2
        2           0.6           0.6               0.5/ 0.1
        4           0.7           0.6               0.6/ 0.2
  16k   1   84.8    2.0    2.5    1.9   2.5/ 2.5    1.6/ 0.6
        2           2.3           2.1               2.0/ 0.6
        4           2.7           2.5               2.3/ 0.6
  64k   1   84.8    7.4    9.3    7.2   9.4/ 9.3    5.8/ 2.2
        2           8.6           7.9               7.3/ 2.1
        4           9.9           9.1               8.1/ 2.2
 128k   1   84.8   13.6   16.7   12.9  16.9/16.6   10.6/ 3.9
        2          15.6          14.4              13.5/ 3.2
        4          17.9          16.4              15.7/ 2.7
 512k   1   84.9   34.0   41.9   33.3  29.0/27.1   22.4/13.2
        2          36.9          34.5              30.7/ 8.1
        4          40.5          38.1              33.2/ 8.3
1024k   1   83.1   36.0   55.8   34.6  28.2/27.6   20.3/19.4
        2          45.2          44.1              36.4/ 9.9
        4          48.1          47.6              40.7/ 7.1

The tests are direct I/O over the whole drive (/dev/sdX), with either
1, 2, or 4 threads doing sequential or random reads or writes in blocks
of a given size.  For the R/W tests, we have 2, 4 or 8 threads running
in total (1, 2 or 4 readers and the same amount of writers).  Numbers
are MB/sec, as totals (summary) for all threads.

Especially interesting is the very last column - random R/W in
parallel.  In almost all cases, more threads give a larger total speed
(I *guess* it's due to internal optimisations in the drive -- with more
threads the drive has more chances to reorder commands to minimize seek
time etc).

The only thing I don't understand is why, with larger I/O block sizes,
we see the write speed drop with multiple threads.

And in contrast to the above, here's another test run, now with a
Seagate SATA ST3250620AS ("desktop" class) 250GB 7200RPM drive:

BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W
   4k   1   47.5    0.3    0.5    0.3   0.3/ 0.3    0.1/ 0.1
        2           0.3           0.3               0.2/ 0.1
        4           0.3           0.3               0.2/ 0.2
  16k   1   78.4    1.1    1.8    1.1   0.9/ 0.9    0.6/ 0.6
        2           1.2           1.1               0.6/ 0.6
        4           1.3           1.2               0.6/ 0.6
  64k   1   78.4    4.3    6.7    4.0   3.5/ 3.5    2.1/ 2.2
        2           4.5           4.1               2.2/ 2.3
        4           4.7           4.2               2.3/ 2.4
 128k   1   78.4    8.0   12.6    7.2   6.2/ 6.2    3.9/ 3.8
        2           8.2           7.3               4.1/ 4.0
        4           8.7           7.7               4.3/ 4.3
 512k   1   78.5   23.1   34.0   20.3  17.1/17.1   11.3/10.7
        2          23.5          20.6              11.3/11.4
        4          24.7          21.3              11.6/11.8
1024k   1   78.4   34.1   33.5   24.6  19.6/19.5   16.0/12.7
        2          33.3          24.6              15.4/13.8
        4          34.3          25.0              14.7/15.0

Here, the (total) I/O speed does not depend on the number of threads.
From this I conclude that the drive does not reorder/optimize commands
internally, even if NCQ is enabled (queue depth is 32).

(And two notes.  First of all, to some, those tables may look strange,
showing very low speeds.  Note the block size, and note that I'm doing
*direct* *random* I/O, without buffering in the kernel.  Yes, even the
most advanced modern drives are very slow in this workload, due to seek
times and rotational latency -- the disk is maxing out at the
theoretical requests/second limit.  Take the average seek time plus
rotational latency (usually given in the drive specs) and divide one
second by that value -- you'll get about 200..250; that's requests/sec.
And the numbers - like 0.3 MB/sec write - are very close to those
200..250.  In any case, this is not a typical workload - a file server,
for example, is not like this.  But it more or less resembles a
database workload.

And second, so far I haven't seen a case where a drive with NCQ/TCQ
enabled works worse than without.  I don't want to say there aren't
such drives/controllers, but it just happens that I haven't seen any.)

/mjt

^ permalink raw reply	[flat|nested] 17+ messages in thread
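As a rough back-of-the-envelope check (the seek and latency figures here are
illustrative, not taken from the thread): a 10K RPM drive with roughly 4-5 ms
average seek time and ~3 ms average rotational latency completes a random
request in about 7-8 ms, i.e. somewhere between ~130 and ~250 requests per
second depending on how short the seeks are; at 4 KB per request that is well
under 1 MB/s, which is the order of magnitude seen in the 4k rows of the
tables above.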
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-24 11:58 ` Michael Tokarev
@ 2007-06-24 12:59 ` Dr. David Alan Gilbert
  2007-06-24 14:21   ` Justin Piszcz
  2007-06-24 15:48   ` Michael Tokarev
  1 sibling, 2 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2007-06-24 12:59 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Jeff Garzik, Carlo Wood, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

* Michael Tokarev (mjt@tls.msk.ru) wrote:

<snip>

> By the way, I did some testing of various drives, and NCQ/TCQ indeed
> shows some difference -- with multiple I/O processes (like a "server"
> workload), IF NCQ/TCQ is implemented properly, especially in the
> drive.
>
> For example, this is a good one:
>
> Single Seagate 74Gb SCSI drive (10KRPM)
>
> BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W

<snip>

> 1024k   1   83.1   36.0   55.8   34.6  28.2/27.6   20.3/19.4
>         2          45.2          44.1              36.4/ 9.9
>         4          48.1          47.6              40.7/ 7.1
>
> The tests are direct I/O over the whole drive (/dev/sdX), with either
> 1, 2, or 4 threads doing sequential or random reads or writes in
> blocks of a given size.  For the R/W tests, we have 2, 4 or 8 threads
> running in total (1, 2 or 4 readers and the same amount of writers).
> Numbers are MB/sec, as totals (summary) for all threads.
>
> Especially interesting is the very last column - random R/W in
> parallel.  In almost all cases, more threads give a larger total
> speed (I *guess* it's due to internal optimisations in the drive --
> with more threads the drive has more chances to reorder commands to
> minimize seek time etc).
>
> The only thing I don't understand is why, with larger I/O block
> sizes, we see the write speed drop with multiple threads.

My guess is that something is chopping them up into smaller writes.

> And in contrast to the above, here's another test run, now with a
> Seagate SATA ST3250620AS ("desktop" class) 250GB 7200RPM drive:
>
> BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W

<snip>

> 1024k   1   78.4   34.1   33.5   24.6  19.6/19.5   16.0/12.7
>         2          33.3          24.6              15.4/13.8
>         4          34.3          25.0              14.7/15.0

<snip>

> And second, so far I haven't seen a case where a drive with NCQ/TCQ
> enabled works worse than without.  I don't want to say there aren't
> such drives/controllers, but it just happens that I haven't seen any.)

Yes you have - the random writes with large blocks and 2 or 4 threads
are significantly better for your non-NCQ drive, and it gets more
significant as you add more threads - I'm curious what happens with
8 threads or more.

Dave
-- 
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-24 12:59 ` Dr. David Alan Gilbert
@ 2007-06-24 14:21 ` Justin Piszcz
  2007-06-24 15:52   ` Michael Tokarev
  1 sibling, 1 reply; 17+ messages in thread
From: Justin Piszcz @ 2007-06-24 14:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Michael Tokarev, Jeff Garzik, Carlo Wood, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

Don't forget about max_sectors_kb either (for all drives in the SW RAID5
array).

max_sectors_kb = 8
$ dd if=/dev/zero of=file.out6 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 55.4848 seconds, 194 MB/s

max_sectors_kb = 16
$ dd if=/dev/zero of=file.out5 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 37.6886 seconds, 285 MB/s

max_sectors_kb = 32
$ dd if=/dev/zero of=file.out4 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 26.2875 seconds, 408 MB/s

max_sectors_kb = 64
$ dd if=/dev/zero of=file.out2 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 24.8301 seconds, 432 MB/s

max_sectors_kb = 128
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 22.6298 seconds, 474 MB/s

On Sun, 24 Jun 2007, Dr. David Alan Gilbert wrote:

> * Michael Tokarev (mjt@tls.msk.ru) wrote:
>
> <snip>
>
>> By the way, I did some testing of various drives, and NCQ/TCQ indeed
>> shows some difference -- with multiple I/O processes (like a "server"
>> workload), IF NCQ/TCQ is implemented properly, especially in the
>> drive.
>>
>> For example, this is a good one:
>>
>> Single Seagate 74Gb SCSI drive (10KRPM)
>>
>> BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W
>
> <snip>
>
>> 1024k   1   83.1   36.0   55.8   34.6  28.2/27.6   20.3/19.4
>>         2          45.2          44.1              36.4/ 9.9
>>         4          48.1          47.6              40.7/ 7.1
>>
>> The tests are direct I/O over the whole drive (/dev/sdX), with either
>> 1, 2, or 4 threads doing sequential or random reads or writes in
>> blocks of a given size.  For the R/W tests, we have 2, 4 or 8 threads
>> running in total (1, 2 or 4 readers and the same amount of writers).
>> Numbers are MB/sec, as totals (summary) for all threads.
>>
>> Especially interesting is the very last column - random R/W in
>> parallel.  In almost all cases, more threads give a larger total
>> speed (I *guess* it's due to internal optimisations in the drive --
>> with more threads the drive has more chances to reorder commands to
>> minimize seek time etc).
>>
>> The only thing I don't understand is why, with larger I/O block
>> sizes, we see the write speed drop with multiple threads.
>
> My guess is that something is chopping them up into smaller writes.
>
>> And in contrast to the above, here's another test run, now with a
>> Seagate SATA ST3250620AS ("desktop" class) 250GB 7200RPM drive:
>>
>> BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W
>
> <snip>
>
>> 1024k   1   78.4   34.1   33.5   24.6  19.6/19.5   16.0/12.7
>>         2          33.3          24.6              15.4/13.8
>>         4          34.3          25.0              14.7/15.0
>
> <snip>
>
>> And second, so far I haven't seen a case where a drive with NCQ/TCQ
>> enabled works worse than without.  I don't want to say there aren't
>> such drives/controllers, but it just happens that I haven't seen any.)
>
> Yes you have - the random writes with large blocks and 2 or 4 threads
> are significantly better for your non-NCQ drive, and it gets more
> significant as you add more threads - I'm curious what happens with
> 8 threads or more.
>
> Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread
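For reference, the parameter Justin is tuning lives in sysfs per block device
and is capped by the hardware limit next to it (device names illustrative):

  # show the current request size limit and the hardware maximum, in KiB
  cat /sys/block/sda/queue/max_sectors_kb
  cat /sys/block/sda/queue/max_hw_sectors_kb
  # raise the limit on every member of the RAID5 set
  for d in sda sdb sdc; do echo 128 > /sys/block/$d/queue/max_sectors_kb; done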
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-24 14:21 ` Justin Piszcz
@ 2007-06-24 15:52 ` Michael Tokarev
  2007-06-24 16:59   ` Justin Piszcz
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Tokarev @ 2007-06-24 15:52 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Dr. David Alan Gilbert, Jeff Garzik, Carlo Wood, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> Don't forget about max_sectors_kb either (for all drives in the SW RAID5
> array).
>
> max_sectors_kb = 8
> $ dd if=/dev/zero of=file.out6 bs=1M count=10240
> 10737418240 bytes (11 GB) copied, 55.4848 seconds, 194 MB/s
>
> max_sectors_kb = 128
> 10737418240 bytes (11 GB) copied, 22.6298 seconds, 474 MB/s

Well.  You're comparing something different.  Yes, this thread is about
Linux software RAID5 in the first place, but I was commenting on [NT]CQ
within a single drive.

Overall, yes, the larger your reads/writes to the drive become, the
faster its linear performance is.  Yet you have to consider a real
workload instead of a very synthetic dd test.  It may be a good
approximation of a streaming video workload (when you feed a large
video file over the network or something like that), but even with
this, you probably want to feed several files at once (different files
to different clients), so a single-threaded test here isn't very
useful.  IMHO anyway, and good for a personal computer test.

/mjt

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-24 15:52 ` Michael Tokarev
@ 2007-06-24 16:59 ` Justin Piszcz
  2007-06-24 22:07   ` Carlo Wood
  0 siblings, 1 reply; 17+ messages in thread
From: Justin Piszcz @ 2007-06-24 16:59 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Dr. David Alan Gilbert, Jeff Garzik, Carlo Wood, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

On Sun, 24 Jun 2007, Michael Tokarev wrote:

> Justin Piszcz wrote:
>> Don't forget about max_sectors_kb either (for all drives in the SW RAID5
>> array).
>>
>> max_sectors_kb = 8
>> $ dd if=/dev/zero of=file.out6 bs=1M count=10240
>> 10737418240 bytes (11 GB) copied, 55.4848 seconds, 194 MB/s
>>
>> max_sectors_kb = 128
>> 10737418240 bytes (11 GB) copied, 22.6298 seconds, 474 MB/s
>
> Well.  You're comparing something different.  Yes, this thread is about
> Linux software RAID5 in the first place, but I was commenting on [NT]CQ
> within a single drive.
>
> Overall, yes, the larger your reads/writes to the drive become, the
> faster its linear performance is.  Yet you have to consider a real
> workload instead of a very synthetic dd test.  It may be a good
> approximation of a streaming video workload (when you feed a large
> video file over the network or something like that), but even with
> this, you probably want to feed several files at once (different files
> to different clients), so a single-threaded test here isn't very
> useful.  IMHO anyway, and good for a personal computer test.
>
> /mjt

Concerning NCQ/no NCQ, without NCQ I get an additional 15-50MB/s in
speed per various bonnie++ tests.

# Average of 3 runs with NCQ on for Quad Raptor 150 RAID 5 Software RAID:
p34-ncq-on,7952M,43916.3,96.6667,151943,28.6667,75794.3,18.6667,48991.3,99,181687,24,558.033,0.333333,16:100000:16/64,867.667,9,29972.7,98.3333,2801.67,16,890.667,9.33333,27743,94.3333,2115.33,15.6667

# Average of 3 runs with NCQ off for Quad Raptor 150 RAID 5 Software RAID:
p34-ncq-off,7952M,42470,97.3333,200409,36.3333,90240.3,22.6667,48656,99,198853,27,546.467,0,16:100000:16/64,972.333,10,21833,72.3333,3697,21,995,10.6667,27901.7,95.6667,2681,20.6667

^ permalink raw reply	[flat|nested] 17+ messages in thread
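The comma-separated lines above are bonnie++'s machine-readable output; the
converters shipped with bonnie++ can turn them back into a readable table
(tool and file names as commonly packaged -- treat this as a sketch):

  # render saved CSV lines as plain text or as an HTML table
  bon_csv2txt  < ncq-results.csv
  bon_csv2html < ncq-results.csv > ncq-results.html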
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-24 16:59 ` Justin Piszcz
@ 2007-06-24 22:07 ` Carlo Wood
  2007-06-24 23:46   ` Mark Lord
  2007-06-25  0:23   ` Patrick Mau
  0 siblings, 2 replies; 17+ messages in thread
From: Carlo Wood @ 2007-06-24 22:07 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Michael Tokarev, Dr. David Alan Gilbert, Jeff Garzik, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

On Sun, Jun 24, 2007 at 12:59:10PM -0400, Justin Piszcz wrote:
> Concerning NCQ/no NCQ, without NCQ I get an additional 15-50MB/s in
> speed per various bonnie++ tests.

There is more going on than a bad NCQ implementation of the drive, imho.
I did a long test over night (and still only got two schedulers done,
will do the other two tomorrow), and the difference between a queue depth
of 1 and 2 is DRAMATIC.

See http://www.xs4all.nl/~carlo17/noop_queue_depth.png
and http://www.xs4all.nl/~carlo17/anticipatory_queue_depth.png

The bonnie++ tests are done in a directory on the /dev/md7 and /dev/sdd2
partitions respectively. Each bonnie test is performed four times.
The hdparm -t tests (which show no difference from a -tT test) are each
done five times, for /dev/sdd, /dev/md7 and /dev/sda (the latter being
one of the RAID5 drives used for /dev/md7). Thus in total there are
2 * 4 + 3 * 5 = 23 data points per queue depth value in each graph.

The following can be observed (see also the sketch after this list):

1) There is hardly any difference between the two schedulers (noop
   is a little faster for the bonnie test).
2) An NCQ depth of 1 is WAY faster on RAID5 (bonnie; around 125 MB/s),
   while an NCQ depth of 2 is by far the slowest for the RAID5 (bonnie;
   around 40 MB/s). NCQ depths of 3 and higher show no difference among
   themselves, but are also slow (bonnie; around 75 MB/s).
3) There is no significant influence of the NCQ depth for non-RAID,
   either for /dev/sda (hdparm -t) or the /dev/sdd disk (hdparm -t and
   bonnie).
4) With an NCQ depth > 1, the hdparm -t measurement of /dev/md7 is
   VERY unstable. Sometimes it gives the maximum (around 150 MB/s),
   and sometimes as low as 30 MB/s, seemingly independent of the
   NCQ depth. Note that those measurements were done on an otherwise
   unloaded machine in single user mode, and the measurements were
   all done one after another. The strong fluctuation of the hdparm
   results for the RAID device (while the underlying devices do not
   show this behaviour) is unexplainable.

From the above I conclude that something must be wrong with the
software RAID implementation - and not just with the harddisks, imho.
At least, that's what it looks like to me. I am not an expert though ;)

-- 
Carlo Wood <carlo@alinoe.com>

PS RAID5 (md7 = sda7 + sdb7 + sdc7): three times a Western Digital
Raptor 10k rpm (WDC WD740ADFD-00NLR1).
Non-RAID (sdd2): Seagate Barracuda 7200 rpm (ST3320620AS).
The reason that I now measure around 145 MB/s instead of the 165 MB/s
reported in my previous post (with hdparm -t /dev/md7) is that before
I used hdparm -t /dev/md2, which is closer to the outside of the disk
and therefore faster. /dev/md2 is still around 165 MB/s.

^ permalink raw reply	[flat|nested] 17+ messages in thread
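The per-run settings Carlo sweeps here are all adjustable at runtime through
sysfs; a minimal sketch of one test iteration (device names, scheduler and
depth are example values):

  # select the elevator and NCQ depth for every member disk, then benchmark
  for d in sda sdb sdc; do
          echo noop > /sys/block/$d/queue/scheduler
          echo 2    > /sys/block/$d/device/queue_depth
  done
  hdparm -t /dev/md7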
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-24 22:07 ` Carlo Wood
@ 2007-06-24 23:46 ` Mark Lord
  2007-06-25  0:23 ` Patrick Mau
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Lord @ 2007-06-24 23:46 UTC (permalink / raw)
  To: Carlo Wood, Justin Piszcz, Michael Tokarev, Dr. David Alan Gilbert, Jeff Garzik, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

Carlo Wood wrote:
>
> The following can be observed:
>
> 1) There is hardly any difference between the two schedulers (noop
>    is a little faster for the bonnie test).
> 2) An NCQ depth of 1 is WAY faster on RAID5 (bonnie; around 125 MB/s),
>    while an NCQ depth of 2 is by far the slowest for the RAID5 (bonnie;
>    around 40 MB/s). NCQ depths of 3 and higher show no difference among
>    themselves, but are also slow (bonnie; around 75 MB/s).
> 3) There is no significant influence of the NCQ depth for non-RAID,
>    either for /dev/sda (hdparm -t) or the /dev/sdd disk (hdparm -t and
>    bonnie).
> 4) With an NCQ depth > 1, the hdparm -t measurement of /dev/md7 is
>    VERY unstable. Sometimes it gives the maximum (around 150 MB/s),
>    and sometimes as low as 30 MB/s, seemingly independent of the
>    NCQ depth. Note that those measurements were done on an otherwise
>    unloaded machine in single user mode, and the measurements were
>    all done one after another. The strong fluctuation of the hdparm
>    results for the RAID device (while the underlying devices do not
>    show this behaviour) is unexplainable.
>
> From the above I conclude that something must be wrong with the
> software RAID implementation - and not just with the harddisks, imho.
> At least, that's what it looks like to me. I am not an expert though ;)

I'm late tuning in here, but:

(1) hdparm issues only a single read at a time, so NCQ won't help it.

(2) WD Raptor drives automatically turn off "read-ahead" when using NCQ,
which totally kills any throughput measurements.  They do this to speed
up random access seeks; dunno if it pays off or not.  Under Windows, the
disk drivers don't use NCQ when performing large I/O operations, which
avoids the performance loss.

(3) Other drives from other brands may have similar issues, but I have
not run into it on them yet.

Cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread
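Whether a drive's read look-ahead is currently enabled can usually be queried
and set with hdparm; whether this actually reaches a SATA drive depends on the
hdparm version and the kernel's ATA pass-through support, so treat this as a
sketch:

  # query the drive's read look-ahead setting
  hdparm -A /dev/sda
  # explicitly re-enable look-ahead
  hdparm -A1 /dev/sda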
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-24 22:07 ` Carlo Wood
  2007-06-24 23:46 ` Mark Lord
@ 2007-06-25  0:23 ` Patrick Mau
  1 sibling, 0 replies; 17+ messages in thread
From: Patrick Mau @ 2007-06-25  0:23 UTC (permalink / raw)
  To: Carlo Wood, Justin Piszcz, Michael Tokarev, Dr. David Alan Gilbert, Jeff Garzik, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

On Mon, Jun 25, 2007 at 12:07:23AM +0200, Carlo Wood wrote:
> On Sun, Jun 24, 2007 at 12:59:10PM -0400, Justin Piszcz wrote:
> > Concerning NCQ/no NCQ, without NCQ I get an additional 15-50MB/s in
> > speed per various bonnie++ tests.
>
> There is more going on than a bad NCQ implementation of the drive, imho.
> I did a long test over night (and still only got two schedulers done,
> will do the other two tomorrow), and the difference between a queue depth
> of 1 and 2 is DRAMATIC.
>
> See http://www.xs4all.nl/~carlo17/noop_queue_depth.png
> and http://www.xs4all.nl/~carlo17/anticipatory_queue_depth.png

Hi Carlo,

Have you considered using "blktrace"?  It enables you to gather data
from all the separate request queues and will also show you the mapping
of bio requests from /dev/mdX to the individual physical disks.

You can also identify SYNC and BARRIER flags for requests; that might
show you why the md driver will sometimes wait for completion or even
REQUEUE if the queue is full.

Just compile your kernel with CONFIG_BLK_DEV_IO_TRACE and pull the
"blktrace" (and "blkparse") utility with git.  The git URL is in the
Kconfig help text.

You have to mount debugfs (automatically selected by IO trace).
I just wanted to mention that, because I did not figure it out at
first ;)

You should of course use a different location for the output files
to avoid an endless flood of IO.

Regards,
Patrick

PS: I know, I talked about blktrace twice already ;)

^ permalink raw reply	[flat|nested] 17+ messages in thread
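A minimal sketch of that workflow (device names and output paths are examples;
check the blktrace documentation for the exact options of your version):

  # debugfs must be mounted for blktrace to work
  mount -t debugfs debugfs /sys/kernel/debug
  # from a directory on a disk that is NOT being traced, trace the RAID members
  blktrace -d /dev/sda -d /dev/sdb -d /dev/sdc -o md7trace &
  # ... run the bonnie++ / hdparm workload ...
  kill %1
  # merge the per-CPU trace files into a readable event log
  blkparse -i md7trace > md7trace.txt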
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-24 12:59 ` Dr. David Alan Gilbert
  2007-06-24 14:21 ` Justin Piszcz
@ 2007-06-24 15:48 ` Michael Tokarev
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Tokarev @ 2007-06-24 15:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Jeff Garzik, Carlo Wood, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

Dr. David Alan Gilbert wrote:
> * Michael Tokarev (mjt@tls.msk.ru) wrote:
>
> <snip>
>
>> By the way, I did some testing of various drives, and NCQ/TCQ indeed
>> shows some difference -- with multiple I/O processes (like a "server"
>> workload), IF NCQ/TCQ is implemented properly, especially in the
>> drive.
>>
>> For example, this is a good one:
>>
>> Single Seagate 74Gb SCSI drive (10KRPM)
>>
>> BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W
>> 1024k   1   83.1   36.0   55.8   34.6  28.2/27.6   20.3/19.4
>>         2          45.2          44.1              36.4/ 9.9
>>         4          48.1          47.6              40.7/ 7.1

[]

>> The only thing I don't understand is why, with larger I/O block
>> sizes, we see the write speed drop with multiple threads.
>
> My guess is that something is chopping them up into smaller writes.

At least it's not in the kernel.  According to /proc/diskstats, the
requests go to the drive as 1024kb requests.

>> And in contrast to the above, here's another test run, now with a
>> Seagate SATA ST3250620AS ("desktop" class) 250GB 7200RPM drive:
>>
>> BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W
>> 1024k   1   78.4   34.1   33.5   24.6  19.6/19.5   16.0/12.7
>>         2          33.3          24.6              15.4/13.8
>>         4          34.3          25.0              14.7/15.0
>
>> And second, so far I haven't seen a case where a drive with NCQ/TCQ
>> enabled works worse than without.  I don't want to say there aren't
>> such drives/controllers, but it just happens that I haven't seen any.)
>
> Yes you have - the random writes with large blocks and 2 or 4 threads
> are significantly better for your non-NCQ drive, and it gets more
> significant as you add more threads - I'm curious what happens with
> 8 threads or more.

Both drives shown above are with [NT]CQ enabled.  And the first drive
above (74Gb SCSI, where the speed increases with the number of threads)
is the one which has the "better" TCQ implementation.  When I turn off
TCQ for that drive, there's almost no speed increase when increasing
the number of threads.

(I can't test this drive now as it's in production.  The results were
gathered before I installed the system on it.)

/mjt

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-24 11:58 ` Michael Tokarev
  2007-06-24 12:59 ` Dr. David Alan Gilbert
@ 2007-07-05 22:12 ` Phillip Susi
  1 sibling, 0 replies; 17+ messages in thread
From: Phillip Susi @ 2007-07-05 22:12 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Jeff Garzik, Carlo Wood, Tejun Heo, Manoj Kasichainula, linux-kernel, IDE/ATA development list

Michael Tokarev wrote:
> Single Seagate 74Gb SCSI drive (10KRPM)
>
> BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W
>    4k   1   66.4    0.5    0.6    0.5   0.6/ 0.6    0.4/ 0.2
>         2           0.6           0.6               0.5/ 0.1
>         4           0.7           0.6               0.6/ 0.2
>   16k   1   84.8    2.0    2.5    1.9   2.5/ 2.5    1.6/ 0.6
>         2           2.3           2.1               2.0/ 0.6
>         4           2.7           2.5               2.3/ 0.6
>   64k   1   84.8    7.4    9.3    7.2   9.4/ 9.3    5.8/ 2.2
>         2           8.6           7.9               7.3/ 2.1
>         4           9.9           9.1               8.1/ 2.2
>  128k   1   84.8   13.6   16.7   12.9  16.9/16.6   10.6/ 3.9
>         2          15.6          14.4              13.5/ 3.2
>         4          17.9          16.4              15.7/ 2.7
>  512k   1   84.9   34.0   41.9   33.3  29.0/27.1   22.4/13.2
>         2          36.9          34.5              30.7/ 8.1
>         4          40.5          38.1              33.2/ 8.3
> 1024k   1   83.1   36.0   55.8   34.6  28.2/27.6   20.3/19.4
>         2          45.2          44.1              36.4/ 9.9
>         4          48.1          47.6              40.7/ 7.1

<snip>

> The only thing I don't understand is why, with larger I/O block
> sizes, we see the write speed drop with multiple threads.

Huh?  Your data table does not show larger block size dropping write
speed.  47.6 > 38.1 > 16.4.

> And in contrast to the above, here's another test run, now with a
> Seagate SATA ST3250620AS ("desktop" class) 250GB 7200RPM drive:
>
> BlkSz Trd  linRd  rndRd  linWr  rndWr  linR/W      rndR/W
>    4k   1   47.5    0.3    0.5    0.3   0.3/ 0.3    0.1/ 0.1
>         2           0.3           0.3               0.2/ 0.1
>         4           0.3           0.3               0.2/ 0.2
>   16k   1   78.4    1.1    1.8    1.1   0.9/ 0.9    0.6/ 0.6
>         2           1.2           1.1               0.6/ 0.6
>         4           1.3           1.2               0.6/ 0.6
>   64k   1   78.4    4.3    6.7    4.0   3.5/ 3.5    2.1/ 2.2
>         2           4.5           4.1               2.2/ 2.3
>         4           4.7           4.2               2.3/ 2.4
>  128k   1   78.4    8.0   12.6    7.2   6.2/ 6.2    3.9/ 3.8
>         2           8.2           7.3               4.1/ 4.0
>         4           8.7           7.7               4.3/ 4.3
>  512k   1   78.5   23.1   34.0   20.3  17.1/17.1   11.3/10.7
>         2          23.5          20.6              11.3/11.4
>         4          24.7          21.3              11.6/11.8
> 1024k   1   78.4   34.1   33.5   24.6  19.6/19.5   16.0/12.7
>         2          33.3          24.6              15.4/13.8
>         4          34.3          25.0              14.7/15.0
>
> Here, the (total) I/O speed does not depend on the number of threads.
> From this I conclude that the drive does not reorder/optimize commands
> internally, even if NCQ is enabled (queue depth is 32).

While the difference does not appear to be as pronounced as with the
SCSI drive, the data does show more threads giving more total IO.
4.7 > 4.5 > 4.3 in the 64k rndRd test, and the other tests show an
increase with more threads as well.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: SATA RAID5 speed drop of 100 MB/s
  2007-06-23  7:03 ` SATA RAID5 speed drop of 100 MB/s Jeff Garzik
  2007-06-23  7:54 ` Tejun Heo
  2007-06-23 12:53 ` Carlo Wood
@ 2007-06-24  0:54 ` Eyal Lebedinsky
  2 siblings, 0 replies; 17+ messages in thread
From: Eyal Lebedinsky @ 2007-06-24  0:54 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: IDE/ATA development list

Jeff Garzik wrote:
[trim]
> Given that some drives might be better tuned for benchmarks in
> non-queued mode, and that a major behavior difference is that your
> drives are now NCQ-enabled, the first thing I would suggest you try is
> disabling NCQ:
> http://linux-ata.org/faq.html#ncq

I see in my bootup messages:

ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: ATA-7: WDC WD3200YS-01PGB0, 21.00M21, max UDMA/133
ata6.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 0/1)
ata6.00: configured for UDMA/133

and I wonder how to interpret "NCQ (depth 0/1)".  Does this drive
support NCQ or not?

Controller: Promise SATA-II-150-TX4.
Kernel: 2.6.21.5, x86

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>

^ permalink raw reply	[flat|nested] 17+ messages in thread
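Drive-side NCQ support can also be checked directly from the identify data; a
sketch (device name illustrative, output wording varies with the hdparm
version):

  # look for "Queue depth" and "Native Command Queueing (NCQ)" lines
  hdparm -I /dev/sda | grep -i -e 'queue depth' -e 'ncq'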
* Re: SATA RAID5 speed drop of 100 MB/s
@ 2007-06-24 9:01 Mikael Pettersson
0 siblings, 0 replies; 17+ messages in thread
From: Mikael Pettersson @ 2007-06-24 9:01 UTC (permalink / raw)
To: eyal, jeff; +Cc: linux-ide
On Sun, 24 Jun 2007 10:54:56 +1000, Eyal Lebedinsky wrote:
> Jeff Garzik wrote:
> [trim]
> > Given that some drives might be better tuned for benchmarks in
> > non-queued mode, and that a major behavior difference is that your
> > drives are now NCQ-enabled, the first thing I would suggest you try is
> > disabling NCQ:
> > http://linux-ata.org/faq.html#ncq
>
> I see in my bootup messages:
> ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata6.00: ATA-7: WDC WD3200YS-01PGB0, 21.00M21, max UDMA/133
> ata6.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 0/1)
> ata6.00: configured for UDMA/133
>
> and I wonder how to interpret "NCQ (depth 0/1)". Does this drive
> support NCQ or not?
>
> Controller: Promise SATA-II-150-TX4.
> Kernel: 2.6.21.5, x86
Your drive does, but the driver for your controller does not (yet).
/Mikael
^ permalink raw reply	[flat|nested] 17+ messages in thread