* Re: SATA RAID5 speed drop of 100 MB/s
[not found] ` <20070622214859.GC6970@alinoe.com>
@ 2007-06-23 7:03 ` Jeff Garzik
2007-06-23 7:54 ` Tejun Heo
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Jeff Garzik @ 2007-06-23 7:03 UTC (permalink / raw)
To: Carlo Wood
Cc: Tejun Heo, Manoj Kasichainula, linux-kernel,
IDE/ATA development list
[-- Attachment #1: Type: text/plain, Size: 1122 bytes --]
Carlo Wood wrote:
> The dmesg output of 33480a0ede8dcc7e6483054279008f972bd56fd3 (thus
> "before") is:
[...]
> And the dmesg output of 551c012d7eea3dc5ec063c7ff9c718d39e77634f (thus
> "after") is:
[...]
Your disk configurations are quite radically different between the two
kernels (see attached diff for key highlights).
The new behavior of the more recent kernel (551c012d7...) is that it now
fully drives your hardware :) The reset problems go away, NCQ is
enabled, and if you had 3.0Gbps drives (you don't) they would be driven
at a faster speed.
Given that some drives might be better tuned for benchmarks in
non-queued mode, and that a major behavior difference is that your
drives are now NCQ-enabled, the first thing I would suggest you try is
disabling NCQ:
http://linux-ata.org/faq.html#ncq
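In practice this usually amounts to dropping the per-device queue depth to 1
via sysfs (device names below are only examples; repeat for each member drive):

  # disable NCQ on one RAID5 member
  echo 1 > /sys/block/sda/device/queue_depth
  # re-enable later by writing back the depth reported in dmesg, e.g.
  echo 31 > /sys/block/sda/device/queue_depth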
Other indicators are the other changes in the "ahci 0000:00:1f.2:
flags:" line, which do affect other behaviors, though none so important
to RAID5 performance as NCQ, I would think.
Turning on NCQ also potentially affects barrier behavior in RAID, though
I'm guessing that is not a factor here.
Jeff
[-- Attachment #2: diff.txt --]
[-- Type: text/plain, Size: 1673 bytes --]
-ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 6 ports ? Gbps 0x3f impl SATA mode
-ahci 0000:00:1f.2: flags: 64bit ilck stag led pmp pio
+ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
+ahci 0000:00:1f.2: flags: 64bit ncq ilck stag pm led clo pmp pio slum part
scsi0 : ahci
-ata1: softreset failed (port busy but CLO unavailable)
-ata1: softreset failed, retrying in 5 secs
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
-ata1.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 0/32)
+ata1.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 31/32)
ata1.00: ata1: dev 0 multi count 0
ata1.00: configured for UDMA/133
scsi1 : ahci
-ata2: softreset failed (port busy but CLO unavailable)
-ata2: softreset failed, retrying in 5 secs
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
-ata2.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 0/32)
+ata2.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 31/32)
ata2.00: ata2: dev 0 multi count 0
ata2.00: configured for UDMA/133
scsi2 : ahci
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
-ata3.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 0/32)
+ata3.00: ATA-7, max UDMA/133, 145226112 sectors: LBA48 NCQ (depth 31/32)
ata3.00: ata3: dev 0 multi count 0
ata3.00: configured for UDMA/133
scsi3 : ahci
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
-ata4.00: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 0/32)
+ata4.00: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 31/32)
ata4.00: configured for UDMA/133
scsi4 : ahci
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-23 7:03 ` SATA RAID5 speed drop of 100 MB/s Jeff Garzik
@ 2007-06-23 7:54 ` Tejun Heo
2007-06-23 12:53 ` Carlo Wood
2007-06-24 0:54 ` Eyal Lebedinsky
2 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2007-06-23 7:54 UTC (permalink / raw)
To: Jeff Garzik
Cc: Carlo Wood, Manoj Kasichainula, linux-kernel,
IDE/ATA development list
Jeff Garzik wrote:
> Carlo Wood wrote:
>> The dmesg output of 33480a0ede8dcc7e6483054279008f972bd56fd3 (thus
>> "before") is:
> [...]
>> And the dmesg output of 551c012d7eea3dc5ec063c7ff9c718d39e77634f (thus
>> "after") is:
> [...]
>
> Your disk configurations are quite radically different between the two
> kernels (see attached diff for key highlights).
>
> The new behavior of the more recent kernel (551c012d7...) is that it now
> fully drives your hardware :) The reset problems go away, NCQ is
> enabled, and if you had 3.0Gbps drives (you don't) they would be driven
> at a faster speed.
>
> Given that some drives might be better tuned for benchmarks in
> non-queued mode, and that a major behavior difference is that your
> drives are now NCQ-enabled, the first thing I would suggest you try is
> disabling NCQ:
> http://linux-ata.org/faq.html#ncq
>
> Other indicators are the other changes in the "ahci 0000:00:1f.2:
> flags:" line, which do affect other behaviors, though none so important
> to RAID5 performance as NCQ, I would think.
>
> Turning on NCQ also potentially affects barrier behavior in RAID, though
> I'm guessing that is not a factor here.
Ah.. right. That should have enabled NCQ. Me slow today. :-)
--
tejun
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-23 7:03 ` SATA RAID5 speed drop of 100 MB/s Jeff Garzik
2007-06-23 7:54 ` Tejun Heo
@ 2007-06-23 12:53 ` Carlo Wood
2007-06-23 17:30 ` Bartlomiej Zolnierkiewicz
2007-06-23 22:43 ` Jeff Garzik
2007-06-24 0:54 ` Eyal Lebedinsky
2 siblings, 2 replies; 17+ messages in thread
From: Carlo Wood @ 2007-06-23 12:53 UTC (permalink / raw)
To: Jeff Garzik
Cc: Tejun Heo, Manoj Kasichainula, linux-kernel,
IDE/ATA development list
On Sat, Jun 23, 2007 at 03:03:33AM -0400, Jeff Garzik wrote:
> Your disk configurations are quite radically different between the two
> kernels (see attached diff for key highlights).
>
> The new behavior of the more recent kernel (551c012d7...) is that it now
> fully drives your hardware :) The reset problems go away, NCQ is
> enabled, and if you had 3.0Gbps drives (you don't) they would be driven
> at a faster speed.
>
> Given that some drives might be better tuned for benchmarks in
> non-queued mode, and that a major behavior difference is that your
> drives are now NCQ-enabled, the first thing I would suggest you try is
> disabling NCQ:
> http://linux-ata.org/faq.html#ncq
Thanks! That is indeed the difference that causes the drop in
"hdparm -tT" throughput that I observed.
After setting /sys/block/sdX/device/queue_depth of all three drives
to 1, I again get
/dev/md2:
Timing cached reads: 8252 MB in 2.00 seconds = 4130.59 MB/sec
Timing buffered disk reads: 496 MB in 3.01 seconds = 164.88 MB/sec
on 2.6.22-rc5.
> Other indicators are the other changes in the "ahci 0000:00:1f.2:
> flags:" line, which do affect other behaviors, though none so important
> to RAID5 performance as NCQ, I would think.
>
> Turning on NCQ also potentially affects barrier behavior in RAID, though
> I'm guessing that is not a factor here.
Of course, I am not really interested in what "hdparm -tT" gives, but
rather in high performance during real-life use of the disks.
Is it possible that the measurement with "hdparm -tT" returns a higher
value for some setting, but that the overall real-life performance
drops?
Also, the effect of this setting is nil for the individual drives.
hdparm -tT /dev/sda still gives me around 65 MB/s. I don't understand
why this setting has such a HUGE effect on RAID5 while the underlying
drives themselves don't seem affected.
PS I'd like to do extensive testing with Bonnie++ to tune everything
there is to tune. But bonnie likes to write/read files TWICE the amount
of RAM I have. It therefore takes a LOT of time to run one test. Do you
happen to know how I can limit the amount of RAM that the Linux kernel
sees to, say, 500 MB? That should be enough to run in Single User mode
but allow me to run the tests MUCH faster. (I have dual channel, four
DIMMs of 1 GB each -- 2 GB per Core 2 die. Hopefully the fact that
I have dual channel isn't going to be a problem when limiting the RAM
that the kernel sees.)
--
Carlo Wood <carlo@alinoe.com>
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-23 12:53 ` Carlo Wood
@ 2007-06-23 17:30 ` Bartlomiej Zolnierkiewicz
2007-06-23 22:43 ` Jeff Garzik
1 sibling, 0 replies; 17+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-06-23 17:30 UTC (permalink / raw)
To: Carlo Wood
Cc: Jeff Garzik, Tejun Heo, Manoj Kasichainula, linux-kernel,
IDE/ATA development list
Hi,
On Saturday 23 June 2007, Carlo Wood wrote:
> PS I'd like to do extensive testing with Bonnie++ to tune everything
> there is to tune. But bonnie likes to write/read files TWICE the amount
> of RAM I have. It therefore takes a LOT of time to run one test. Do you
> happen to know how I can limit the amount of RAM that the Linux kernel
> sees to, say, 500 MB? That should be enough to run in Single User mode
> but allow me to run the tests MUCH faster. (I have dual channel, four
> DIMMs of 1 GB each -- 2 GB per Core 2 die. Hopefully the fact that
> I have dual channel isn't going to be a problem when limiting the RAM
> that the kernel sees.)
"mem=" kernel parameter limits amount of memory seen by kernel
(more info in Documentation/kernel-parameters.txt)
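For example, with GRUB legacy you would append it to the kernel line
(kernel image and root device below are only placeholders):

  # /boot/grub/menu.lst -- limit the kernel to ~512 MB of RAM
  kernel /boot/vmlinuz-2.6.22-rc5 root=/dev/md1 ro mem=512M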
You can also limit the amount of RAM detected by bonnie++ by using the -r
parameter, but please remember that this will make bonnie++ benchmark the
combined kernel I/O buffering + filesystem + hard disk performance instead
of just filesystem + hard disk performance (as it can happen that some or
all of the data will never hit the disk).
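A minimal example invocation (directory and user are placeholders):

  # tell bonnie++ to assume 512 MB of RAM; it then defaults to a ~1 GB test file
  bonnie++ -d /mnt/md7/tmp -r 512 -u nobody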
Bart
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-23 12:53 ` Carlo Wood
2007-06-23 17:30 ` Bartlomiej Zolnierkiewicz
@ 2007-06-23 22:43 ` Jeff Garzik
2007-06-24 11:58 ` Michael Tokarev
1 sibling, 1 reply; 17+ messages in thread
From: Jeff Garzik @ 2007-06-23 22:43 UTC (permalink / raw)
To: Carlo Wood, Jeff Garzik, Tejun Heo, Manoj Kasichainula,
linux-kernel, IDE/ATA development list
Carlo Wood wrote:
> Is it possible that the measurement with "hdparm -tT" returns a higher
> value for some setting, but that the overall real-life performance
> drops?
IN THEORY, RAID performance should /increase/ due to additional queued
commands available to be sent to the drive. NCQ == command queueing ==
sending multiple commands to the drive, rather than one-at-a-time like
normal.
But hdparm isn't the best test for that theory, since it does not
simulate the transactions like real-world MD device usage does.
We have seen buggy NCQ firmwares where performance decreases, so it is
possible that NCQ just isn't good on your drives.
Jeff
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-23 7:03 ` SATA RAID5 speed drop of 100 MB/s Jeff Garzik
2007-06-23 7:54 ` Tejun Heo
2007-06-23 12:53 ` Carlo Wood
@ 2007-06-24 0:54 ` Eyal Lebedinsky
2 siblings, 0 replies; 17+ messages in thread
From: Eyal Lebedinsky @ 2007-06-24 0:54 UTC (permalink / raw)
To: Jeff Garzik; +Cc: IDE/ATA development list
Jeff Garzik wrote:
[trim]
> Given that some drives might be better tuned for benchmarks in
> non-queued mode, and that a major behavior difference is that your
> drives are now NCQ-enabled, the first thing I would suggest you try is
> disabling NCQ:
> http://linux-ata.org/faq.html#ncq
I see in my bootup messages:
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: ATA-7: WDC WD3200YS-01PGB0, 21.00M21, max UDMA/133
ata6.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 0/1)
ata6.00: configured for UDMA/133
and I wonder how to interpret "NCQ (depth 0/1)". Does this drive
support NCQ or not?
Controller: Promise SATA-II-150-TX4.
Kernel: 2.6.21.5, x86
--
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
* Re: SATA RAID5 speed drop of 100 MB/s
@ 2007-06-24 9:01 Mikael Pettersson
0 siblings, 0 replies; 17+ messages in thread
From: Mikael Pettersson @ 2007-06-24 9:01 UTC (permalink / raw)
To: eyal, jeff; +Cc: linux-ide
On Sun, 24 Jun 2007 10:54:56 +1000, Eyal Lebedinsky wrote:
> Jeff Garzik wrote:
> [trim]
> > Given that some drives might be better tuned for benchmarks in
> > non-queued mode, and that a major behavior difference is that your
> > drives are now NCQ-enabled, the first thing I would suggest you try is
> > disabling NCQ:
> > http://linux-ata.org/faq.html#ncq
>
> I see in my bootup messages:
> ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata6.00: ATA-7: WDC WD3200YS-01PGB0, 21.00M21, max UDMA/133
> ata6.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 0/1)
> ata6.00: configured for UDMA/133
>
> and I wonder how to interpret "NCQ (depth 0/1)". Does this drive
> support NCQ or not?
>
> Controller: Promise SATA-II-150-TX4.
> Kernel: 2.6.21.5, x86
Your drive does, but the driver for your controller does not (yet).
/Mikael
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-23 22:43 ` Jeff Garzik
@ 2007-06-24 11:58 ` Michael Tokarev
2007-06-24 12:59 ` Dr. David Alan Gilbert
2007-07-05 22:12 ` Phillip Susi
0 siblings, 2 replies; 17+ messages in thread
From: Michael Tokarev @ 2007-06-24 11:58 UTC (permalink / raw)
To: Jeff Garzik
Cc: Carlo Wood, Tejun Heo, Manoj Kasichainula, linux-kernel,
IDE/ATA development list
Jeff Garzik wrote:
> IN THEORY, RAID performance should /increase/ due to additional queued
> commands available to be sent to the drive. NCQ == command queueing ==
> sending multiple commands to the drive, rather than one-at-a-time like
> normal.
>
> But hdparm isn't the best test for that theory, since it does not
> simulate the transactions like real-world MD device usage does.
>
> We have seen buggy NCQ firmwares where performance decreases, so it is
> possible that NCQ just isn't good on your drives.
By the way, I did some testing of various drives, and NCQ/TCQ indeed
shows some difference -- with multiple I/O processes (like "server"
workload), IF NCQ/TCQ is implemented properly, especially in the
drive.
For example, this is a good one:
Single Seagate 74Gb SCSI drive (10KRPM)
BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
4k 1 66.4 0.5 0.6 0.5 0.6/ 0.6 0.4/ 0.2
2 0.6 0.6 0.5/ 0.1
4 0.7 0.6 0.6/ 0.2
16k 1 84.8 2.0 2.5 1.9 2.5/ 2.5 1.6/ 0.6
2 2.3 2.1 2.0/ 0.6
4 2.7 2.5 2.3/ 0.6
64k 1 84.8 7.4 9.3 7.2 9.4/ 9.3 5.8/ 2.2
2 8.6 7.9 7.3/ 2.1
4 9.9 9.1 8.1/ 2.2
128k 1 84.8 13.6 16.7 12.9 16.9/16.6 10.6/ 3.9
2 15.6 14.4 13.5/ 3.2
4 17.9 16.4 15.7/ 2.7
512k 1 84.9 34.0 41.9 33.3 29.0/27.1 22.4/13.2
2 36.9 34.5 30.7/ 8.1
4 40.5 38.1 33.2/ 8.3
1024k 1 83.1 36.0 55.8 34.6 28.2/27.6 20.3/19.4
2 45.2 44.1 36.4/ 9.9
4 48.1 47.6 40.7/ 7.1
The tests are direct-I/O over whole drive (/dev/sdX), with
either 1, 2, or 4 threads doing sequential or random reads
or writes in blocks of a given size. For the R/W tests,
we've 2, 4 or 8 threads running in total (1, 2 or 4 readers
and the same amount of writers). Numbers are MB/sec, as
totals (summary) for all threads.
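For the record, the tool itself isn't shown here; one cell of the table
(N threads of direct random reads in BS-sized blocks) could be roughly
approximated with plain dd in bash -- a sketch only, with device and
offset range as placeholders:

  DEV=/dev/sdX BS=128k N=4 BLOCKS=100000   # offsets limited to the first BLOCKS*BS bytes
  for t in $(seq 1 $N); do
    ( for i in $(seq 1 500); do
        dd if=$DEV of=/dev/null bs=$BS count=1 iflag=direct \
           skip=$(( (RANDOM * 32768 + RANDOM) % BLOCKS )) 2>/dev/null
      done ) &
  done
  wait                                     # MB/sec = N * 500 * BS / elapsed time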
Especially interesting is the very last column - random R/W
in parallel. In almost all cases, more threads gives larger
total speed (I *guess* it's due to internal optimisations in
the drive -- with more threads the drive has more chances to
reorder commands to minimize seek time etc).
The only thing I don't understand is why with larger I/O block
size we see write speed drop with multiple threads.
And in contrast to the above, here's another test run, now
with Seagate SATA ST3250620AS ("desktop" class) 250GB
7200RPM drive:
BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
4k 1 47.5 0.3 0.5 0.3 0.3/ 0.3 0.1/ 0.1
2 0.3 0.3 0.2/ 0.1
4 0.3 0.3 0.2/ 0.2
16k 1 78.4 1.1 1.8 1.1 0.9/ 0.9 0.6/ 0.6
2 1.2 1.1 0.6/ 0.6
4 1.3 1.2 0.6/ 0.6
64k 1 78.4 4.3 6.7 4.0 3.5/ 3.5 2.1/ 2.2
2 4.5 4.1 2.2/ 2.3
4 4.7 4.2 2.3/ 2.4
128k 1 78.4 8.0 12.6 7.2 6.2/ 6.2 3.9/ 3.8
2 8.2 7.3 4.1/ 4.0
4 8.7 7.7 4.3/ 4.3
512k 1 78.5 23.1 34.0 20.3 17.1/17.1 11.3/10.7
2 23.5 20.6 11.3/11.4
4 24.7 21.3 11.6/11.8
1024k 1 78.4 34.1 33.5 24.6 19.6/19.5 16.0/12.7
2 33.3 24.6 15.4/13.8
4 34.3 25.0 14.7/15.0
Here, the (total) I/O speed does not depend on the number
of threads. From which I conclude that the drive does
not reorder/optimize commands internally, even if NCQ is
enabled (queue depth is 32).
(And two notes. First of all, to some, those tables may
look strange, showing very low speeds. Note the block
size, and note that I'm doing *direct* *random* I/O, without
buffering in the kernel. Yes, even the most advanced
modern drives are very slow in this workload, due to
seek time and rotational latency -- the disk is maxing
out at its theoretical requests/second: take the average
seek time plus rotational latency (usually given in the
drive specs) and divide one second by that value -- you'll
get about 200..250, i.e. 1 s / 4..5 ms per request.
And the numbers - like 0.3 MB/sec for writes - are very close
to those 200..250 requests/sec. In any case, this is not a typical
workload - a file server, for example, is not like this.
But it more or less resembles a database workload.
And second, so far I haven't seen a case where a drive
with NCQ/TCQ enabled works worse than without. I don't
want to say there aren't such drives/controllers, but
it just happen that I haven't seen any.)
/mjt
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-24 11:58 ` Michael Tokarev
@ 2007-06-24 12:59 ` Dr. David Alan Gilbert
2007-06-24 14:21 ` Justin Piszcz
2007-06-24 15:48 ` Michael Tokarev
2007-07-05 22:12 ` Phillip Susi
1 sibling, 2 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2007-06-24 12:59 UTC (permalink / raw)
To: Michael Tokarev
Cc: Jeff Garzik, Carlo Wood, Tejun Heo, Manoj Kasichainula,
linux-kernel, IDE/ATA development list
* Michael Tokarev (mjt@tls.msk.ru) wrote:
<snip>
> By the way, I did some testing of various drives, and NCQ/TCQ indeed
> shows some difference -- with multiple I/O processes (like "server"
> workload), IF NCQ/TCQ is implemented properly, especially in the
> drive.
>
> For example, this is a good one:
>
> Single Seagate 74Gb SCSI drive (10KRPM)
>
> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
<snip>
> 1024k 1 83.1 36.0 55.8 34.6 28.2/27.6 20.3/19.4
> 2 45.2 44.1 36.4/ 9.9
> 4 48.1 47.6 40.7/ 7.1
>
> The tests are direct-I/O over whole drive (/dev/sdX), with
> either 1, 2, or 4 threads doing sequential or random reads
> or writes in blocks of a given size. For the R/W tests,
> we've 2, 4 or 8 threads running in total (1, 2 or 4 readers
> and the same amount of writers). Numbers are MB/sec, as
> totals (summary) for all threads.
>
> Especially interesting is the very last column - random R/W
> in parallel. In almost all cases, more threads gives larger
> total speed (I *guess* it's due to internal optimisations in
> the drive -- with more threads the drive has more chances to
> reorder commands to minimize seek time etc).
>
> The only thing I don't understand is why with larger I/O block
> size we see write speed drop with multiple threads.
My guess is that something is chopping them up into smaller writes.
> And in contrast to the above, here's another test run, now
> with Seagate SATA ST3250620AS ("desktop" class) 250GB
> 7200RPM drive:
>
> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
<snip>
> 1024k 1 78.4 34.1 33.5 24.6 19.6/19.5 16.0/12.7
> 2 33.3 24.6 15.4/13.8
> 4 34.3 25.0 14.7/15.0
>
<snip>
> And second, so far I haven't seen a case where a drive
> with NCQ/TCQ enabled works worse than without. I don't
> want to say there aren't such drives/controllers, but
> it just happen that I haven't seen any.)
Yes you have - the random writes with large blocks and 2 or 4 threads
is significantly better for your non-NCQ drive; and getting more
significant as you add more threads - I'm curious what happens
on 8 threads or more.
Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-24 12:59 ` Dr. David Alan Gilbert
@ 2007-06-24 14:21 ` Justin Piszcz
2007-06-24 15:52 ` Michael Tokarev
2007-06-24 15:48 ` Michael Tokarev
1 sibling, 1 reply; 17+ messages in thread
From: Justin Piszcz @ 2007-06-24 14:21 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Michael Tokarev, Jeff Garzik, Carlo Wood, Tejun Heo,
Manoj Kasichainula, linux-kernel, IDE/ATA development list
Don't forget about max_sectors_kb either (for all drives in the SW RAID5
array)
max_sectors_kb = 8
$ dd if=/dev/zero of=file.out6 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 55.4848 seconds, 194 MB/s
max_sectors_kb = 16
$ dd if=/dev/zero of=file.out5 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 37.6886 seconds, 285 MB/s
max_sectors_kb = 32
$ dd if=/dev/zero of=file.out4 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 26.2875 seconds, 408 MB/s
max_sectors_kb = 64
$ dd if=/dev/zero of=file.out2 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 24.8301 seconds, 432 MB/s
max_sectors_kb = 128
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 22.6298 seconds, 474 MB/s
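For reference, max_sectors_kb is a per-device sysfs tunable; a sketch for a
four-drive array (device names are only examples):

  # set the maximum request size to 128 KB on each RAID5 member
  for d in sda sdb sdc sdd; do
      echo 128 > /sys/block/$d/queue/max_sectors_kb
  done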
On Sun, 24 Jun 2007, Dr. David Alan Gilbert wrote:
> * Michael Tokarev (mjt@tls.msk.ru) wrote:
>
> <snip>
>
>> By the way, I did some testing of various drives, and NCQ/TCQ indeed
>> shows some difference -- with multiple I/O processes (like "server"
>> workload), IF NCQ/TCQ is implemented properly, especially in the
>> drive.
>>
>> For example, this is a good one:
>>
>> Single Seagate 74Gb SCSI drive (10KRPM)
>>
>> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
>
> <snip>
>
>> 1024k 1 83.1 36.0 55.8 34.6 28.2/27.6 20.3/19.4
>> 2 45.2 44.1 36.4/ 9.9
>> 4 48.1 47.6 40.7/ 7.1
>>
>> The tests are direct-I/O over whole drive (/dev/sdX), with
>> either 1, 2, or 4 threads doing sequential or random reads
>> or writes in blocks of a given size. For the R/W tests,
>> we've 2, 4 or 8 threads running in total (1, 2 or 4 readers
>> and the same amount of writers). Numbers are MB/sec, as
>> totals (summary) for all threads.
>>
>> Especially interesting is the very last column - random R/W
>> in parallel. In almost all cases, more threads gives larger
>> total speed (I *guess* it's due to internal optimisations in
>> the drive -- with more threads the drive has more chances to
>> reorder commands to minimize seek time etc).
>>
>> The only thing I don't understand is why with larger I/O block
>> size we see write speed drop with multiple threads.
>
> My guess is that something is chopping them up into smaller writes.
>
>> And in contrast to the above, here's another test run, now
>> with Seagate SATA ST3250620AS ("desktop" class) 250GB
>> 7200RPM drive:
>>
>> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
>
> <snip>
>
>> 1024k 1 78.4 34.1 33.5 24.6 19.6/19.5 16.0/12.7
>> 2 33.3 24.6 15.4/13.8
>> 4 34.3 25.0 14.7/15.0
>>
>
> <snip>
>
>> And second, so far I haven't seen a case where a drive
>> with NCQ/TCQ enabled works worse than without. I don't
>> want to say there aren't such drives/controllers, but
>> it just happen that I haven't seen any.)
>
> Yes you have - the random writes with large blocks and 2 or 4 threads
> is significantly better for your non-NCQ drive; and getting more
> significant as you add more threads - I'm curious what happens
> on 8 threads or more.
>
> Dave
> --
> -----Open up your eyes, open up your mind, open up your code -------
> / Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
> \ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
> \ _________________________|_____ http://www.treblig.org |_______/
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-24 12:59 ` Dr. David Alan Gilbert
2007-06-24 14:21 ` Justin Piszcz
@ 2007-06-24 15:48 ` Michael Tokarev
1 sibling, 0 replies; 17+ messages in thread
From: Michael Tokarev @ 2007-06-24 15:48 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Jeff Garzik, Carlo Wood, Tejun Heo, Manoj Kasichainula,
linux-kernel, IDE/ATA development list
Dr. David Alan Gilbert wrote:
> * Michael Tokarev (mjt@tls.msk.ru) wrote:
>
> <snip>
>
>> By the way, I did some testing of various drives, and NCQ/TCQ indeed
>> shows some difference -- with multiple I/O processes (like "server"
>> workload), IF NCQ/TCQ is implemented properly, especially in the
>> drive.
>>
>> For example, this is a good one:
>>
>> Single Seagate 74Gb SCSI drive (10KRPM)
>>
>> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
>> 1024k 1 83.1 36.0 55.8 34.6 28.2/27.6 20.3/19.4
>> 2 45.2 44.1 36.4/ 9.9
>> 4 48.1 47.6 40.7/ 7.1
[]
>> The only thing I don't understand is why with larger I/O block
>> size we see write speed drop with multiple threads.
>
> My guess is that something is chopping them up into smaller writes.
At least it's not happening in the kernel. According to /proc/diskstats,
the requests go to the drive in 1024 KB units.
>> And in contrast to the above, here's another test run, now
>> with Seagate SATA ST3250620AS ("desktop" class) 250GB
>> 7200RPM drive:
>>
>> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
>> 1024k 1 78.4 34.1 33.5 24.6 19.6/19.5 16.0/12.7
>> 2 33.3 24.6 15.4/13.8
>> 4 34.3 25.0 14.7/15.0
>
>> And second, so far I haven't seen a case where a drive
>> with NCQ/TCQ enabled works worse than without. I don't
>> want to say there aren't such drives/controllers, but
>> it just happen that I haven't seen any.)
>
> Yes you have - the random writes with large blocks and 2 or 4 threads
> is significantly better for your non-NCQ drive; and getting more
> significant as you add more threads - I'm curious what happens
> on 8 threads or more.
Both drives shown above are with [NT]CQ enabled. And the first drive
above (74Gb SCSI, where the speed increases with the number of threads)
is the one which has the "better" TCQ implementation. When I turn off TCQ
for that drive, there's almost no speed increase when increasing the
number of threads.
(I can't test this drive now as it's in production. The results were
gathered before I installed the system on it.)
/mjt
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-24 14:21 ` Justin Piszcz
@ 2007-06-24 15:52 ` Michael Tokarev
2007-06-24 16:59 ` Justin Piszcz
0 siblings, 1 reply; 17+ messages in thread
From: Michael Tokarev @ 2007-06-24 15:52 UTC (permalink / raw)
To: Justin Piszcz
Cc: Dr. David Alan Gilbert, Jeff Garzik, Carlo Wood, Tejun Heo,
Manoj Kasichainula, linux-kernel, IDE/ATA development list
Justin Piszcz wrote:
> Don't forget about max_sectors_kb either (for all drives in the SW RAID5
> array)
>
> max_sectors_kb = 8
> $ dd if=/dev/zero of=file.out6 bs=1M count=10240
> 10737418240 bytes (11 GB) copied, 55.4848 seconds, 194 MB/s
>
> max_sectors_kb = 128
> 10737418240 bytes (11 GB) copied, 22.6298 seconds, 474 MB/s
Well. You're comparing something different. Yes, this
thread is about Linux software RAID5 in the first place,
but I was commenting about [NT]CQ within a single drive.
Overall, yes, the larger your reads/writes to the drive
become, the faster its linear performance is. Yet you
have to consider a real workload instead of a very synthetic
dd test. It may be a good approximation of a streaming
video workload (when you feed a large video file over the
network or something like that), but even then,
you probably want to feed several files at once (different
files to different clients), so a single-threaded test
isn't very useful here. IMHO anyway -- though it's fine as a
personal computer test.
/mjt
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-24 15:52 ` Michael Tokarev
@ 2007-06-24 16:59 ` Justin Piszcz
2007-06-24 22:07 ` Carlo Wood
0 siblings, 1 reply; 17+ messages in thread
From: Justin Piszcz @ 2007-06-24 16:59 UTC (permalink / raw)
To: Michael Tokarev
Cc: Dr. David Alan Gilbert, Jeff Garzik, Carlo Wood, Tejun Heo,
Manoj Kasichainula, linux-kernel, IDE/ATA development list
On Sun, 24 Jun 2007, Michael Tokarev wrote:
> Justin Piszcz wrote:
>> Don't forget about max_sectors_kb either (for all drives in the SW RAID5
>> array)
>>
>> max_sectors_kb = 8
>> $ dd if=/dev/zero of=file.out6 bs=1M count=10240
>> 10737418240 bytes (11 GB) copied, 55.4848 seconds, 194 MB/s
>>
>> max_sectors_kb = 128
>> 10737418240 bytes (11 GB) copied, 22.6298 seconds, 474 MB/s
>
> Well. You're comparing something different. Yes, this
> thread is about Linux software RAID5 in the first place,
> but I was commenting about [NT]CQ within a single drive.
>
> Overall, yes, the larger your reads/writes to the drive
> become, the faster its linear performance is. Yet you
> have to consider a real workload instead of a very synthetic
> dd test. It may be a good approximation of a streaming
> video workload (when you feed a large video file over the
> network or something like that), but even then,
> you probably want to feed several files at once (different
> files to different clients), so a single-threaded test
> isn't very useful here. IMHO anyway -- though it's fine as a
> personal computer test.
>
> /mjt
Concerning NCQ/no NCQ, without NCQ I get an additional 15-50MB/s in speed
per various bonnie++ tests.
# Average of 3 runs with NCQ on for Quad Raptor 150 RAID 5 Software RAID:
p34-ncq-on,7952M,43916.3,96.6667,151943,28.6667,75794.3,18.6667,48991.3,99,181687,24,558.033,0.333333,16:100000:16/64,867.667,9,29972.7,98.3333,2801.67,16,890.667,9.33333,27743,94.3333,2115.33,15.6667
# Average of 3 runs with NCQ off for Quad Raptor 150 RAID 5 Software RAID:
p34-ncq-off,7952M,42470,97.3333,200409,36.3333,90240.3,22.6667,48656,99,198853,27,546.467,0,16:100000:16/64,972.333,10,21833,72.3333,3697,21,995,10.6667,27901.7,95.6667,2681,20.6667
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-24 16:59 ` Justin Piszcz
@ 2007-06-24 22:07 ` Carlo Wood
2007-06-24 23:46 ` Mark Lord
2007-06-25 0:23 ` Patrick Mau
0 siblings, 2 replies; 17+ messages in thread
From: Carlo Wood @ 2007-06-24 22:07 UTC (permalink / raw)
To: Justin Piszcz
Cc: Michael Tokarev, Dr. David Alan Gilbert, Jeff Garzik, Tejun Heo,
Manoj Kasichainula, linux-kernel, IDE/ATA development list
On Sun, Jun 24, 2007 at 12:59:10PM -0400, Justin Piszcz wrote:
> Concerning NCQ/no NCQ, without NCQ I get an additional 15-50MB/s in speed
> per various bonnie++ tests.
There is more going on than a bad NCQ implementation of the drive imho.
I did a long test over night (and still only got two schedulers done,
will do the other two tomorrow), and the difference between a queue depth
of 1 and 2 is DRAMATIC.
See http://www.xs4all.nl/~carlo17/noop_queue_depth.png
and http://www.xs4all.nl/~carlo17/anticipatory_queue_depth.png
The bonnie++ tests are done in a directory on the /dev/md7 and
/dev/sdd2 partitions respectively. Each bonnie test is performed
four times.
The hdparm -t tests (which show no difference from a -tT test) are
each done five times, for /dev/sdd, /dev/md7 and /dev/sda (that is
one of the RAID5 drives used for /dev/md7).
Thus in total there are 2 * 4 + 3 * 5 = 23 data points per
queue depth value in each graph.
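Schematically, each depth value was tested with something like the sweep
below (a sketch of the procedure, not the exact script used):

  for depth in 1 2 3 4 8 16 31; do
      for d in sda sdb sdc; do
          echo $depth > /sys/block/$d/device/queue_depth
      done
      # 4x bonnie++ on /dev/md7 and /dev/sdd2, then:
      for i in 1 2 3 4 5; do hdparm -t /dev/sdd /dev/md7 /dev/sda; done
  done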
The following can be observed:
1) There is hardly any difference between the two schedulers (noop
is a little faster for the bonnie test).
2) An NCQ depth of 1 is WAY faster on RAID5 (bonnie; around 125 MB/s),
an NCQ depth of 2 is by far the slowest for the RAID5 (bonnie;
around 40 MB/s). NCQ depths of 3 and higher show no difference,
but are also slow (bonnie; around 75 MB/s).
3) There is no significant influence of the NCQ depth for non-RAID,
either on the /dev/sda disk (hdparm -t) or the /dev/sdd disk (hdparm -t
and bonnie).
4) With an NCQ depth > 1, the hdparm -t measurement of /dev/md7 is
VERY unstable. Sometimes it gives the maximum (around 150 MB/s),
and sometimes as low as 30 MB/s, seemingly independent of the
NCQ depth. Note that those measurements were done on an otherwise
unloaded machine in single user mode, and the measurements were
all done one after another. The strong fluctuation of the hdparm
results for the RAID device (while the underlying devices do not
show this behaviour) is unexplained.
From the above I conclude that something must be wrong with the
software RAID implementation - and not just with the hard disks, imho.
At least, that's what it looks like to me. I am not an expert though ;)
--
Carlo Wood <carlo@alinoe.com>
PS RAID5 (md7 = sda7 + sdb7 + sdc7): three Western Digital
Raptor 10k rpm drives (WDC WD740ADFD-00NLR1).
non-RAID (sdd2): Seagate Barracuda 7200 rpm (ST3320620AS).
The reason that I now measure around 145 MB/s instead of the 165 MB/s
reported in my previous post (with hdparm -t /dev/md7) is that
before I used hdparm -t /dev/md2, which is closer to the outside
of the disk and therefore faster. /dev/md2 is still around 165 MB/s.
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-24 22:07 ` Carlo Wood
@ 2007-06-24 23:46 ` Mark Lord
2007-06-25 0:23 ` Patrick Mau
1 sibling, 0 replies; 17+ messages in thread
From: Mark Lord @ 2007-06-24 23:46 UTC (permalink / raw)
To: Carlo Wood, Justin Piszcz, Michael Tokarev,
Dr. David Alan Gilbert, Jeff Garzik, Tejun Heo,
Manoj Kasichainula, linux-kernel, IDE/ATA development list
Carlo Wood wrote:
>
> The following can be observed:
>
> 1) There is hardly any difference between the two schedulers (noop
> is a little faster for the bonnie test).
> 2) An NCQ depth of 1 is WAY faster on RAID5 (bonnie; around 125 MB/s),
> an NCQ depth of 2 is by far the slowest for the RAID5 (bonnie;
> around 40 MB/s). NCQ depths of 3 and higher show no difference,
> but are also slow (bonnie; around 75 MB/s).
> 3) There is no significant influence of the NCQ depth for non-RAID,
> either on the /dev/sda disk (hdparm -t) or the /dev/sdd disk (hdparm -t
> and bonnie).
> 4) With an NCQ depth > 1, the hdparm -t measurement of /dev/md7 is
> VERY unstable. Sometimes it gives the maximum (around 150 MB/s),
> and sometimes as low as 30 MB/s, seemingly independent of the
> NCQ depth. Note that those measurements were done on an otherwise
> unloaded machine in single user mode, and the measurements were
> all done one after another. The strong fluctuation of the hdparm
> results for the RAID device (while the underlying devices do not
> show this behaviour) is unexplained.
>
> From the above I conclude that something must be wrong with the
> software RAID implementation - and not just with the hard disks, imho.
> At least, that's what it looks like to me. I am not an expert though ;)
I'm late tuning in here, but:
(1) hdparm issues only a single read at a time, so NCQ won't help it.
(2) WD Raptor drives automatically turn off "read-ahead" when using NCQ,
which totally kills any throughput measurements. They do this to speed
up random access seeks; dunno if it pays off or not. Under Windows,
the disk drivers don't use NCQ when performing large I/O operations,
which avoids the performance loss.
(3) Other drives from other brands may have similar issues,
but I have not run into it on them yet.
Cheers
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-24 22:07 ` Carlo Wood
2007-06-24 23:46 ` Mark Lord
@ 2007-06-25 0:23 ` Patrick Mau
1 sibling, 0 replies; 17+ messages in thread
From: Patrick Mau @ 2007-06-25 0:23 UTC (permalink / raw)
To: Carlo Wood, Justin Piszcz, Michael Tokarev,
Dr. David Alan Gilbert, Jeff Garzik, Tejun Heo,
Manoj Kasichainula, linux-kernel, IDE/ATA development list
On Mon, Jun 25, 2007 at 12:07:23AM +0200, Carlo Wood wrote:
> On Sun, Jun 24, 2007 at 12:59:10PM -0400, Justin Piszcz wrote:
> > Concerning NCQ/no NCQ, without NCQ I get an additional 15-50MB/s in speed
> > per various bonnie++ tests.
>
> There is more going on than a bad NCQ implementation of the drive imho.
> I did a long test over night (and still only got two schedulers done,
> will do the other two tomorrow), and the difference between a queue depth
> of 1 and 2 is DRAMATIC.
>
> See http://www.xs4all.nl/~carlo17/noop_queue_depth.png
> and http://www.xs4all.nl/~carlo17/anticipatory_queue_depth.png
Hi Carlo,
Have you considered using "blktrace"?
It enables you to gather data for all the separate request queues,
and will also show you the mapping of bio requests from /dev/mdX
to the individual physical disks.
You can also identify SYNC and BARRIER flags on requests,
which might show you why the md driver will sometimes wait
for completion or even REQUEUE if the queue is full.
Just compile your kernel with CONFIG_BLK_DEV_IO_TRACE
and pull the "blktrace" (and "blkparse") utilities with git.
The git URL is in the Kconfig help text.
You have to mount debugfs (automatically selected by IO trace).
I just want to mention it, because I did not figure that out at first ;)
You should of course use a different location for the output
files to avoid an endless flood of IO.
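A rough sketch of a session (paths are placeholders; tracing one RAID5
member should also show the remap events coming from /dev/md7):

  mount -t debugfs debugfs /sys/kernel/debug   # blktrace needs debugfs
  cd /mnt/scratch                              # somewhere NOT on the disks under test
  blktrace -d /dev/sda &                       # writes sda.blktrace.<cpu> in the cwd
  # ... run the bonnie++ / hdparm workload on /dev/md7 ...
  kill %1                                      # stop tracing
  blkparse -i sda | less                       # decode events, incl. SYNC/BARRIER flags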
Regards,
Patrick
PS: I know, I talked about blktrace twice already ;)
* Re: SATA RAID5 speed drop of 100 MB/s
2007-06-24 11:58 ` Michael Tokarev
2007-06-24 12:59 ` Dr. David Alan Gilbert
@ 2007-07-05 22:12 ` Phillip Susi
1 sibling, 0 replies; 17+ messages in thread
From: Phillip Susi @ 2007-07-05 22:12 UTC (permalink / raw)
To: Michael Tokarev
Cc: Jeff Garzik, Carlo Wood, Tejun Heo, Manoj Kasichainula,
linux-kernel, IDE/ATA development list
Michael Tokarev wrote:
> Single Seagate 74Gb SCSI drive (10KRPM)
>
> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
> 4k 1 66.4 0.5 0.6 0.5 0.6/ 0.6 0.4/ 0.2
> 2 0.6 0.6 0.5/ 0.1
> 4 0.7 0.6 0.6/ 0.2
> 16k 1 84.8 2.0 2.5 1.9 2.5/ 2.5 1.6/ 0.6
> 2 2.3 2.1 2.0/ 0.6
> 4 2.7 2.5 2.3/ 0.6
> 64k 1 84.8 7.4 9.3 7.2 9.4/ 9.3 5.8/ 2.2
> 2 8.6 7.9 7.3/ 2.1
> 4 9.9 9.1 8.1/ 2.2
> 128k 1 84.8 13.6 16.7 12.9 16.9/16.6 10.6/ 3.9
> 2 15.6 14.4 13.5/ 3.2
> 4 17.9 16.4 15.7/ 2.7
> 512k 1 84.9 34.0 41.9 33.3 29.0/27.1 22.4/13.2
> 2 36.9 34.5 30.7/ 8.1
> 4 40.5 38.1 33.2/ 8.3
> 1024k 1 83.1 36.0 55.8 34.6 28.2/27.6 20.3/19.4
> 2 45.2 44.1 36.4/ 9.9
> 4 48.1 47.6 40.7/ 7.1
>
<snip>
> The only thing I don't understand is why with larger I/O block
> size we see write speed drop with multiple threads.
Huh? Your data table does not show larger block size dropping write
speed. 47.6 > 38.1 > 16.4.
> And in contrast to the above, here's another test run, now
> with Seagate SATA ST3250620AS ("desktop" class) 250GB
> 7200RPM drive:
>
> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
> 4k 1 47.5 0.3 0.5 0.3 0.3/ 0.3 0.1/ 0.1
> 2 0.3 0.3 0.2/ 0.1
> 4 0.3 0.3 0.2/ 0.2
> 16k 1 78.4 1.1 1.8 1.1 0.9/ 0.9 0.6/ 0.6
> 2 1.2 1.1 0.6/ 0.6
> 4 1.3 1.2 0.6/ 0.6
> 64k 1 78.4 4.3 6.7 4.0 3.5/ 3.5 2.1/ 2.2
> 2 4.5 4.1 2.2/ 2.3
> 4 4.7 4.2 2.3/ 2.4
> 128k 1 78.4 8.0 12.6 7.2 6.2/ 6.2 3.9/ 3.8
> 2 8.2 7.3 4.1/ 4.0
> 4 8.7 7.7 4.3/ 4.3
> 512k 1 78.5 23.1 34.0 20.3 17.1/17.1 11.3/10.7
> 2 23.5 20.6 11.3/11.4
> 4 24.7 21.3 11.6/11.8
> 1024k 1 78.4 34.1 33.5 24.6 19.6/19.5 16.0/12.7
> 2 33.3 24.6 15.4/13.8
> 4 34.3 25.0 14.7/15.0
>
> Here, the (total) I/O speed does not depend on the number
> of threads. From which I conclude that the drive does
> not reorder/optimize commands internally, even if NCQ is
> enabled (queue depth is 32).
While the difference does not appear to be as pronounced as with the WD
drive, the data does show more threads give more total IO. 4.7 > 4.5 >
4.3 in the 64k rndRd test, and the other tests show an increase with
more threads as well.