* Some NCQ numbers...
From: Michael Tokarev @ 2007-06-28 10:51 UTC (permalink / raw)
To: Kernel Mailing List; +Cc: linux-ide, linux-scsi
[Offtopic notice: I originally posted some speed testing results
to the linux-ide mailing list, as a demonstration of how [NT]CQ
can help.  Later, someone became curious and forwarded that email
to lkml, asking for more details.  Since then I have become more
curious as well, and decided to look at it more closely.
Here it goes...]
The test drive is a Seagate Barracuda ST3250620AS "desktop" drive:
250Gb, 16Mb cache, 7200RPM.
The Seagate Barracuda ES drive, ST3250620NS, shows the same results.
I expect pretty similar results for the larger Barracudas from
Seagate: the only difference between the 250Gb models and the larger
ones is the number of disk platters and heads.
The test machine was using the mptsas driver for the following card:
SCSI storage controller: LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 02)
Pretty similar results were obtained with an AHCI controller:
SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
on other machines.
The following tables show the data read/write speed in megabytes/sec,
with different parameters.
All I/O was performed directly against the whole drive, i.e.
open("/dev/sda", O_RDWR|O_DIRECT).
Five kinds of tests were performed: linear read (linRd),
random read (rndRd), linear write (linWr), random write (rndWr),
and a combination of random reads and writes (rndR/W).
Each test was tried with 1 (2 in the r/w case), 4 and 32 threads
doing I/O in parallel (Trd column).  Linear reads and writes were
performed only with a single thread.
Each test ran in two modes -- with command queuing enabled (qena) and
disabled (qdis) -- by setting /sys/block/sda/device/queue_depth to
31 (the default) and 1 respectively.
And finally, each set of tests was performed for different block sizes:
4, 8, 16, 32, 128 and 1024 kb (1 kb = 1024 bytes).
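For reference, the knobs used to switch between modes look like this
(a sketch; the device name is illustrative):

    echo 31 > /sys/block/sda/device/queue_depth   # queuing enabled
    echo  1 > /sys/block/sda/device/queue_depth   # queuing disabled
    hdparm -W1 /dev/sda                           # write cache on
    hdparm -W0 /dev/sda                           # write cache off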
First, tests with write cache enabled (factory default settings for the
drives in question):
BlkSz Trd    linRd        rndRd        linWr        rndWr          rndR/W
            qena qdis    qena qdis    qena qdis    qena qdis     qena      qdis
   4k   1   12.8 12.8     0.3  0.3    35.4 37.0     0.5  0.5   0.2/ 0.2  0.2/ 0.2
        4                 0.3  0.3                  0.5  0.5   0.2/ 0.2  0.2/ 0.1
       32                 0.3  0.4                  0.5  0.5   0.2/ 0.2  0.2/ 0.1
   8k   1   23.4 23.4     0.6  0.6    51.5 51.2     1.0  1.0   0.4/ 0.4  0.4/ 0.4
        4                 0.6  0.6                  1.0  1.0   0.4/ 0.4  0.4/ 0.2
       32                 0.6  0.8                  1.0  1.0   0.4/ 0.4  0.4/ 0.2
  16k   1   41.1 40.3     1.2  1.2    67.4 67.8     2.0  2.0   0.7/ 0.7  0.7/ 0.7
        4                 1.2  1.1                  2.0  2.0   0.7/ 0.7  0.8/ 0.4
       32                 1.2  1.6                  2.0  2.0   0.7/ 0.7  0.9/ 0.4
  32k   1   58.6 57.6     2.2  2.2    79.1 70.9     3.8  3.7   1.4/ 1.4  1.4/ 1.4
        4                 2.3  2.2                  3.8  3.7   1.4/ 1.4  1.6/ 0.8
       32                 2.3  3.0                  3.7  3.8   1.4/ 1.4  1.7/ 0.9
 128k   1   80.4 80.4     8.1  8.0    78.7 77.3    11.6 11.6   4.5/ 4.5  4.5/ 4.5
        4                 8.1  8.0                 11.4 11.3   4.6/ 4.6  5.5/ 2.8
       32                 8.1  9.2                 11.3 11.4   4.7/ 4.6  5.7/ 3.0
1024k   1   80.4 80.4    33.9 34.0    78.2 78.3    33.6 33.6  15.9/15.9 15.9/15.9
        4                34.5 34.8                 33.5 33.3  16.4/16.3 17.2/11.8
       32                34.5 34.5                 33.5 35.8  20.6/11.3 21.4/11.6
And second, the same drive with write caching disabled (WCE=0, hdparm -W0):
BlkSz Trd    linRd        rndRd        linWr        rndWr          rndR/W
            qena qdis    qena qdis    qena qdis    qena qdis     qena      qdis
   4k   1   12.8 12.8     0.3  0.3     0.4  0.5     0.3  0.3   0.2/ 0.2  0.2/ 0.2
        4                 0.3  0.3                  0.3  0.3   0.2/ 0.2  0.2/ 0.1
       32                 0.3  0.4                  0.3  0.4   0.2/ 0.1  0.2/ 0.1
   8k   1   24.6 24.6     0.6  0.6     0.9  0.9     0.6  0.6   0.3/ 0.3  0.3/ 0.3
        4                 0.6  0.6                  0.6  0.5   0.3/ 0.3  0.4/ 0.2
       32                 0.6  0.8                  0.6  0.8   0.3/ 0.3  0.4/ 0.2
  16k   1   41.8 41.7     1.2  1.1     1.8  1.8     1.1  1.1   0.6/ 0.6  0.6/ 0.6
        4                 1.2  1.1                  1.1  1.0   0.6/ 0.6  0.8/ 0.4
       32                 1.2  1.5                  1.1  1.6   0.6/ 0.6  0.8/ 0.4
  32k   1   60.1 59.2     2.3  2.3     3.6  3.5     2.1  2.1   1.1/ 1.1  1.1/ 1.1
        4                 2.3  2.2                  2.1  2.0   1.1/ 1.1  1.5/ 0.7
       32                 2.3  2.9                  2.1  3.0   1.1/ 1.1  1.5/ 0.8
 128k   1   79.4 79.4     8.1  8.1    12.4 12.6     7.2  7.1   3.8/ 3.8  3.8/ 3.8
        4                 8.1  7.9                  7.2  7.1   3.8/ 3.8  5.2/ 2.6
       32                 8.1  9.0                  7.2  7.8   3.9/ 3.8  5.2/ 2.7
1024k   1   79.0 79.4    33.8 33.8    34.2 34.1    24.6 24.6  14.3/14.2 14.3/14.2
        4                33.9 34.3                 24.7 24.8  14.4/14.2 15.9/11.1
       32                34.0 34.3                 24.7 27.6  19.3/10.4 19.3/10.4
Two immediate conclusions.
First, NCQ on this combination of drive/controller/driver DOES NOT WORK.
Without write cache on the drive, the results are pretty much the same
whether NCQ is enabled or not (modulo some random fluctuations).  With
write cache enabled, NCQ makes the drive start "preferring" reads over
writes in the combined R/W test, but the overall throughput stays the same.
And second, the write cache doesn't seem to help, at least not for a busy
drive.  Yes, it helps *a lot* for sequential writes, but definitely not
for random writes, which stay the same whether the write cache is turned
on or not.  Again: this is a "busy" drive, which has no idle time to
complete queued writes since new requests keep coming and coming -- for
a not-so-busy drive, write caching should help to make the whole system
more "responsive".
I don't have other models of SATA drives around.
I'm planning to test several models of SCSI drives.  On the SCSI front
(or maybe with different drives - I don't know) things are WAY more
interesting wrt TCQ.  The difference in results between 1 and 32 threads
goes up to 4 times sometimes!  But I'm a bit stuck with the SCSI tests,
since I don't know how to turn off TCQ without rebooting the machine.
Note that the results are NOT representative of a "typical" workload,
where you work with a filesystem (file server, compiling a kernel, etc).
The test is more like a (synthetic) database workload, with direct I/O
and relatively small block sizes.
As a test, I used a tiny program I wrote especially for this purpose,
available at http://www.corpit.ru/mjt/iot.c .  Please note that I
had never used threads before, and since this program runs several
threads, it needs some synchronisation between them, and I'm not
sure I got it right, but at least it seems to work... ;)  So if
anyone can offer corrections in this area, you're welcome!
The program (look into it for instructions) performs a single test.
In order to build a table like the above, I had to run it multiple
times and collect the results into a nice-looking table.  I won't
show you the script I used for that, because it's way too specific
to my system (particular device names etc), and it's just too ugly
(a quick hack).  Instead, here is a simpler, but more generally
useful one, to test a single drive in a single mode, as I posted
previously: http://www.corpit.ru/mjt/iot.sh .
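For the curious, a wrapper along these lines can build such a matrix.
This is a hypothetical sketch, not my actual script, and it assumes
iot takes the device, block size and thread count as arguments - see
iot.c for the real invocation:

    #!/bin/sh
    # sketch: iterate over the test matrix; iot's real argument
    # syntax may differ -- check the iot.c source
    for qd in 31 1; do
        echo $qd > /sys/block/sda/device/queue_depth
        for bs in 4k 8k 16k 32k 128k 1024k; do
            for trd in 1 4 32; do
                ./iot /dev/sda $bs $trd     # assumed invocation
            done
        done
    done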
/mjt
* Re: Some NCQ numbers...
From: Michael Tokarev @ 2007-06-28 11:01 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi

Michael Tokarev wrote:
[]
> I'm planning to test several models of SCSI drives.  On the SCSI front
> (or maybe with different drives - I don't know) things are WAY more
> interesting wrt TCQ.  The difference in results between 1 and 32
> threads goes up to 4 times sometimes!  But I'm a bit stuck with the
> SCSI tests, since I don't know how to turn off TCQ without rebooting
> the machine.

A quick followup, to demonstrate the "interesting" part.

Seagate SCSI ST3146854LC drive, 140Gb, 15KRPM, write cache disabled,
queue depth = 32:

BlkSz Trd  linRd  rndRd  linWr  rndWr     rndR/W
   4k   1   37.9    0.6    0.9    0.6   0.4/ 0.3
        4           0.9           0.7   0.6/ 0.4
       32           1.5           1.1   0.9/ 0.4
   8k   1   75.2    1.2    1.9    1.1   0.8/ 0.6
        4           1.7           1.5   1.1/ 0.7
       32           2.9           2.2   1.7/ 0.9
  16k   1   82.3    2.4    3.6    2.3   1.5/ 1.2
        4           3.3           2.9   2.2/ 1.4
       32           5.5           4.3   3.3/ 1.7
  32k   1   86.3    4.7    6.9    4.4   2.9/ 2.3
        4           6.4           5.6   4.2/ 2.7
       32          10.2           8.0   6.2/ 3.1
 128k   1   86.5   15.8   26.6   14.9   9.5/ 7.7
        4          20.6          18.2  13.5/ 8.5
       32          29.2          24.8  18.3/ 9.1
1024k   1   88.6   46.7   63.1   48.2  25.3/25.3
        4          51.7          51.3  33.5/21.8
       32          55.9          57.3  37.6/19.0

Fujitsu SCSI MAX3147NC drive, same parameters:

BlkSz Trd  linRd  rndRd  linWr  rndWr     rndR/W
   4k   1   37.4    0.7    1.0    0.6   0.4/ 0.3
        4           0.9           0.8   0.6/ 0.4
       32           1.5           1.2   0.9/ 0.4
   8k   1   32.9    1.3    1.9    1.2   0.7/ 0.7
        4           1.8           1.5   1.2/ 0.7
       32           2.8           2.3   1.7/ 0.9
  16k   1   89.6    2.6    3.7    2.4   1.4/ 1.3
        4           3.5           3.0   2.4/ 1.4
       32           5.4           4.4   3.3/ 1.7
  32k   1   87.9    4.8    7.0    4.4   2.6/ 2.6
        4           6.8           5.6   4.6/ 2.7
       32           9.9           8.3   6.2/ 3.1
 128k   1   90.7   16.2   22.5   15.3   8.6/ 8.6
        4          21.8          18.6  15.0/ 8.1
       32          28.6          25.0  18.2/ 9.1
1024k   1   90.6   48.9   60.0   47.4  25.3/25.9
        4          55.6          51.7  34.4/22.5
       32          59.8          56.2  38.6/19.7

/mjt
* Re: Some NCQ numbers...
From: Tejun Heo @ 2007-07-03 8:19 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi

Michael Tokarev wrote:
> [Offtopic notice: I originally posted some speed testing results
> to the linux-ide mailing list, as a demonstration of how [NT]CQ
> can help.  Later, someone became curious and forwarded that email
> to lkml, asking for more details.  Since then I have become more
> curious as well, and decided to look at it more closely.
> Here it goes...]
>
> The test drive is a Seagate Barracuda ST3250620AS "desktop" drive:
> 250Gb, 16Mb cache, 7200RPM.

And which elevator?

--
tejun
* Re: Some NCQ numbers...
From: Michael Tokarev @ 2007-07-03 20:29 UTC (permalink / raw)
To: Tejun Heo; +Cc: Kernel Mailing List, linux-ide, linux-scsi

Tejun Heo wrote:
> Michael Tokarev wrote:
[]
>> The test drive is a Seagate Barracuda ST3250620AS "desktop" drive:
>> 250Gb, 16Mb cache, 7200RPM.

[test shows that NCQ makes no difference whatsoever]

> And which elevator?

Well.  It looks like the results do not depend on the elevator.
Originally I tried with deadline, and just re-ran the test with noop
(hence the long delay with the answer) - changing the Linux elevator
changes almost nothing in the results - modulo some random
"fluctuations".

In any case, NCQ - at least on this drive - just does not work.  Linux
with its I/O elevator may help to speed things up a bit, but the disk
does nothing in this area.  NCQ doesn't slow things down either - it
just does not work.

The same is true for the ST3250620NS "enterprise" drives.

By the way, Seagate announced the Barracuda ES 2 series (in the
500..1200Gb range, if memory serves) - maybe with those, NCQ will work
better?

Or maybe it's libata which does not implement NCQ "properly"?  (As I
showed before, with almost all ol'good SCSI drives TCQ helps a lot -
up to 2x the difference and more - with multiple I/O threads.)

/mjt
* Re: Some NCQ numbers...
From: Tejun Heo @ 2007-07-04 1:19 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi

Hello,

Michael Tokarev wrote:
> Well.  It looks like the results do not depend on the elevator.
> Originally I tried with deadline, and just re-ran the test with noop
> (hence the long delay with the answer) - changing the Linux elevator
> changes almost nothing in the results - modulo some random
> "fluctuations".

I see.  Thanks for testing.

> In any case, NCQ - at least on this drive - just does not work.  Linux
> with its I/O elevator may help to speed things up a bit, but the disk
> does nothing in this area.  NCQ doesn't slow things down either - it
> just does not work.
>
> The same is true for the ST3250620NS "enterprise" drives.
>
> By the way, Seagate announced the Barracuda ES 2 series (in the
> 500..1200Gb range, if memory serves) - maybe with those, NCQ will
> work better?

No one would know without testing.

> Or maybe it's libata which does not implement NCQ "properly"?  (As I
> showed before, with almost all ol'good SCSI drives TCQ helps a lot -
> up to 2x the difference and more - with multiple I/O threads.)

Well, what the driver does is minimal.  It just passes through all the
commands to the harddrive.  After all, NCQ/TCQ gives the harddrive more
responsibility regarding request scheduling.

--
tejun
* Re: Some NCQ numbers...
From: Michael Tokarev @ 2007-07-04 9:43 UTC (permalink / raw)
To: Tejun Heo; +Cc: Kernel Mailing List, linux-ide, linux-scsi

Tejun Heo wrote:
> Hello,
>
> Michael Tokarev wrote:
>> Well.  It looks like the results do not depend on the elevator.
>> Originally I tried with deadline, and just re-ran the test with noop
>> (hence the long delay with the answer) - changing the Linux elevator
>> changes almost nothing in the results - modulo some random
>> "fluctuations".
>
> I see.  Thanks for testing.

Here are the actual results - the tests were still running when I
replied yesterday.

Again, this is a Seagate ST3250620AS "desktop" drive, 7200RPM, 16Mb
cache, 250Gb capacity.  The tests were performed with queue depth = 64
(on mptsas), drive write cache turned off.

noop scheduler:

BlkSz Trd  linRd  rndRd  linWr  rndWr     rndR/W
   4k   1   12.8    0.3    0.4    0.3   0.1/ 0.1
        4           0.3           0.3   0.1/ 0.1
       32           0.3           0.3   0.1/ 0.1
   8k   1   24.6    0.6    0.9    0.6   0.3/ 0.3
        4           0.6           0.6   0.3/ 0.3
       32           0.6           0.6   0.3/ 0.3
  16k   1   41.3    1.2    1.8    1.1   0.6/ 0.6
        4           1.2           1.1   0.6/ 0.6
       32           1.2           1.1   0.6/ 0.6
  32k   1   58.4    2.2    3.5    2.1   1.1/ 1.1
        4           2.3           2.1   1.1/ 1.1
       32           2.3           2.1   1.1/ 1.1
 128k   1   80.4    8.1   12.5    7.2   3.8/ 3.8
        4           8.1           7.2   3.8/ 3.8
       32           8.1           7.2   3.8/ 3.8
1024k   1   80.5   33.9   33.8   24.5  14.3/14.3
        4          34.1          24.6  14.3/14.2
       32          34.2          24.6  14.4/14.2

deadline scheduler:

BlkSz Trd  linRd  rndRd  linWr  rndWr     rndR/W
   4k   1   12.8    0.3    0.4    0.3   0.1/ 0.1
        4           0.3           0.3   0.1/ 0.1
       32           0.3           0.3   0.1/ 0.1
   8k   1   24.5    0.6    0.9    0.6   0.3/ 0.3
        4           0.6           0.6   0.3/ 0.3
       32           0.6           0.6   0.3/ 0.3
  16k   1   41.3    1.2    1.8    1.1   0.6/ 0.6
        4           1.2           1.1   0.6/ 0.6
       32           1.2           1.1   0.6/ 0.6
  32k   1   57.7    2.3    3.4    2.1   1.1/ 1.1
        4           2.3           2.1   1.1/ 1.1
       32           2.3           2.1   1.1/ 1.1
 128k   1   79.4    8.1   12.5    7.2   3.8/ 3.8
        4           8.1           7.3   3.8/ 3.8
       32           8.2           7.3   3.9/ 3.8
1024k   1   79.4   33.7   33.8   24.5  14.2/14.2
        4          33.9          24.6  14.3/14.2
       32          33.4          24.4  17.0/10.5

[]
>> By the way, Seagate announced the Barracuda ES 2 series (in the
>> 500..1200Gb range, if memory serves) - maybe with those, NCQ will
>> work better?
>
> No one would know without testing.

Sure thing.  I guess I'll set up a web page with all the results so
far, in the hope that someday it will be more complete (we don't have
many different drives to test, but others do).

By the way, both SATA drives we have are single-platter ones (the
500Gb models have 2 platters, and the 750Gb ones have 3), while all
the SCSI drives I tested have more than one platter.  Maybe this is
yet another reason for NCQ failing.

And another note.  I heard somewhere that Seagate, for one, prohibits
publishing tests like this - but I haven't signed any NDAs or suchlike
when I purchased their drives in the nearest computer store... ;)

>> Or maybe it's libata which does not implement NCQ "properly"?  (As I
>> showed before, with almost all ol'good SCSI drives TCQ helps a lot -
>> up to 2x the difference and more - with multiple I/O threads.)
>
> Well, what the driver does is minimal.  It just passes through all
> the commands to the harddrive.  After all, NCQ/TCQ gives the
> harddrive more responsibility regarding request scheduling.

Oh well, I see.... :(

/mjt
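[Aside: for anyone reproducing the elevator comparison above, the I/O
scheduler can be switched per device at run time via sysfs, with no
reboot needed - the device name here is illustrative:

    cat /sys/block/sda/queue/scheduler          # active one in brackets
    echo noop > /sys/block/sda/queue/scheduler
]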
* Re: Some NCQ numbers...
From: Justin Piszcz @ 2007-07-04 10:22 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Tejun Heo, Kernel Mailing List, linux-ide, linux-scsi

On Wed, 4 Jul 2007, Michael Tokarev wrote:

> Tejun Heo wrote:
>> Hello,
>>
>> Michael Tokarev wrote:
>>> Well.  It looks like the results do not depend on the elevator.
>>> Originally I tried with deadline, and just re-ran the test with noop
>>> (hence the long delay with the answer) - changing the Linux elevator
>>> changes almost nothing in the results - modulo some random
>>> "fluctuations".
>>
>> I see.  Thanks for testing.
>
> Here are the actual results - the tests were still running when I
> replied yesterday.
>
> Again, this is a Seagate ST3250620AS "desktop" drive, 7200RPM, 16Mb
> cache, 250Gb capacity.  The tests were performed with queue depth = 64
> (on mptsas), drive write cache turned off.

I found the AS scheduler to be the premium choice, and the best for
single-user performance.

You want speed?  Use AS.

http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html
* Re: Some NCQ numbers...
From: Justin Piszcz @ 2007-07-04 10:33 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Tejun Heo, Kernel Mailing List, linux-ide, linux-scsi

On Wed, 4 Jul 2007, Justin Piszcz wrote:

> On Wed, 4 Jul 2007, Michael Tokarev wrote:
>
> > Tejun Heo wrote:
> >> Hello,
> >>
> >> Michael Tokarev wrote:
> >>> Well.  It looks like the results do not depend on the elevator.
> >>> Originally I tried with deadline, and just re-ran the test with
> >>> noop (hence the long delay with the answer) - changing the Linux
> >>> elevator changes almost nothing in the results - modulo some
> >>> random "fluctuations".
> >>
> >> I see.  Thanks for testing.
> >
> > Here are the actual results - the tests were still running when I
> > replied yesterday.
> >
> > Again, this is a Seagate ST3250620AS "desktop" drive, 7200RPM, 16Mb
> > cache, 250Gb capacity.  The tests were performed with queue depth
> > = 64 (on mptsas), drive write cache turned off.
>
> I found the AS scheduler to be the premium choice, and the best for
> single-user performance.
>
> You want speed?  Use AS.
>
> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html

That page does not include noop -- I tested the main three, though
(renamed) :)

http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline.html

And for the archives:

p34-cfq,15696M,77114.3,99,311683,55.3333,184947,38.6667,79842.7,99,524065,41.3333,634.033,0.333333,16:100000:16/64,1043.33,8.33333,4419.33,11.6667,2942,17.3333,1178,10.3333,4192.67,12.3333,2619.33,19
p34-as,15696M,76202.3,99,443103,85,189716,34.6667,79552,99,507271,39.6667,607.067,0,16:100000:16/64,1153,10,13434,36,2769.67,16.3333,1201.67,10.6667,3951.33,12,2665.67,19
p34-deadline,15696M,76933.3,98.6667,386852,72,183016,29.6667,79530.7,99,512082,39.6667,678.567,0,16:100000:16/64,1230.33,10.3333,12349,32.3333,2945,17.3333,1258,11,8183,22.3333,2867,20.3333

Justin.
* Re: Some NCQ numbers...
From: Bill Davidsen @ 2007-07-05 19:00 UTC (permalink / raw)
To: Justin Piszcz
Cc: Michael Tokarev, Tejun Heo, Kernel Mailing List, linux-ide, linux-scsi

Justin Piszcz wrote:
>
> On Wed, 4 Jul 2007, Justin Piszcz wrote:
>
>> On Wed, 4 Jul 2007, Michael Tokarev wrote:
>>
>> > Tejun Heo wrote:
>> >> Hello,
>> >>
>> >> Michael Tokarev wrote:
>> >>> Well.  It looks like the results do not depend on the elevator.
>> >>> Originally I tried with deadline, and just re-ran the test with
>> >>> noop (hence the long delay with the answer) - changing the Linux
>> >>> elevator changes almost nothing in the results - modulo some
>> >>> random "fluctuations".
>> >>
>> >> I see.  Thanks for testing.
>> >
>> > Here are the actual results - the tests were still running when I
>> > replied yesterday.
>> >
>> > Again, this is a Seagate ST3250620AS "desktop" drive, 7200RPM, 16Mb
>> > cache, 250Gb capacity.  The tests were performed with queue depth
>> > = 64 (on mptsas), drive write cache turned off.
>>
>> I found the AS scheduler to be the premium choice, and the best for
>> single-user performance.
>>
>> You want speed?  Use AS.
>>
>> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html
>
> That page does not include noop -- I tested the main three, though
> (renamed) :)
>
> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline.html
>
> And for the archives:
>
> p34-cfq,15696M,77114.3,99,311683,55.3333,184947,38.6667,79842.7,99,524065,41.3333,634.033,0.333333,16:100000:16/64,1043.33,8.33333,4419.33,11.6667,2942,17.3333,1178,10.3333,4192.67,12.3333,2619.33,19
> p34-as,15696M,76202.3,99,443103,85,189716,34.6667,79552,99,507271,39.6667,607.067,0,16:100000:16/64,1153,10,13434,36,2769.67,16.3333,1201.67,10.6667,3951.33,12,2665.67,19
> p34-deadline,15696M,76933.3,98.6667,386852,72,183016,29.6667,79530.7,99,512082,39.6667,678.567,0,16:100000:16/64,1230.33,10.3333,12349,32.3333,2945,17.3333,1258,11,8183,22.3333,2867,20.3333

I looked at these before - did you really run with a chunk size of
just under 16GB, or does "15696M" have some inobvious meaning?

--
Bill Davidsen <davidsen@tmr.com>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
* Re: Some NCQ numbers...
From: Justin Piszcz @ 2007-07-09 11:07 UTC (permalink / raw)
To: Bill Davidsen
Cc: Michael Tokarev, Tejun Heo, Kernel Mailing List, linux-ide, linux-scsi

On Thu, 5 Jul 2007, Bill Davidsen wrote:

> Justin Piszcz wrote:
>>
>> On Wed, 4 Jul 2007, Justin Piszcz wrote:
>>
>>> On Wed, 4 Jul 2007, Michael Tokarev wrote:
>>>
>>> > Tejun Heo wrote:
>>> >> Hello,
>>> >>
>>> >> Michael Tokarev wrote:
>>> >>> Well.  It looks like the results do not depend on the elevator.
>>> >>> Originally I tried with deadline, and just re-ran the test with
>>> >>> noop (hence the long delay with the answer) - changing the Linux
>>> >>> elevator changes almost nothing in the results - modulo some
>>> >>> random "fluctuations".
>>> >>
>>> >> I see.  Thanks for testing.
>>> >
>>> > Here are the actual results - the tests were still running when I
>>> > replied yesterday.
>>> >
>>> > Again, this is a Seagate ST3250620AS "desktop" drive, 7200RPM,
>>> > 16Mb cache, 250Gb capacity.  The tests were performed with queue
>>> > depth = 64 (on mptsas), drive write cache turned off.
>>>
>>> I found the AS scheduler to be the premium choice, and the best for
>>> single-user performance.
>>>
>>> You want speed?  Use AS.
>>>
>>> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html
>>
>> That page does not include noop -- I tested the main three, though
>> (renamed) :)
>>
>> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline.html
>>
>> And for the archives:
>>
>> p34-cfq,15696M,77114.3,99,311683,55.3333,184947,38.6667,79842.7,99,524065,41.3333,634.033,0.333333,16:100000:16/64,1043.33,8.33333,4419.33,11.6667,2942,17.3333,1178,10.3333,4192.67,12.3333,2619.33,19
>> p34-as,15696M,76202.3,99,443103,85,189716,34.6667,79552,99,507271,39.6667,607.067,0,16:100000:16/64,1153,10,13434,36,2769.67,16.3333,1201.67,10.6667,3951.33,12,2665.67,19
>> p34-deadline,15696M,76933.3,98.6667,386852,72,183016,29.6667,79530.7,99,512082,39.6667,678.567,0,16:100000:16/64,1230.33,10.3333,12349,32.3333,2945,17.3333,1258,11,8183,22.3333,2867,20.3333
>
> I looked at these before - did you really run with a chunk size of
> just under 16GB, or does "15696M" have some inobvious meaning?

It says to use double your RAM; your RAM is 7848, so that is why I use
15696M :)

I did some tests recently; it appears JFS is 20-60MB/s faster for
sequential reads/writes/re-writes, but it does not have a defrag tool
(there is defragfs, but it's not included in Debian and people on
Google say not to use it), so I am not sure I want to go there.

Justin.
* Re: Some NCQ numbers...
From: Jens Axboe @ 2007-07-09 12:26 UTC (permalink / raw)
To: Justin Piszcz
Cc: Michael Tokarev, Tejun Heo, Kernel Mailing List, linux-ide, linux-scsi

On Wed, Jul 04 2007, Justin Piszcz wrote:
> On Wed, 4 Jul 2007, Michael Tokarev wrote:
>
> > Tejun Heo wrote:
> >> Hello,
> >>
> >> Michael Tokarev wrote:
> >>> Well.  It looks like the results do not depend on the elevator.
> >>> Originally I tried with deadline, and just re-ran the test with
> >>> noop (hence the long delay with the answer) - changing the Linux
> >>> elevator changes almost nothing in the results - modulo some
> >>> random "fluctuations".
> >>
> >> I see.  Thanks for testing.
> >
> > Here are the actual results - the tests were still running when I
> > replied yesterday.
> >
> > Again, this is a Seagate ST3250620AS "desktop" drive, 7200RPM, 16Mb
> > cache, 250Gb capacity.  The tests were performed with queue depth
> > = 64 (on mptsas), drive write cache turned off.
>
> I found the AS scheduler to be the premium choice, and the best for
> single-user performance.
>
> You want speed?  Use AS.
>
> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html

Hmm, I find your data very weak for such a conclusion.  The value of
the test itself notwithstanding, AS seems to be a lot faster for
sequential output for some reason, yet slower for everything else.
Which is odd - deadline should always run at the same speed as AS for
writeout.  The only real difference should be in sequential and random
reads.  So allow me to call your results questionable.

It also looks like bonnie (some version) output; I never found bonnie
to provide good and repeatable numbers.  tiotest is much better, or
(of course) fio.

--
Jens Axboe
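As an aside, a minimal fio job roughly matching the random-read test
above might look like this - a sketch, assuming a fio recent enough to
have the libaio engine; values are illustrative:

    [global]
    ioengine=libaio
    direct=1
    filename=/dev/sda
    runtime=60

    [randread]
    rw=randread
    bs=4k
    iodepth=32
    numjobs=4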
* Re: Some NCQ numbers...
From: Bill Davidsen @ 2007-07-05 19:22 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Tejun Heo, Kernel Mailing List, linux-ide, linux-scsi

Michael Tokarev wrote:
> Tejun Heo wrote:
>> Hello,
>>
>> Michael Tokarev wrote:
>>> Well.  It looks like the results do not depend on the elevator.
>>> Originally I tried with deadline, and just re-ran the test with noop
>>> (hence the long delay with the answer) - changing the Linux elevator
>>> changes almost nothing in the results - modulo some random
>>> "fluctuations".
>> I see.  Thanks for testing.
>
> Here are the actual results - the tests were still running when I
> replied yesterday.
>
> Again, this is a Seagate ST3250620AS "desktop" drive, 7200RPM, 16Mb
> cache, 250Gb capacity.  The tests were performed with queue depth = 64
> (on mptsas), drive write cache turned off.

But... with write cache off you don't let the drive do some things
which might show a lot of improvement with one scheduler or another.
So your data are only part of the story, aren't they?

[snip]

>>> By the way, Seagate announced the Barracuda ES 2 series (in the
>>> 500..1200Gb range, if memory serves) - maybe with those, NCQ will
>>> work better?
>> No one would know without testing.
>
> Sure thing.  I guess I'll set up a web page with all the results so
> far, in the hope that someday it will be more complete (we don't have
> many different drives to test, but others do).
>
> By the way, both SATA drives we have are single-platter ones (the
> 500Gb models have 2 platters, and the 750Gb ones have 3), while all
> the SCSI drives I tested have more than one platter.  Maybe this is
> yet another reason for NCQ failing.
>
> And another note.  I heard somewhere that Seagate, for one, prohibits
> publishing tests like this - but I haven't signed any NDAs or suchlike
> when I purchased their drives in the nearest computer store... ;)
>
>>> Or maybe it's libata which does not implement NCQ "properly"?  (As I
>>> showed before, with almost all ol'good SCSI drives TCQ helps a lot -
>>> up to 2x the difference and more - with multiple I/O threads.)
>> Well, what the driver does is minimal.  It just passes through all
>> the commands to the harddrive.  After all, NCQ/TCQ gives the
>> harddrive more responsibility regarding request scheduling.
>
> Oh well, I see.... :(

--
Bill Davidsen <davidsen@tmr.com>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
* Re: Some NCQ numbers...
From: James Bottomley @ 2007-07-04 14:40 UTC (permalink / raw)
To: Tejun Heo; +Cc: Michael Tokarev, Kernel Mailing List, linux-ide, linux-scsi

On Wed, 2007-07-04 at 10:19 +0900, Tejun Heo wrote:
> Michael Tokarev wrote:
> > Well.  It looks like the results do not depend on the elevator.
> > Originally I tried with deadline, and just re-ran the test with
> > noop (hence the long delay with the answer) - changing the Linux
> > elevator changes almost nothing in the results - modulo some
> > random "fluctuations".
>
> I see.  Thanks for testing.
>
> > In any case, NCQ - at least on this drive - just does not work.
> > Linux with its I/O elevator may help to speed things up a bit, but
> > the disk does nothing in this area.  NCQ doesn't slow things down
> > either - it just does not work.
> >
> > The same is true for the ST3250620NS "enterprise" drives.
> >
> > By the way, Seagate announced the Barracuda ES 2 series (in the
> > 500..1200Gb range, if memory serves) - maybe with those, NCQ will
> > work better?
>
> No one would know without testing.
>
> > Or maybe it's libata which does not implement NCQ "properly"?  (As
> > I showed before, with almost all ol'good SCSI drives TCQ helps a
> > lot - up to 2x the difference and more - with multiple I/O threads.)
>
> Well, what the driver does is minimal.  It just passes through all
> the commands to the harddrive.  After all, NCQ/TCQ gives the
> harddrive more responsibility regarding request scheduling.

Actually, in many ways the results support a theory of SCSI TCQ that
Jens used when designing the block layer.  The original TCQ theory
held that the drive could make much better head scheduling decisions
than the operating system, so you just used TCQ to pass all the
outstanding I/O unfiltered down to the drive and let it schedule.
However, the I/O results always seemed to indicate that the effect of
TCQ was negligible beyond around 4 outstanding commands, leading to
the second theory: that all TCQ was good for was saturating the
transport, and that scheduling decisions were, indeed, better left to
the OS (hence all our I/O schedulers).

The key difference between NCQ and TCQ is that NCQ allows
non-interlocked setup and completion, but there can't be overlapping
(or interrupted) data transfers.  TCQ and Disconnect (for SPI,
although there are equivalents for most other transports) allow any
style of overlap you can construct, so NCQ was really designed more
to allow the drive to make the head scheduling decisions.

Where SCSI TCQ seems to win is that most devices pull the incoming
TCQ commands into a (usually quite large) pre-execute cache, which
gives them streaming command execution (usually they're executing
command n-2 or n-3 while accepting the data for command n), so
they're actually using the cache to smooth out internal latencies.

One final question: have you tried SAS devices for comparison?  The
figures that gave TCQ a 2x performance boost were with SPI and FC ...
I'm not aware that anyone has actually done a SAS test.

James
* Re: Some NCQ numbers...
From: Jens Axboe @ 2007-07-09 12:26 UTC (permalink / raw)
To: James Bottomley
Cc: Tejun Heo, Michael Tokarev, Kernel Mailing List, linux-ide, linux-scsi

On Wed, Jul 04 2007, James Bottomley wrote:
> Actually, in many ways the results support a theory of SCSI TCQ that
> Jens used when designing the block layer.  The original TCQ theory
> held that the drive could make much better head scheduling decisions
> than the operating system, so you just used TCQ to pass all the
> outstanding I/O unfiltered down to the drive and let it schedule.
> However, the I/O results always seemed to indicate that the effect
> of TCQ was negligible beyond around 4 outstanding commands, leading
> to the second theory: that all TCQ was good for was saturating the
> transport, and that scheduling decisions were, indeed, better left
> to the OS (hence all our I/O schedulers).

Indeed, the above I still find to be true.  The only real case where
larger depths make a real difference is a pure random-read (or write,
with write caching off) workload.  And those situations are largely
synthetic, hence benchmarks tend to show NCQ being a lot more
beneficial, since they construct workloads that consist 100% of random
IO.  Real life is rarely so black and white.

Additionally, there are cases where deep drive queues hurt a lot.  The
drive has no knowledge of fairness or process-to-io mappings.  So
AS/CFQ have to artificially limit queue depths when competing IO
processes are doing semi (or fully) sequential workloads, or
throughput plummets.

So while NCQ has some benefits, I typically tend to prefer managing
the IO queue largely in software instead of punting to (often) buggy
firmware.

--
Jens Axboe
* Re: Some NCQ numbers...
From: Dan Aloni @ 2007-07-04 15:44 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi

On Thu, Jun 28, 2007 at 02:51:58PM +0400, Michael Tokarev wrote:
> [..]
> The test machine was using the mptsas driver for the following card:
> SCSI storage controller: LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 02)
>
> Pretty similar results were obtained with an AHCI controller:
> SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
> on other machines.

Are you sure that NCQ was enabled between the controller and the
drive?  Did you verify this?  I know about some versions that disable
NCQ support internally in their firmware (something to do with bugs in
error handling).

--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il
* Re: Some NCQ numbers...
From: Michael Tokarev @ 2007-07-04 16:17 UTC (permalink / raw)
To: Dan Aloni; +Cc: Kernel Mailing List, linux-ide, linux-scsi

Dan Aloni wrote:
> On Thu, Jun 28, 2007 at 02:51:58PM +0400, Michael Tokarev wrote:
>> [..]
>> The test machine was using the mptsas driver for the following card:
>> SCSI storage controller: LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 02)
>>
>> Pretty similar results were obtained with an AHCI controller:
>> SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
>> on other machines.
>
> Are you sure that NCQ was enabled between the controller and the
> drive?  Did you verify this?  I know about some versions that disable
> NCQ support internally in their firmware (something to do with bugs
> in error handling).

The next obvious question is: how to check/verify this?

/mjt
* Re: Some NCQ numbers...
From: Dan Aloni @ 2007-07-04 16:44 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi

On Wed, Jul 04, 2007 at 08:17:35PM +0400, Michael Tokarev wrote:
> Dan Aloni wrote:
> > On Thu, Jun 28, 2007 at 02:51:58PM +0400, Michael Tokarev wrote:
> >> [..]
> >> The test machine was using the mptsas driver for the following card:
> >> SCSI storage controller: LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 02)
> >>
> >> Pretty similar results were obtained with an AHCI controller:
> >> SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
> >> on other machines.
> >
> > Are you sure that NCQ was enabled between the controller and the
> > drive?  Did you verify this?  I know about some versions that
> > disable NCQ support internally in their firmware (something to do
> > with bugs in error handling).
>
> The next obvious question is: how to check/verify this?

On the lowest level, it's possible using a protocol analyzer.  If you
don't have one, you need to be familiar with the controller's driver
or its firmware.  If the driver is based on libata, I think it's
possible to get this information more easily.  Otherwise, such as in
the case of mptsas, it can be completely hidden by the firmware.

--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il
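[Aside: on libata-driven controllers such as ahci, the kernel boot log
is a quick first check - a sketch; the exact message wording varies by
kernel version:

    dmesg | grep -i ncq
    # e.g. "ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32)"
    # a reported depth of 0/32 means NCQ is disabled

As noted above, mptsas can hide this behind the firmware.]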