* Some NCQ numbers...
@ 2007-06-28 10:51 Michael Tokarev
2007-06-28 11:01 ` Michael Tokarev
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Michael Tokarev @ 2007-06-28 10:51 UTC (permalink / raw)
To: Kernel Mailing List; +Cc: linux-ide, linux-scsi
[Offtopic notice: I originally posted some speed testing results
to the linux-ide mailing list, as a demonstration of how [NT]CQ
can help. Later, someone became curious and forwarded that email
to lkml, asking for more details. Since then I have become more
curious as well, and decided to look at it more closely.
Here it goes...]
The test drive is a Seagate Barracuda ST3250620AS "desktop" drive:
250Gb, 16Mb cache, 7200RPM.
A Seagate Barracuda ES drive (ST3250620NS) shows the same results.
I'd guess the larger Barracudas from Seagate behave much the same:
the only difference between the 250Gb models and the larger ones is
the number of platters and heads.
The test machine was using the mptsas driver for the following card:
SCSI storage controller: LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 02)
Pretty similar results were obtained on an AHCI controller:
SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
on other machines.
The following tables show data read/write speed in megabytes/sec,
with different parameters.
All I/O was performed directly on the whole drive, i.e.
open("/dev/sda", O_RDWR|O_DIRECT).
Five kinds of tests were performed: linear read (linRd),
random read (rndRd), linear write (linWr), random write (rndWr),
and a combination of random reads and writes (rndR/W).
Each test was run with 1 (2 in the r/w case), 4 and 32 threads
doing I/O in parallel (the Trd column). Linear reads and writes
were performed only with a single thread.
Each test was run in two modes -- with command queuing enabled (qena)
and disabled (qdis), toggled via /sys/block/sda/device/queue_depth by
setting the queue depth to 31 (the default) and 1 respectively.
And finally, each set of tests was run with different block sizes --
4, 8, 16, 32, 128 and 1024 kb (1 kb = 1024 bytes).
First, tests with the write cache enabled (the factory default setting
for the drives in question):
BlkSz  Trd     linRd        rndRd        linWr        rndWr           rndR/W
             qena  qdis   qena  qdis   qena  qdis   qena  qdis      qena      qdis
   4k    1   12.8  12.8    0.3   0.3   35.4  37.0    0.5   0.5   0.2/ 0.2  0.2/ 0.2
         4                 0.3   0.3                 0.5   0.5   0.2/ 0.2  0.2/ 0.1
        32                 0.3   0.4                 0.5   0.5   0.2/ 0.2  0.2/ 0.1
   8k    1   23.4  23.4    0.6   0.6   51.5  51.2    1.0   1.0   0.4/ 0.4  0.4/ 0.4
         4                 0.6   0.6                 1.0   1.0   0.4/ 0.4  0.4/ 0.2
        32                 0.6   0.8                 1.0   1.0   0.4/ 0.4  0.4/ 0.2
  16k    1   41.1  40.3    1.2   1.2   67.4  67.8    2.0   2.0   0.7/ 0.7  0.7/ 0.7
         4                 1.2   1.1                 2.0   2.0   0.7/ 0.7  0.8/ 0.4
        32                 1.2   1.6                 2.0   2.0   0.7/ 0.7  0.9/ 0.4
  32k    1   58.6  57.6    2.2   2.2   79.1  70.9    3.8   3.7   1.4/ 1.4  1.4/ 1.4
         4                 2.3   2.2                 3.8   3.7   1.4/ 1.4  1.6/ 0.8
        32                 2.3   3.0                 3.7   3.8   1.4/ 1.4  1.7/ 0.9
 128k    1   80.4  80.4    8.1   8.0   78.7  77.3   11.6  11.6   4.5/ 4.5  4.5/ 4.5
         4                 8.1   8.0                11.4  11.3   4.6/ 4.6  5.5/ 2.8
        32                 8.1   9.2                11.3  11.4   4.7/ 4.6  5.7/ 3.0
1024k    1   80.4  80.4   33.9  34.0   78.2  78.3   33.6  33.6  15.9/15.9 15.9/15.9
         4                34.5  34.8                33.5  33.3  16.4/16.3 17.2/11.8
        32                34.5  34.5                33.5  35.8  20.6/11.3 21.4/11.6
And second, the same drive with write caching disabled (WCE=0, hdparm -W0):
BlkSz  Trd     linRd        rndRd        linWr        rndWr           rndR/W
             qena  qdis   qena  qdis   qena  qdis   qena  qdis      qena      qdis
   4k    1   12.8  12.8    0.3   0.3    0.4   0.5    0.3   0.3   0.2/ 0.2  0.2/ 0.2
         4                 0.3   0.3                 0.3   0.3   0.2/ 0.2  0.2/ 0.1
        32                 0.3   0.4                 0.3   0.4   0.2/ 0.1  0.2/ 0.1
   8k    1   24.6  24.6    0.6   0.6    0.9   0.9    0.6   0.6   0.3/ 0.3  0.3/ 0.3
         4                 0.6   0.6                 0.6   0.5   0.3/ 0.3  0.4/ 0.2
        32                 0.6   0.8                 0.6   0.8   0.3/ 0.3  0.4/ 0.2
  16k    1   41.8  41.7    1.2   1.1    1.8   1.8    1.1   1.1   0.6/ 0.6  0.6/ 0.6
         4                 1.2   1.1                 1.1   1.0   0.6/ 0.6  0.8/ 0.4
        32                 1.2   1.5                 1.1   1.6   0.6/ 0.6  0.8/ 0.4
  32k    1   60.1  59.2    2.3   2.3    3.6   3.5    2.1   2.1   1.1/ 1.1  1.1/ 1.1
         4                 2.3   2.2                 2.1   2.0   1.1/ 1.1  1.5/ 0.7
        32                 2.3   2.9                 2.1   3.0   1.1/ 1.1  1.5/ 0.8
 128k    1   79.4  79.4    8.1   8.1   12.4  12.6    7.2   7.1   3.8/ 3.8  3.8/ 3.8
         4                 8.1   7.9                 7.2   7.1   3.8/ 3.8  5.2/ 2.6
        32                 8.1   9.0                 7.2   7.8   3.9/ 3.8  5.2/ 2.7
1024k    1   79.0  79.4   33.8  33.8   34.2  34.1   24.6  24.6  14.3/14.2 14.3/14.2
         4                33.9  34.3                24.7  24.8  14.4/14.2 15.9/11.1
        32                34.0  34.3                24.7  27.6  19.3/10.4 19.3/10.4
Two immediate conclusions.
First, NCQ on this combination of drive/controller/driver DOES NOT WORK.
Without the write cache on the drive, the results are pretty much the
same whether NCQ is enabled or not (modulo some random fluctuations).
With the write cache enabled, NCQ makes the drive start "preferring"
reads over writes in the combined R/W test, but the total throughput
stays the same.
And second, the write cache doesn't seem to help, at least not for a
busy drive. Yes, it helps *a lot* for sequential writes, but definitely
not for random writes, which stay the same whether the write cache is
turned on or not. Again: this is a "busy" drive, which has no idle time
to complete queued writes since new requests keep coming and coming --
for a not-so-busy drive, write caching should help make the whole
system more "responsive".
I don't have other models of SATA drives around.
I'm planning to test several models of SCSI drives. On the SCSI front
(or maybe just with different drives - I don't know) things are WAY
more interesting wrt TCQ. The difference in results between 1 and 32
threads is sometimes up to 4x! But I'm a bit stuck with the SCSI tests,
since I don't know how to turn off TCQ without rebooting the machine.
Note that the results are NOT representative of a "typical" workload,
where you work with a filesystem (file server, kernel compile etc).
The test is more of a (synthetic) database workload, with direct I/O
and relatively small block sizes.
For the tests I used a tiny program written especially for this
purpose, available at http://www.corpit.ru/mjt/iot.c . Please note
that I had never used threads before, and since this program runs
several threads, it needs some synchronisation between them; I'm
not sure I got it right, but at least it seems to work... ;) So if
anyone can offer corrections in this area, you're welcome!
The program (look into it for instructions) performs a single test.
In order to build a table like the above, I had to run it multiple
times and collect the results into a nicely formatted table. I won't
show you the script I used for that, because it's way too specific
to my system (particular device names etc.), and it's just too ugly
(a quick hack). Instead, here is a simpler but more generally
useful one, which tests a single drive in a single mode as I posted
previously: http://www.corpit.ru/mjt/iot.sh .
/mjt
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Some NCQ numbers...
2007-06-28 10:51 Some NCQ numbers Michael Tokarev
@ 2007-06-28 11:01 ` Michael Tokarev
2007-07-03 8:19 ` Tejun Heo
2007-07-04 15:44 ` Dan Aloni
2 siblings, 0 replies; 17+ messages in thread
From: Michael Tokarev @ 2007-06-28 11:01 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi
Michael Tokarev wrote:
[]
> I'm planning to test several models of SCSI drives. On the SCSI front
> (or maybe just with different drives - I don't know) things are WAY
> more interesting wrt TCQ. The difference in results between 1 and 32
> threads is sometimes up to 4x! But I'm a bit stuck with the SCSI tests,
> since I don't know how to turn off TCQ without rebooting the machine.
A quick followup, to demonstrate the "interesting" part.
Seagate SCSI ST3146854LC drive, 140Gb, 15KRPM, write cache disabled,
queue depth = 32:
BlkSz  Trd  linRd  rndRd  linWr  rndWr     rndR/W
   4k    1   37.9    0.6    0.9    0.6   0.4/ 0.3
         4           0.9           0.7   0.6/ 0.4
        32           1.5           1.1   0.9/ 0.4
   8k    1   75.2    1.2    1.9    1.1   0.8/ 0.6
         4           1.7           1.5   1.1/ 0.7
        32           2.9           2.2   1.7/ 0.9
  16k    1   82.3    2.4    3.6    2.3   1.5/ 1.2
         4           3.3           2.9   2.2/ 1.4
        32           5.5           4.3   3.3/ 1.7
  32k    1   86.3    4.7    6.9    4.4   2.9/ 2.3
         4           6.4           5.6   4.2/ 2.7
        32          10.2           8.0   6.2/ 3.1
 128k    1   86.5   15.8   26.6   14.9   9.5/ 7.7
         4          20.6          18.2  13.5/ 8.5
        32          29.2          24.8  18.3/ 9.1
1024k    1   88.6   46.7   63.1   48.2  25.3/25.3
         4          51.7          51.3  33.5/21.8
        32          55.9          57.3  37.6/19.0
Fujitsu SCSI MAX3147NC drive, same parameters:
BlkSz  Trd  linRd  rndRd  linWr  rndWr     rndR/W
   4k    1   37.4    0.7    1.0    0.6   0.4/ 0.3
         4           0.9           0.8   0.6/ 0.4
        32           1.5           1.2   0.9/ 0.4
   8k    1   32.9    1.3    1.9    1.2   0.7/ 0.7
         4           1.8           1.5   1.2/ 0.7
        32           2.8           2.3   1.7/ 0.9
  16k    1   89.6    2.6    3.7    2.4   1.4/ 1.3
         4           3.5           3.0   2.4/ 1.4
        32           5.4           4.4   3.3/ 1.7
  32k    1   87.9    4.8    7.0    4.4   2.6/ 2.6
         4           6.8           5.6   4.6/ 2.7
        32           9.9           8.3   6.2/ 3.1
 128k    1   90.7   16.2   22.5   15.3   8.6/ 8.6
         4          21.8          18.6  15.0/ 8.1
        32          28.6          25.0  18.2/ 9.1
1024k    1   90.6   48.9   60.0   47.4  25.3/25.9
         4          55.6          51.7  34.4/22.5
        32          59.8          56.2  38.6/19.7
/mjt
* Re: Some NCQ numbers...
2007-06-28 10:51 Some NCQ numbers Michael Tokarev
2007-06-28 11:01 ` Michael Tokarev
@ 2007-07-03 8:19 ` Tejun Heo
2007-07-03 20:29 ` Michael Tokarev
2007-07-04 15:44 ` Dan Aloni
2 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2007-07-03 8:19 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi
Michael Tokarev wrote:
> [Offtopic notice: I originally posted some speed testing results
> to the linux-ide mailing list, as a demonstration of how [NT]CQ
> can help. Later, someone became curious and forwarded that email
> to lkml, asking for more details. Since then I have become more
> curious as well, and decided to look at it more closely.
> Here it goes...]
>
> The test drive is a Seagate Barracuda ST3250620AS "desktop" drive:
> 250Gb, 16Mb cache, 7200RPM.
And which elevator?
--
tejun
* Re: Some NCQ numbers...
2007-07-03 8:19 ` Tejun Heo
@ 2007-07-03 20:29 ` Michael Tokarev
2007-07-04 1:19 ` Tejun Heo
0 siblings, 1 reply; 17+ messages in thread
From: Michael Tokarev @ 2007-07-03 20:29 UTC (permalink / raw)
To: Tejun Heo; +Cc: Kernel Mailing List, linux-ide, linux-scsi
Tejun Heo wrote:
> Michael Tokarev wrote:
[]
>> The test drive is a Seagate Barracuda ST3250620AS "desktop" drive:
>> 250Gb, 16Mb cache, 7200RPM.
[test shows that NCQ makes no difference whatsoever]
> And which elevator?
Well, it looks like the results do not depend on the
elevator. Originally I tried with deadline, and just
re-ran the test with noop (hence the long delay in
answering) - changing the linux elevator changes almost
nothing in the results - modulo some random "fluctuations".
In any case, NCQ - at least on this drive - just does
not work. Linux with its I/O elevator may help to
speed things up a bit, but the disk does nothing in
this area. NCQ doesn't slow things down either - it
just does not work.
The same is true for the ST3250620NS "enterprise" drives.
By the way, Seagate has announced the Barracuda ES 2 series
(in the 500..1200Gb range, if memory serves) - maybe with
those, NCQ will work better?
Or maybe it's libata which does not implement NCQ
"properly"? (As I showed before, with almost all
of the ol'good SCSI drives, TCQ helps a lot - up to a 2x
difference and more - with multiple I/O threads.)
/mjt
* Re: Some NCQ numbers...
2007-07-03 20:29 ` Michael Tokarev
@ 2007-07-04 1:19 ` Tejun Heo
2007-07-04 9:43 ` Michael Tokarev
2007-07-04 14:40 ` James Bottomley
0 siblings, 2 replies; 17+ messages in thread
From: Tejun Heo @ 2007-07-04 1:19 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi
Hello,
Michael Tokarev wrote:
> Well, it looks like the results do not depend on the
> elevator. Originally I tried with deadline, and just
> re-ran the test with noop (hence the long delay in
> answering) - changing the linux elevator changes almost
> nothing in the results - modulo some random "fluctuations".
I see. Thanks for testing.
> In any case, NCQ - at least on this drive - just does
> not work. Linux with its I/O elevator may help to
> speed things up a bit, but the disk does nothing in
> this area. NCQ doesn't slow things down either - it
> just does not work.
>
> The same is true for the ST3250620NS "enterprise" drives.
>
> By the way, Seagate has announced the Barracuda ES 2 series
> (in the 500..1200Gb range, if memory serves) - maybe with
> those, NCQ will work better?
No one would know without testing.
> Or maybe it's libata which does not implement NCQ
> "properly"? (As I showed before, with almost all
> of the ol'good SCSI drives, TCQ helps a lot - up to a 2x
> difference and more - with multiple I/O threads.)
Well, what the driver does is minimal. It just passes through all the
commands to the harddrive. After all, NCQ/TCQ gives the harddrive more
responsibility regarding request scheduling.
--
tejun
* Re: Some NCQ numbers...
2007-07-04 1:19 ` Tejun Heo
@ 2007-07-04 9:43 ` Michael Tokarev
2007-07-04 10:22 ` Justin Piszcz
2007-07-05 19:22 ` Bill Davidsen
2007-07-04 14:40 ` James Bottomley
1 sibling, 2 replies; 17+ messages in thread
From: Michael Tokarev @ 2007-07-04 9:43 UTC (permalink / raw)
To: Tejun Heo; +Cc: Kernel Mailing List, linux-ide, linux-scsi
Tejun Heo wrote:
> Hello,
>
> Michael Tokarev wrote:
>> Well, it looks like the results do not depend on the
>> elevator. Originally I tried with deadline, and just
>> re-ran the test with noop (hence the long delay in
>> answering) - changing the linux elevator changes almost
>> nothing in the results - modulo some random "fluctuations".
>
> I see. Thanks for testing.
Here are the actual results - the tests were still running when
I replied yesterday.
Again, this is a Seagate ST3250620AS "desktop" drive: 7200RPM,
16Mb cache, 250Gb capacity. The tests were performed with
queue depth = 64 (on mptsas); the drive's write cache is turned
off.
noop scheduler:
BlkSz  Trd  linRd  rndRd  linWr  rndWr     rndR/W
   4k    1   12.8    0.3    0.4    0.3   0.1/ 0.1
         4           0.3           0.3   0.1/ 0.1
        32           0.3           0.3   0.1/ 0.1
   8k    1   24.6    0.6    0.9    0.6   0.3/ 0.3
         4           0.6           0.6   0.3/ 0.3
        32           0.6           0.6   0.3/ 0.3
  16k    1   41.3    1.2    1.8    1.1   0.6/ 0.6
         4           1.2           1.1   0.6/ 0.6
        32           1.2           1.1   0.6/ 0.6
  32k    1   58.4    2.2    3.5    2.1   1.1/ 1.1
         4           2.3           2.1   1.1/ 1.1
        32           2.3           2.1   1.1/ 1.1
 128k    1   80.4    8.1   12.5    7.2   3.8/ 3.8
         4           8.1           7.2   3.8/ 3.8
        32           8.1           7.2   3.8/ 3.8
1024k    1   80.5   33.9   33.8   24.5  14.3/14.3
         4          34.1          24.6  14.3/14.2
        32          34.2          24.6  14.4/14.2
deadline scheduler:
BlkSz  Trd  linRd  rndRd  linWr  rndWr     rndR/W
   4k    1   12.8    0.3    0.4    0.3   0.1/ 0.1
         4           0.3           0.3   0.1/ 0.1
        32           0.3           0.3   0.1/ 0.1
   8k    1   24.5    0.6    0.9    0.6   0.3/ 0.3
         4           0.6           0.6   0.3/ 0.3
        32           0.6           0.6   0.3/ 0.3
  16k    1   41.3    1.2    1.8    1.1   0.6/ 0.6
         4           1.2           1.1   0.6/ 0.6
        32           1.2           1.1   0.6/ 0.6
  32k    1   57.7    2.3    3.4    2.1   1.1/ 1.1
         4           2.3           2.1   1.1/ 1.1
        32           2.3           2.1   1.1/ 1.1
 128k    1   79.4    8.1   12.5    7.2   3.8/ 3.8
         4           8.1           7.3   3.8/ 3.8
        32           8.2           7.3   3.9/ 3.8
1024k    1   79.4   33.7   33.8   24.5  14.2/14.2
         4          33.9          24.6  14.3/14.2
        32          33.4          24.4  17.0/10.5
[]
>> By the way, Seagate has announced the Barracuda ES 2 series
>> (in the 500..1200Gb range, if memory serves) - maybe with
>> those, NCQ will work better?
>
> No one would know without testing.
Sure thing. I guess I'll set up a web page with all
the results so far, in the hope that someday it will
become more complete (we don't have many different
drives to test, but others do).
By the way: both SATA drives we have are single-platter
ones (the 500Gb models have 2 platters, and the 750Gb
ones have 3), while all the SCSI drives I tested have
more than one platter. Maybe this is yet another reason
for NCQ failing.
And another note. I heard somewhere that Seagate, for
one, prohibits publishing tests like this; however, I
didn't sign any NDAs or suchlike when purchasing their
drives in the nearest computer store... ;)
>> Or maybe it's libata which does not implement NCQ
>> "properly"? (As I showed before, with almost all
>> of the ol'good SCSI drives, TCQ helps a lot - up to a 2x
>> difference and more - with multiple I/O threads.)
>
> Well, what the driver does is minimal. It just passes through all the
> commands to the harddrive. After all, NCQ/TCQ gives the harddrive more
> responsibility regarding request scheduling.
Oh well, I see.... :(
/mjt
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Some NCQ numbers...
2007-07-04 9:43 ` Michael Tokarev
@ 2007-07-04 10:22 ` Justin Piszcz
2007-07-04 10:33 ` Justin Piszcz
2007-07-09 12:26 ` Jens Axboe
2007-07-05 19:22 ` Bill Davidsen
1 sibling, 2 replies; 17+ messages in thread
From: Justin Piszcz @ 2007-07-04 10:22 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Tejun Heo, Kernel Mailing List, linux-ide, linux-scsi
On Wed, 4 Jul 2007, Michael Tokarev wrote:
> Tejun Heo wrote:
>> Hello,
>>
>> Michael Tokarev wrote:
>>> Well, it looks like the results do not depend on the
>>> elevator. Originally I tried with deadline, and just
>>> re-ran the test with noop (hence the long delay in
>>> answering) - changing the linux elevator changes almost
>>> nothing in the results - modulo some random "fluctuations".
>>
>> I see. Thanks for testing.
>
> Here are actual results - the tests were still running when
> I replied yesterday.
>
> Again, this is Seagate ST3250620AS "desktop" drive, 7200RPM,
> 16Mb cache, 250Gb capacity. The tests were performed with
> queue depth = 64 (on mptsas), drive write cache is turned
> off.
I found the AS scheduler to be the best for single-user performance.
You want speed? Use AS.
http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html
* Re: Some NCQ numbers...
2007-07-04 10:22 ` Justin Piszcz
@ 2007-07-04 10:33 ` Justin Piszcz
2007-07-05 19:00 ` Bill Davidsen
2007-07-09 12:26 ` Jens Axboe
1 sibling, 1 reply; 17+ messages in thread
From: Justin Piszcz @ 2007-07-04 10:33 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Tejun Heo, Kernel Mailing List, linux-ide, linux-scsi
On Wed, 4 Jul 2007, Justin Piszcz wrote:
> On Wed, 4 Jul 2007, Michael Tokarev wrote:
>
> > Tejun Heo wrote:
> >> Hello,
> >>
> >> Michael Tokarev wrote:
> >>> Well, it looks like the results do not depend on the
> >>> elevator. Originally I tried with deadline, and just
> >>> re-ran the test with noop (hence the long delay in
> >>> answering) - changing the linux elevator changes almost
> >>> nothing in the results - modulo some random "fluctuations".
> >>
> >> I see. Thanks for testing.
> >
> > Here are actual results - the tests were still running when
> > I replied yesterday.
> >
> > Again, this is Seagate ST3250620AS "desktop" drive, 7200RPM,
> > 16Mb cache, 250Gb capacity. The tests were performed with
> > queue depth = 64 (on mptsas), drive write cache is turned
> > off.
>
> I found AS scheduler to be the premium and best for single-user performance.
>
> You want speed? Use AS.
>
> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html
>
>
This one does not include noop -- it covers the main three, though (renamed) :)
http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline.html
And for the archives:
p34-cfq,15696M,77114.3,99,311683,55.3333,184947,38.6667,79842.7,99,524065,41.3333,634.033,0.333333,16:100000:16/64,1043.33,8.33333,4419.33,11.6667,2942,17.3333,1178,10.3333,4192.67,12.3333,2619.33,19
p34-as,15696M,76202.3,99,443103,85,189716,34.6667,79552,99,507271,39.6667,607.067,0,16:100000:16/64,1153,10,13434,36,2769.67,16.3333,1201.67,10.6667,3951.33,12,2665.67,19
p34-deadline,15696M,76933.3,98.6667,386852,72,183016,29.6667,79530.7,99,512082,39.6667,678.567,0,16:100000:16/64,1230.33,10.3333,12349,32.3333,2945,17.3333,1258,11,8183,22.3333,2867,20.3333
Justin.
* Re: Some NCQ numbers...
2007-07-04 1:19 ` Tejun Heo
2007-07-04 9:43 ` Michael Tokarev
@ 2007-07-04 14:40 ` James Bottomley
2007-07-09 12:26 ` Jens Axboe
1 sibling, 1 reply; 17+ messages in thread
From: James Bottomley @ 2007-07-04 14:40 UTC (permalink / raw)
To: Tejun Heo; +Cc: Michael Tokarev, Kernel Mailing List, linux-ide, linux-scsi
On Wed, 2007-07-04 at 10:19 +0900, Tejun Heo wrote:
> Michael Tokarev wrote:
> > Well, it looks like the results do not depend on the
> > elevator. Originally I tried with deadline, and just
> > re-ran the test with noop (hence the long delay in
> > answering) - changing the linux elevator changes almost
> > nothing in the results - modulo some random "fluctuations".
>
> I see. Thanks for testing.
>
> > In any case, NCQ - at least on this drive - just does
> > not work. Linux with its I/O elevator may help to
> > speed things up a bit, but the disk does nothing in
> > this area. NCQ doesn't slow things down either - it
> > just does not work.
> >
> > The same is true for the ST3250620NS "enterprise" drives.
> >
> > By the way, Seagate has announced the Barracuda ES 2 series
> > (in the 500..1200Gb range, if memory serves) - maybe with
> > those, NCQ will work better?
>
> No one would know without testing.
>
> > Or maybe it's libata which does not implement NCQ
> > "properly"? (As I showed before, with almost all
> > of the ol'good SCSI drives, TCQ helps a lot - up to a 2x
> > difference and more - with multiple I/O threads.)
>
> Well, what the driver does is minimal. It just passes through all the
> commands to the harddrive. After all, NCQ/TCQ gives the harddrive more
> responsibility regarding request scheduling.
Actually, in many ways the results support a theory of SCSI TCQ that
Jens used when designing the block layer. The original TCQ theory held
that the drive could make much better head scheduling decisions than
the Operating System, so you just used TCQ to pass all the outstanding
I/O unfiltered down to the drive and let it schedule. However, the I/O
results always seemed to indicate that the effect of TCQ was negligible
at around 4 outstanding commands, leading to the second theory that all
TCQ was good for was saturating the transport, and that scheduling
decisions were, indeed, better left to the OS (hence all our I/O
schedulers).
The key difference between NCQ and TCQ is that NCQ allows
non-interlocked setup and completion, but there can't be overlapping
(or interrupted) data transfers. TCQ plus Disconnect (for SPI, although
there are equivalents for most other transports) allows any style of
overlap you can construct, so NCQ was really designed more to allow the
drive to make the head scheduling decisions.
Where SCSI TCQ seems to win is that most devices pull the incoming TCQ
commands into a (usually quite large) pre-execute cache, which gives
them streaming command execution (usually they're executing command n-2
or n-3 while accepting the data for command n), so they're actually
using the cache to smooth out internal latencies.
One final question: have you tried SAS devices for comparison? The
figures that give TCQ a 2x performance boost were with SPI and FC ...
I'm not aware that anyone has actually done a SAS test.
James
* Re: Some NCQ numbers...
2007-06-28 10:51 Some NCQ numbers Michael Tokarev
2007-06-28 11:01 ` Michael Tokarev
2007-07-03 8:19 ` Tejun Heo
@ 2007-07-04 15:44 ` Dan Aloni
2007-07-04 16:17 ` Michael Tokarev
2 siblings, 1 reply; 17+ messages in thread
From: Dan Aloni @ 2007-07-04 15:44 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi
On Thu, Jun 28, 2007 at 02:51:58PM +0400, Michael Tokarev wrote:
>[..]
> Test machine was using MPTSAS driver for the following card:
> SCSI storage controller: LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 02)
>
> Pretty similar results were obtained on an AHCI controller:
> SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
> on another machines.
Are you sure that NCQ was enabled between the controller and drive?
Did you verify this? I know about some versions that disable NCQ
support internally in their firmware (something to do with bugs in
error handling).
--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il
* Re: Some NCQ numbers...
2007-07-04 15:44 ` Dan Aloni
@ 2007-07-04 16:17 ` Michael Tokarev
2007-07-04 16:44 ` Dan Aloni
0 siblings, 1 reply; 17+ messages in thread
From: Michael Tokarev @ 2007-07-04 16:17 UTC (permalink / raw)
To: Dan Aloni; +Cc: Kernel Mailing List, linux-ide, linux-scsi
Dan Aloni wrote:
> On Thu, Jun 28, 2007 at 02:51:58PM +0400, Michael Tokarev wrote:
>> [..]
>> Test machine was using MPTSAS driver for the following card:
>> SCSI storage controller: LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 02)
>>
>> Pretty similar results were obtained on an AHCI controller:
>> SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
>> on another machines.
>
> Are you sure that NCQ was enabled between the controller and drive?
> Did you verify this? I know about some versions that disable NCQ
> support internally in their firmware (something to do with bugs in
> error handling).
The next obvious question is: how to check/verify this?
/mjt
* Re: Some NCQ numbers...
2007-07-04 16:17 ` Michael Tokarev
@ 2007-07-04 16:44 ` Dan Aloni
0 siblings, 0 replies; 17+ messages in thread
From: Dan Aloni @ 2007-07-04 16:44 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Kernel Mailing List, linux-ide, linux-scsi
On Wed, Jul 04, 2007 at 08:17:35PM +0400, Michael Tokarev wrote:
> Dan Aloni wrote:
> > On Thu, Jun 28, 2007 at 02:51:58PM +0400, Michael Tokarev wrote:
> >> [..]
> >> Test machine was using MPTSAS driver for the following card:
> >> SCSI storage controller: LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 02)
> >>
> >> Pretty similar results were obtained on an AHCI controller:
> >> SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
> >> on another machines.
> >
> > Are you sure that NCQ was enabled between the controller and drive?
> > Did you verify this? I know about some versions that disable NCQ
> > support internally in their firmware (something to do with bugs in
> > error handling).
>
> The next obvious question is: how to check/verify this?
At the lowest level, it's possible using a protocol analyzer. If you
don't have one, you need to be familiar with the controller's driver
or its firmware. If the driver is based on libata, I think it's
easier to get at this information. Otherwise, as in the
case of mptsas, it can be completely hidden by the firmware.
--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il
* Re: Some NCQ numbers...
2007-07-04 10:33 ` Justin Piszcz
@ 2007-07-05 19:00 ` Bill Davidsen
2007-07-09 11:07 ` Justin Piszcz
0 siblings, 1 reply; 17+ messages in thread
From: Bill Davidsen @ 2007-07-05 19:00 UTC (permalink / raw)
To: Justin Piszcz
Cc: Michael Tokarev, Tejun Heo, Kernel Mailing List, linux-ide,
linux-scsi
Justin Piszcz wrote:
>
>
> On Wed, 4 Jul 2007, Justin Piszcz wrote:
>
>> On Wed, 4 Jul 2007, Michael Tokarev wrote:
>>
>> > Tejun Heo wrote:
>> >> Hello,
>> >>
>> >> Michael Tokarev wrote:
>> >>> Well, it looks like the results do not depend on the
>> >>> elevator. Originally I tried with deadline, and just
>> >>> re-ran the test with noop (hence the long delay in
>> >>> answering) - changing the linux elevator changes almost
>> >>> nothing in the results - modulo some random "fluctuations".
>> >>
>> >> I see. Thanks for testing.
>> >
>> > Here are actual results - the tests were still running when
>> > I replied yesterday.
>> >
>> > Again, this is Seagate ST3250620AS "desktop" drive, 7200RPM,
>> > 16Mb cache, 250Gb capacity. The tests were performed with
>> > queue depth = 64 (on mptsas), drive write cache is turned
>> > off.
>>
>> I found AS scheduler to be the premium and best for single-user
>> performance.
>>
>> You want speed? Use AS.
>>
>> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html
>>
>>
>
> Does not include noop-- tested the main three though, renamed :)
>
> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline.html
>
> And for the archives:
>
> p34-cfq,15696M,77114.3,99,311683,55.3333,184947,38.6667,79842.7,99,524065,41.3333,634.033,0.333333,16:100000:16/64,1043.33,8.33333,4419.33,11.6667,2942,17.3333,1178,10.3333,4192.67,12.3333,2619.33,19
>
> p34-as,15696M,76202.3,99,443103,85,189716,34.6667,79552,99,507271,39.6667,607.067,0,16:100000:16/64,1153,10,13434,36,2769.67,16.3333,1201.67,10.6667,3951.33,12,2665.67,19
>
> p34-deadline,15696M,76933.3,98.6667,386852,72,183016,29.6667,79530.7,99,512082,39.6667,678.567,0,16:100000:16/64,1230.33,10.3333,12349,32.3333,2945,17.3333,1258,11,8183,22.3333,2867,20.3333
>
I looked at these before - did you really run with a chunk size of just
under 16GB, or does "15696M" have some non-obvious meaning?
--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
* Re: Some NCQ numbers...
2007-07-04 9:43 ` Michael Tokarev
2007-07-04 10:22 ` Justin Piszcz
@ 2007-07-05 19:22 ` Bill Davidsen
1 sibling, 0 replies; 17+ messages in thread
From: Bill Davidsen @ 2007-07-05 19:22 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Tejun Heo, Kernel Mailing List, linux-ide, linux-scsi
Michael Tokarev wrote:
> Tejun Heo wrote:
>> Hello,
>>
>> Michael Tokarev wrote:
>>> Well, it looks like the results do not depend on the
>>> elevator. Originally I tried with deadline, and just
>>> re-ran the test with noop (hence the long delay in
>>> answering) - changing the linux elevator changes almost
>>> nothing in the results - modulo some random "fluctuations".
>> I see. Thanks for testing.
>
> Here are actual results - the tests were still running when
> I replied yesterday.
>
> Again, this is Seagate ST3250620AS "desktop" drive, 7200RPM,
> 16Mb cache, 250Gb capacity. The tests were performed with
> queue depth = 64 (on mptsas), drive write cache is turned
> off.
>
But... with the write cache off you don't let the drive do some things
which might show a lot of improvement with one scheduler or another. So
your data are only part of the story, aren't they?
[snip]
>>> By the way, Seagate has announced the Barracuda ES 2 series
>>> (in the 500..1200Gb range, if memory serves) - maybe with
>>> those, NCQ will work better?
>> No one would know without testing.
>
> Sure thing. I guess I'll set up a web page with all
> the results so far, in a hope someday it will be more
> complete (we don't have many different drives to test,
> but others do).
>
> By the way. Both SATA drives we have are single-platter
> ones (with 500Gb models they've 2 platters, and 750Gb
> ones are with 3 platters), while all SCSI drives I
> tested have more than one platters. Maybe this is
> yet another reason for NCQ failing.
>
> And another note. I heard somewhere that Seagate for
> one prohibits publishing of tests like this, however
> I haven't signed any NDAs and somesuch when purchased
> their drives in a nearest computer store... ;)
>
>>> Or maybe it's libata which does not implement NCQ
>>> "properly"? (As I showed before, with almost all
>>> of the ol'good SCSI drives, TCQ helps a lot - up to a 2x
>>> difference and more - with multiple I/O threads.)
>> Well, what the driver does is minimal. It just passes through all the
>> commands to the harddrive. After all, NCQ/TCQ gives the harddrive more
>> responsibility regarding request scheduling.
>
> Oh well, I see.... :(
>
--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
* Re: Some NCQ numbers...
2007-07-05 19:00 ` Bill Davidsen
@ 2007-07-09 11:07 ` Justin Piszcz
0 siblings, 0 replies; 17+ messages in thread
From: Justin Piszcz @ 2007-07-09 11:07 UTC (permalink / raw)
To: Bill Davidsen
Cc: Michael Tokarev, Tejun Heo, Kernel Mailing List, linux-ide,
linux-scsi
On Thu, 5 Jul 2007, Bill Davidsen wrote:
> Justin Piszcz wrote:
>>
>>
>> On Wed, 4 Jul 2007, Justin Piszcz wrote:
>>
>>> On Wed, 4 Jul 2007, Michael Tokarev wrote:
>>>
>>> > Tejun Heo wrote:
>>> >> Hello,
>>> >>
>>> >> Michael Tokarev wrote:
>>> >>> Well, it looks like the results do not depend on the
>>> >>> elevator. Originally I tried with deadline, and just
>>> >>> re-ran the test with noop (hence the long delay in
>>> >>> answering) - changing the linux elevator changes almost
>>> >>> nothing in the results - modulo some random "fluctuations".
>>> >>
>>> >> I see. Thanks for testing.
>>> >
>>> > Here are actual results - the tests were still running when
>>> > I replied yesterday.
>>> >
>>> > Again, this is Seagate ST3250620AS "desktop" drive, 7200RPM,
>>> > 16Mb cache, 250Gb capacity. The tests were performed with
>>> > queue depth = 64 (on mptsas), drive write cache is turned
>>> > off.
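The queue-depth knob and write-cache switch being varied here are plain sysfs/hdparm operations. A minimal sketch, assuming a device named sda (an assumption, not from the thread); the commands are printed rather than executed, since writing them needs root:

```shell
# Sketch only: device name "sda" is an assumption.
dev=sda
qd=/sys/block/$dev/device/queue_depth
# Print the commands instead of running them (writing them needs root):
echo "echo 1  > $qd     # depth 1: command queuing effectively disabled"
echo "echo 64 > $qd     # depth 64: as in the mptsas test above"
echo "hdparm -W0 /dev/$dev     # turn the drive write cache off"
```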
>>>
>>> I found the AS scheduler to be the best for single-user
>>> performance.
>>>
>>> You want speed? Use AS.
>>>
>>> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html
>>>
>>>
>>
>> Does not include noop -- tested the main three, though (renamed) :)
>>
>> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline.html
>>
>> And for the archives:
>>
>> p34-cfq,15696M,77114.3,99,311683,55.3333,184947,38.6667,79842.7,99,524065,41.3333,634.033,0.333333,16:100000:16/64,1043.33,8.33333,4419.33,11.6667,2942,17.3333,1178,10.3333,4192.67,12.3333,2619.33,19
>> p34-as,15696M,76202.3,99,443103,85,189716,34.6667,79552,99,507271,39.6667,607.067,0,16:100000:16/64,1153,10,13434,36,2769.67,16.3333,1201.67,10.6667,3951.33,12,2665.67,19
>> p34-deadline,15696M,76933.3,98.6667,386852,72,183016,29.6667,79530.7,99,512082,39.6667,678.567,0,16:100000:16/64,1230.33,10.3333,12349,32.3333,2945,17.3333,1258,11,8183,22.3333,2867,20.3333
> I looked at these before - did you really run with a chunk size of just
> under 16GB, or does "15696M" have some non-obvious meaning?
>
> --
> Bill Davidsen <davidsen@tmr.com>
> "We have more to fear from the bungling of the incompetent than from
> the machinations of the wicked." - from Slashdot
>
Bonnie says to use double your RAM; mine is 7848MB, so that is why I
use 15696M :)
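The 2x-RAM sizing rule can be derived automatically. A hedged sketch (the target directory and the -n file-creation spec are illustrative assumptions, not Justin's exact invocation):

```shell
# Compute double the machine's RAM from /proc/meminfo for bonnie++'s -s flag.
ram_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
size_mb=$((ram_mb * 2))
# Print the invocation rather than run it (bonnie++ may not be installed):
echo "bonnie++ -d /mnt/test -s ${size_mb}M -n 16:100000:16:64"
```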
I did some tests recently; it appears JFS is 20-60MB/s faster for
sequential reads/writes/re-writes, but it has no defrag tool. There is
defragfs, but it's not included in Debian and people on the web say not
to use it, so I am not sure I want to go there.
Justin.
* Re: Some NCQ numbers...
2007-07-04 14:40 ` James Bottomley
@ 2007-07-09 12:26 ` Jens Axboe
0 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-07-09 12:26 UTC (permalink / raw)
To: James Bottomley
Cc: Tejun Heo, Michael Tokarev, Kernel Mailing List, linux-ide,
linux-scsi
On Wed, Jul 04 2007, James Bottomley wrote:
> On Wed, 2007-07-04 at 10:19 +0900, Tejun Heo wrote:
> > Michael Tokarev wrote:
> > > Well. It looks like the results do not depend on the
> > > elevator. Originally I tried deadline, and just
> > > re-ran the test with noop (hence the long delay with
> > > the answer) - changing the Linux elevator changes
> > > almost nothing in the results - modulo some random
> > > "fluctuations".
> >
> > I see. Thanks for testing.
> >
> > > In any case, NCQ - at least on this drive - just does
> > > not work. Linux with its I/O elevator may help to
> > > speed things up a bit, but the disk does nothing in
> > > this area. NCQ doesn't slow things down either - it
> > > just does not work.
> > >
> > > The same goes for the ST3250620NS "enterprise" drives.
> > >
> > > By the way, Seagate announced Barracuda ES 2 series
> > > (in range 500..1200Gb if memory serves) - maybe with
> > > those, NCQ will work better?
> >
> > No one would know without testing.
> >
> > > Or maybe it's libata which does not implement NCQ
> > > "properly"? (As I showed before, with almost all
> > > ol' good SCSI drives TCQ helps a lot - up to 2x the
> > > difference and more - with multiple I/O threads)
> >
> > Well, what the driver does is minimal. It just passes all the
> > commands through to the hard drive. After all, NCQ/TCQ gives the
> > hard drive more responsibility regarding request scheduling.
>
> Actually, in many ways the results support a theory of SCSI TCQ Jens used
> when designing the block layer. The original TCQ theory held that the
> drive could make much better head scheduling decisions than the
> Operating System, so you just used TCQ to pass all the outstanding I/O
> unfiltered down to the drive to let it schedule. However, the I/O
> results always seemed to indicate that the effect of TCQ was negligible
> at around 4 outstanding commands, leading to the second theory that all
> TCQ was good for was saturating the transport, and making scheduling
> decisions was, indeed, better left to the OS (hence all our I/O
> schedulers).
Indeed, I still find the above to be true. The only real case where
larger depths make a real difference is a pure random read (or write,
with write caching off) workload. And those situations are largely
synthetic, hence benchmarks tend to show NCQ being a lot more beneficial,
since they construct workloads that consist 100% of random IO. Real life
is rarely so black and white.
Additionally, there are cases where deep drive queues hurt a lot. The
drive has no knowledge of fairness, or of process-to-IO mappings. So
AS/CFQ has to artificially limit queue depths for competing IO processes
doing semi (or fully) sequential workloads, or throughput plummets.
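The pure-random-read case described here can be reproduced with fio at the two queue depths. A hypothetical sketch (device name, block size, and runtime are assumptions, not what anyone in the thread ran); the commands are printed rather than run, since they hit a raw device as root:

```shell
# Compare pure random reads at depth 1 vs 31 - the workload where
# command queuing shows its clearest benefit, per the discussion above.
dev=/dev/sda    # assumption: raw device, as in the original O_DIRECT tests
for qd in 1 31; do
    echo "fio --name=randread-qd$qd --filename=$dev --direct=1 \
--rw=randread --bs=4k --ioengine=libaio --iodepth=$qd --runtime=60"
done
```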
So while NCQ has some benefits, I typically tend to prefer managing the
IO queue largely in software instead of punting to (often) buggy
firmware.
--
Jens Axboe
* Re: Some NCQ numbers...
2007-07-04 10:22 ` Justin Piszcz
2007-07-04 10:33 ` Justin Piszcz
@ 2007-07-09 12:26 ` Jens Axboe
1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-07-09 12:26 UTC (permalink / raw)
To: Justin Piszcz
Cc: Michael Tokarev, Tejun Heo, Kernel Mailing List, linux-ide,
linux-scsi
On Wed, Jul 04 2007, Justin Piszcz wrote:
> On Wed, 4 Jul 2007, Michael Tokarev wrote:
>
> > Tejun Heo wrote:
> >> Hello,
> >>
> >> Michael Tokarev wrote:
> > > Well. It looks like the results do not depend on the
> > > elevator. Originally I tried deadline, and just
> > > re-ran the test with noop (hence the long delay with
> > > the answer) - changing the Linux elevator changes
> > > almost nothing in the results - modulo some random
> > > "fluctuations".
> >>
> >> I see. Thanks for testing.
> >
> > Here are actual results - the tests were still running when
> > I replied yesterday.
> >
> > Again, this is Seagate ST3250620AS "desktop" drive, 7200RPM,
> > 16Mb cache, 250Gb capacity. The tests were performed with
> > queue depth = 64 (on mptsas), drive write cache is turned
> > off.
>
> I found the AS scheduler to be the best for single-user performance.
>
> You want speed? Use AS.
>
> http://home.comcast.net/~jpiszcz/sched/cfq_vs_as_vs_deadline_vs_noop.html
Hmm, I find your data very weak for such a conclusion. The value of the
test itself notwithstanding, AS seems to be a lot faster at sequential
output for some reason, yet slower at everything else. Which is odd -
deadline should always run at the same speed as AS for writeout. The
only real difference should be in sequential and random reads.
So allow me to call your results questionable. It also looks like bonnie
(some version) output; I never found bonnie to provide good, repeatable
numbers. tiotest is much better, or (of course) fio.
--
Jens Axboe
Thread overview: 17+ messages
2007-06-28 10:51 Some NCQ numbers Michael Tokarev
2007-06-28 11:01 ` Michael Tokarev
2007-07-03 8:19 ` Tejun Heo
2007-07-03 20:29 ` Michael Tokarev
2007-07-04 1:19 ` Tejun Heo
2007-07-04 9:43 ` Michael Tokarev
2007-07-04 10:22 ` Justin Piszcz
2007-07-04 10:33 ` Justin Piszcz
2007-07-05 19:00 ` Bill Davidsen
2007-07-09 11:07 ` Justin Piszcz
2007-07-09 12:26 ` Jens Axboe
2007-07-05 19:22 ` Bill Davidsen
2007-07-04 14:40 ` James Bottomley
2007-07-09 12:26 ` Jens Axboe
2007-07-04 15:44 ` Dan Aloni
2007-07-04 16:17 ` Michael Tokarev
2007-07-04 16:44 ` Dan Aloni