* Tune IO stack to get max IOPS out of AIO
@ 2013-01-31 3:04 Alireza Haghdoost
2013-01-31 9:30 ` Hearns, John
2013-01-31 11:02 ` Jens Axboe
0 siblings, 2 replies; 3+ messages in thread
From: Alireza Haghdoost @ 2013-01-31 3:04 UTC
To: linux-aio, fio
Hello,

I am trying to tune the Linux IO stack to maximize application IOPS, and
I am wondering whether there are any other parameters I have missed.

Right now I am using FIO to write sequential AIO requests to a raw block
device, with the following settings:
1. max possible value for the libaio completion queue
2. max possible value for the IO scheduler queue size
(/sys/block/sda/queue/nr_requests)
3. max possible value for the generic device driver queue depth
(/sys/block/sda/device/queue_depth)
4. noop IO scheduler
5. IO merging disabled ( echo 2 > /sys/block/sda/queue/nomerges )

Note that the device (/dev/sda) is attached to the server over the
network, so the generic device driver sends IOs to the Fibre Channel
driver (QLogic qla2xxx).

How can I push more IOs per second?

Thanks,
Alireza
* RE: Tune IO stack to get max IOPS out of AIO
From: Hearns, John @ 2013-01-31 9:30 UTC
To: Alireza Haghdoost, fio@vger.kernel.org
> Please kindly advise how can I push more IO per second ?

Maybe get some solid state drives? Or one of the solid state storage
arrays that attach via the PCI Express bus, like Violin Memory etc.?
* Re: Tune IO stack to get max IOPS out of AIO
From: Jens Axboe @ 2013-01-31 11:02 UTC
To: Alireza Haghdoost; +Cc: linux-aio, fio
On Wed, Jan 30 2013, Alireza Haghdoost wrote:
> Hello
>
> I am trying to tune Linux IO stack to maximize application IOPS. I was
> wondering if there is any other parameter that I am missing to tune up
> ?
>
> Right now I am using a raw block device to write sequential AIO
> requests with FIO and set :
> 1. max possible value for libaio completion queue,
> 2. max possible value for IO scheduler queue size
> (/sys/block/sda/queue/nr_requests)
> 3. max possible value for generic device driver queue depth
> (/sys/block/sda/device/queue_depth)
> 4. noop IO scheduler
> 5. disable IO merge ( echo 2 > /sys/block/sda/queue/nomerges )
>
> Note that the device (/dev/sda) is attached to the server over the
> network. Therefore, generic device driver will send IOs to Fibre
> Channel network driver (QLogic qla2xxx).
>
> Please kindly advise how can I push more IO per second ?

It's hard to answer this kind of question, since there are many things
that can be optimized. You really need a good understanding of the full
stack (app to device) and profiles, or at least ideas, on where the
bottlenecks are.

On the fio side, you can relatively easily set batch parameters for
submitting and completing IO: these are the iodepth_batch_submit= and
iodepth_batch_complete= settings. They make fio submit and complete
multiple IOs at once, thus reducing the per-IO system overhead.
Whether that has an actual impact on your performance, or is even a
realistic thing to do (I'm assuming you are using fio to model what your
application would do), is another question.

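As a rough illustration of the batching options named above, the sketch below builds a fio job file as text. The option names (ioengine, direct, rw, bs, iodepth, iodepth_batch_submit, iodepth_batch_complete, filename) are fio options; the helper name fio_job, the job name, the values 64 and 16, and the /dev/sdX placeholder are assumptions made for the example, not tuned recommendations.

```python
def fio_job(device, iodepth=64, batch=16, bs="4k"):
    """Return fio job-file text for sequential libaio writes that
    submit and reap IOs in batches of `batch`.

    All numeric defaults here are illustrative assumptions.
    """
    if not 1 <= batch <= iodepth:
        raise ValueError("batch must be between 1 and iodepth")
    options = [
        ("ioengine", "libaio"),             # Linux native AIO
        ("direct", 1),                      # bypass the page cache
        ("rw", "write"),                    # sequential writes
        ("bs", bs),
        ("iodepth", iodepth),               # IOs kept in flight
        ("iodepth_batch_submit", batch),    # IOs submitted per call
        ("iodepth_batch_complete", batch),  # IOs reaped per call
        ("filename", device),
    ]
    lines = ["[seq-write-batched]"]         # hypothetical job name
    lines += [f"{key}={value}" for key, value in options]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # /dev/sdX is a placeholder; writing to a real device destroys its data.
    print(fio_job("/dev/sdX"), end="")
```

Saving the output to a file and passing that file to fio would run the job. Comparing runs with different batch values is one way to check whether batching matters for a given setup.
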
Your suggestions should help reduce the overhead too, though increasing
the queue depth beyond the existing 256 will usually not yield much of a
benefit. The basic hash merging is also very cheap, so unless your IO
truly is all random, merging might be worth keeping enabled.

You can also experiment with completion locations, using the rq_affinity
setting in the same queue/ directory. A value of 1 will migrate
completions to/near the CPU group that submitted the IO; a value of 2
will migrate them to the specific CPU that submitted it. What makes
sense depends on the load of the CPU, and how costly the completion
is...

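A minimal sketch of switching between the rq_affinity values described above. The sysfs file /sys/block/<dev>/queue/rq_affinity and the values 1 and 2 come from the text (0, which turns the migration off, is added here); the helper name, the mode labels, and the dry-run default are assumptions for illustration.

```python
# Mode labels are hypothetical; the numeric values are what the kernel reads.
RQ_AFFINITY_MODES = {
    "off": 0,    # complete wherever the interrupt lands
    "group": 1,  # complete on/near the submitting CPU group
    "cpu": 2,    # complete on the exact submitting CPU
}


def set_rq_affinity(device, mode, dry_run=True):
    """Set rq_affinity for a block device such as "sda".

    With dry_run=True (the default) nothing is written; the equivalent
    shell command is returned so it can be reviewed first.
    """
    value = RQ_AFFINITY_MODES[mode]
    path = f"/sys/block/{device}/queue/rq_affinity"
    if dry_run:
        return f"echo {value} > {path}"
    with open(path, "w") as sysfs_file:  # requires root
        sysfs_file.write(str(value))
    return f"wrote {value} to {path}"


if __name__ == "__main__":
    for mode in ("group", "cpu"):
        print(set_rq_affinity("sda", mode))
```

Running the same fio job under each mode and comparing IOPS and CPU usage is the kind of experiment the paragraph above suggests.
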
Most good settings will have to be found experimentally. But the better
idea you have of where the problems are, the more you can logically
eliminate or prefer some settings.
--
Jens Axboe