public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* How to get more sequential IO merged at elevator
@ 2014-04-30  7:02 Desai, Kashyap
  0 siblings, 0 replies; 5+ messages in thread
From: Desai, Kashyap @ 2014-04-30  7:02 UTC (permalink / raw)
  To: axboe@kernel.dk; +Cc: linux-scsi@vger.kernel.org

Jens,

While working on an issue of low IOPs for sequential READ/WRITE, I found some interesting behavior that was causing a performance drop for sequential IO. I did some reverse engineering on the block layer code to understand whether any sysfs parameter settings would help, but could not find anything useful to solve this issue.

I have described the problem statement and root cause of this issue in this mail thread.

Problem statement - "Cannot achieve sequential read/write performance because back merges are not happening frequently."

Here is my understanding of how back merging is done in the elevator.

The Linux block layer is responsible for merging/sorting IO with the help of the elevator hooks plus the IO scheduler.
The IO scheduler does not have any role in merging sequential IO; it is done in the elevator hooks, so choosing a different IO scheduler in Linux will not help (i.e., the behavior is unchanged irrespective of the IO scheduler). Sequential IO is merged in the elevator code path.

1. When an IO comes from the upper layer, it is queued at the elevator/IO scheduler level. The IO is also added to a hash lookup, which is used for merging and other purposes.
2. The elevator code searches the outstanding IO (in the queue at the same layer). If there is any chance to merge, it performs a (BACK) merge.
3. If no merge is possible, the IO is queued to the next level (the IO scheduler).
4. In the IO completion path, the IO scheduler posts IO to the driver queue if there is any outstanding IO. (There are many other conditions, but this is the common code path.)

To merge more commands, step #2 should find more outstanding IO in the hash table lookup. This is possible if flow control kicks in at the block layer or driver level: the driver/block layer forcefully delays IO submission to the next level, giving the elevator code more chances to merge by accumulating more IO from user space.

If I manually lower the queue depth of a device that is doing only sequential IO (to somewhere between 1 and 8), I see maximally merged IO arriving at the driver, and it eventually increases the IOPs.
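For reference, this is how I lower the queue depth in that experiment; the device name sdX and the value 4 are just examples from my setup:

```shell
# per-device flow control point at the SCSI level (root required)
cat /sys/block/sdX/device/queue_depth      # current value
echo 4 > /sys/block/sdX/device/queue_depth
```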

Is there any way to increase the likelihood of merged IO coming from the block layer to the low-level driver?

Thanks, Kashyap
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: How to get more sequential IO merged at elevator
@ 2014-05-06 10:06 Desai, Kashyap
  2014-05-06 14:16 ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Desai, Kashyap @ 2014-05-06 10:06 UTC (permalink / raw)
  To: axboe@kernel.dk; +Cc: linux-scsi@vger.kernel.org

I got a clue about what was going on while doing 4K sequential reads using fio.

If I use ioengine=libaio in the fio script, I see do_io_submit() plug/unplug the queue around submitting the IO.
That means after every IO we send it immediately to the next layer; if there are any pending IOs they may merge, but not because of plugging.

This is what happens in my test: every time an IO comes from the application with the libaio engine, it is sent down to the elevator/IO scheduler because the queue was unplugged in
do_io_submit(). The moment I reduce the queue depth of the block device, merging starts because of congestion at the SCSI mid layer.

If I use the mmap engine, I see merged IO coming to the device driver because of plugging. I don't know exactly how it works, but I gave it a try and found merges happen because of plugging. (I confirmed this using blktrace.)
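For reference, the blktrace check is just the standard live trace (sdX is a placeholder for the device under test); back merges show up as 'M' action events in the blkparse output, and plug/unplug as 'P'/'U':

```shell
# trace the device and decode events on the fly (root required)
blktrace -d /dev/sdX -o - | blkparse -i -
```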

Is there any ioengine in fio (or any other parameter setting) other than mmap which can use the plugging mechanism of the block layer to merge more IO?

~ Kashyap

> -----Original Message-----
> From: Desai, Kashyap
> Sent: Wednesday, April 30, 2014 12:33 PM
> To: 'axboe@kernel.dk'
> Cc: linux-scsi@vger.kernel.org
> Subject: How to get more sequential IO merged at elevator
> 
> Jens,
> 
> While working on an issue of low IOPs for sequential READ/WRITE, I found
> some interesting behavior that was causing a performance drop for
> sequential IO. I did some reverse engineering on the block layer code to
> understand whether any sysfs parameter settings would help, but could
> not find anything useful to solve this issue.
> 
> I have described the problem statement and root cause of this issue in
> this mail thread.
> 
> Problem statement - "Cannot achieve sequential read/write performance
> because back merges are not happening frequently."
> 
> Here is my understanding of how back merging is done in the elevator.
> 
> The Linux block layer is responsible for merging/sorting IO with the
> help of the elevator hooks plus the IO scheduler.
> The IO scheduler does not have any role in merging sequential IO; it is
> done in the elevator hooks, so choosing a different IO scheduler in
> Linux will not help (i.e., the behavior is unchanged irrespective of the
> IO scheduler). Sequential IO is merged in the elevator code path.
> 
> 1. When an IO comes from the upper layer, it is queued at the
> elevator/IO scheduler level. The IO is also added to a hash lookup,
> which is used for merging and other purposes.
> 2. The elevator code searches the outstanding IO (in the queue at the
> same layer). If there is any chance to merge, it performs a (BACK)
> merge.
> 3. If no merge is possible, the IO is queued to the next level (the IO
> scheduler).
> 4. In the IO completion path, the IO scheduler posts IO to the driver
> queue if there is any outstanding IO. (There are many other conditions,
> but this is the common code path.)
> 
> To merge more commands, step #2 should find more outstanding IO in the
> hash table lookup. This is possible if flow control kicks in at the
> block layer or driver level: the driver/block layer forcefully delays
> IO submission to the next level, giving the elevator code more chances
> to merge by accumulating more IO from user space.
> 
> If I manually lower the queue depth of a device that is doing only
> sequential IO (to somewhere between 1 and 8), I see maximally merged IO
> arriving at the driver, and it eventually increases the IOPs.
> 
> Is there any way to increase the likelihood of merged IO coming from the
> block layer to the low-level driver?
> 
> Thanks, Kashyap

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: How to get more sequential IO merged at elevator
  2014-05-06 10:06 How to get more sequential IO merged at elevator Desai, Kashyap
@ 2014-05-06 14:16 ` Jens Axboe
  2014-05-07 13:21   ` Kashyap Desai
  2014-05-07 14:25   ` Kashyap Desai
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2014-05-06 14:16 UTC (permalink / raw)
  To: Desai, Kashyap; +Cc: linux-scsi@vger.kernel.org

On 05/06/2014 04:06 AM, Desai, Kashyap wrote:
> I got a clue about what was going on while doing 4K sequential reads
> using fio.
> 
> If I use ioengine=libaio in the fio script, I see do_io_submit()
> plug/unplug the queue around submitting the IO. That means after every
> IO we send it immediately to the next layer; if there are any pending
> IOs they may merge, but not because of plugging.
> 
> This is what happens in my test: every time an IO comes from the
> application with the libaio engine, it is sent down to the elevator/IO
> scheduler because the queue was unplugged in do_io_submit(). The moment
> I reduce the queue depth of the block device, merging starts because of
> congestion at the SCSI mid layer.
> 
> If I use the mmap engine, I see merged IO coming to the device driver
> because of plugging. I don't know exactly how it works, but I gave it a
> try and found merges happen because of plugging. (I confirmed this using
> blktrace.)
> 
> Is there any ioengine in fio (or any other parameter setting) other than
> mmap which can use the plugging mechanism of the block layer to merge
> more IO?

O_DIRECT IO is sync by nature, which is why it is sent off immediately
instead of held for potentially merging. mmap is not. You should be able
to provoke merging by submitting more than one IO at a time. See the
iodepth_batch settings for fio.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: How to get more sequential IO merged at elevator
  2014-05-06 14:16 ` Jens Axboe
@ 2014-05-07 13:21   ` Kashyap Desai
  2014-05-07 14:25   ` Kashyap Desai
  1 sibling, 0 replies; 5+ messages in thread
From: Kashyap Desai @ 2014-05-07 13:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-scsi

> -----Original Message-----
> From: Jens Axboe [mailto:axboe@kernel.dk]
> Sent: Tuesday, May 06, 2014 7:47 PM
> To: Desai, Kashyap
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: How to get more sequential IO merged at elevator
>
> On 05/06/2014 04:06 AM, Desai, Kashyap wrote:
> > I got a clue about what was going on while doing 4K sequential reads
> > using fio.
> >
> > If I use ioengine=libaio in the fio script, I see do_io_submit()
> > plug/unplug the queue around submitting the IO. That means after every
> > IO we send it immediately to the next layer; if there are any pending
> > IOs they may merge, but not because of plugging.
> >
> > This is what happens in my test: every time an IO comes from the
> > application with the libaio engine, it is sent down to the elevator/IO
> > scheduler because the queue was unplugged in do_io_submit(). The
> > moment I reduce the queue depth of the block device, merging starts
> > because of congestion at the SCSI mid layer.
> >
> > If I use the mmap engine, I see merged IO coming to the device driver
> > because of plugging. I don't know exactly how it works, but I gave it
> > a try and found merges happen because of plugging. (I confirmed this
> > using blktrace.)
> >
> > Is there any ioengine in fio (or any other parameter setting) other
> > than mmap which can use the plugging mechanism of the block layer to
> > merge more IO?
>
> O_DIRECT IO is sync by nature, which is why it is sent off immediately
> instead of held for potentially merging. mmap is not. You should be able
> to provoke merging by submitting more than one IO at a time. See the
> iodepth_batch settings for fio.

Thanks Jens, I got your point about O_DIRECT IO. When I read the fio man
page, it mentioned that the default iodepth_batch and iodepth_low will be
the same as the iodepth value, but in my case I have to provide those
parameters explicitly. It looks like the default value for
iodepth_batch/low is 1.

Once I provide the fio parameters "iodepth_batch=32" and "iodepth_low=32",
I see IO submitted in batches between plug and unplug from the
application, and I now get maximally merged IO. Thanks for helping me
with this.
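For reference, the relevant part of my fio job now looks roughly like this (the device path, block size, and depth values are just what I used in this test):

```
[seq-read]
filename=/dev/sdX
rw=read
bs=4k
direct=1
ioengine=libaio
iodepth=32
iodepth_batch=32
iodepth_low=32
```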

~ Kashyap

>
> --
> Jens Axboe
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: How to get more sequential IO merged at elevator
  2014-05-06 14:16 ` Jens Axboe
  2014-05-07 13:21   ` Kashyap Desai
@ 2014-05-07 14:25   ` Kashyap Desai
  1 sibling, 0 replies; 5+ messages in thread
From: Kashyap Desai @ 2014-05-07 14:25 UTC (permalink / raw)
  To: Jens Axboe, kashyap.desai; +Cc: linux-scsi

> -----Original Message-----
> From: Jens Axboe [mailto:axboe@kernel.dk]
> Sent: Tuesday, May 06, 2014 7:47 PM
> To: Desai, Kashyap
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: How to get more sequential IO merged at elevator
>
> On 05/06/2014 04:06 AM, Desai, Kashyap wrote:
> > I got a clue about what was going on while doing 4K sequential reads
> > using fio.
> >
> > If I use ioengine=libaio in the fio script, I see do_io_submit()
> > plug/unplug the queue around submitting the IO. That means after every
> > IO we send it immediately to the next layer; if there are any pending
> > IOs they may merge, but not because of plugging.
> >
> > This is what happens in my test: every time an IO comes from the
> > application with the libaio engine, it is sent down to the elevator/IO
> > scheduler because the queue was unplugged in do_io_submit(). The
> > moment I reduce the queue depth of the block device, merging starts
> > because of congestion at the SCSI mid layer.
> >
> > If I use the mmap engine, I see merged IO coming to the device driver
> > because of plugging. I don't know exactly how it works, but I gave it
> > a try and found merges happen because of plugging. (I confirmed this
> > using blktrace.)
> >
> > Is there any ioengine in fio (or any other parameter setting) other
> > than mmap which can use the plugging mechanism of the block layer to
> > merge more IO?
>
> O_DIRECT IO is sync by nature, which is why it is sent off immediately
> instead of held for potentially merging. mmap is not. You should be able
> to provoke merging by submitting more than one IO at a time. See the
> iodepth_batch settings for fio.

Thanks Jens, I got your point about O_DIRECT IO. When I read the fio man
page, it mentioned that the default iodepth_batch and iodepth_low will be
the same as the iodepth value, but in my case I have to provide those
parameters explicitly. (I explored this after you pointed it out.) It
looks like the default value for iodepth_batch/low is not the same as
iodepth.

Once I provide the fio parameters "iodepth_batch=32" and "iodepth_low=32",
I see IO submitted in batches between plug and unplug from the
application, and I now get maximally merged IO. Thanks for helping me
with this.

~ Kashyap
>
> --
> Jens Axboe
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-05-07 14:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-06 10:06 How to get more sequential IO merged at elevator Desai, Kashyap
2014-05-06 14:16 ` Jens Axboe
2014-05-07 13:21   ` Kashyap Desai
2014-05-07 14:25   ` Kashyap Desai
  -- strict thread matches above, loose matches on Subject: below --
2014-04-30  7:02 Desai, Kashyap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox