Re: [PATCH 2/2] blk-plug: don't flush nested plug lists

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Jens Axboe <axboe@kernel.dk>
To: Jeff Moyer <jmoyer@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] blk-plug: don't flush nested plug lists
Date: Wed, 08 Apr 2015 11:38:14 -0600	[thread overview]
Message-ID: <55256786.8000608@kernel.dk> (raw)
In-Reply-To: <1428347694-17704-2-git-send-email-jmoyer@redhat.com>

On 04/06/2015 01:14 PM, Jeff Moyer wrote:
> The way the on-stack plugging currently works, each nesting level
> flushes its own list of I/Os.  This can be less than optimal (read
> awful) for certain workloads.  For example, consider an application
> that issues asynchronous O_DIRECT I/Os.  It can send down a bunch of
> I/Os together in a single io_submit call, only to have each of them
> dispatched individually down in the bowells of the dirct I/O code.
> The reason is that there are blk_plug's instantiated both at the upper
> call site in do_io_submit and down in do_direct_IO.  The latter will
> submit as little as 1 I/O at a time (if you have a small enough I/O
> size) instead of performing the batching that the plugging
> infrastructure is supposed to provide.
>
> Now, for the case where there is an elevator involved, this doesn't
> really matter too much.  The elevator will keep the I/O around long
> enough for it to be merged.  However, in cases where there is no
> elevator (like blk-mq), I/Os are simply dispatched immediately.
>
> Try this, for example (note I'm using a virtio-blk device, so it's
> using the blk-mq single queue path, though I've also reproduced this
> with the micron p320h):
>
> fio --rw=read --bs=4k --iodepth=128 --iodepth_batch=16 --iodepth_batch_complete=16 --runtime=10s --direct=1 --filename=/dev/vdd --name=job1 --ioengine=libaio --time_based
>
> If you run that on a current kernel, you will get zero merges.  Zero!
> After this patch, you will get many merges (the actual number depends
> on how fast your storage is, obviously), and much better throughput.
> Here are results from my test rig:
>
> Unpatched kernel:
> Read B/W:    283,638 KB/s
> Read Merges: 0
>
> Patched kernel:
> Read B/W:    873,224 KB/s
> Read Merges: 2,046K
>
> I considered several approaches to solving the problem:
> 1)  get rid of the inner-most plugs
> 2)  handle nesting by using only one on-stack plug
> 2a) #2, except use a per-cpu blk_plug struct, which may clean up the
>      code a bit at the expense of memory footprint
>
> Option 1 will be tricky or impossible to do, since inner most plug
> lists are sometimes the only plug lists, depending on the call path.
> Option 2 is what this patch implements.  Option 2a is perhaps a better
> idea, but since I already implemented option 2, I figured I'd post it
> for comments and opinions before rewriting it.
>
> Much of the patch involves modifying call sites to blk_start_plug,
> since its signature is changed.  The meat of the patch is actually
> pretty simple and constrained to block/blk-core.c and
> include/linux/blkdev.h.  The only tricky bits were places where plugs
> were finished and then restarted to flush out I/O.  There, I went
> ahead and exported blk_flush_plug_list and called that directly.
>
> Comments would be greatly appreciated.

It's hard to argue with the increased merging for your case. The task 
plugs did originally work like you changed them to, not flushing until 
the outermost plug was flushed. Unfortunately I don't quite remember why 
I changed them, will have to do a bit of digging to refresh my memory.

For cases where we don't do any merging (like nvme), we always want to 
flush. Well almost, if we start do utilize the batched submission, then 
the plug would still potentially help (just for other reasons than merging).

And agree with Ming, this can be cleaned up substantially. I'd also like 
to see some test results from the other end of the spectrum. Your posted 
cased is clearly based case (we missed tons of merging, now we don't), 
I'd like to see a normal case and a worst case result as well so we have 
an idea of what this would do to latencies.

-- 
Jens Axboe

next prev parent reply	other threads:[~2015-04-08 17:38 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-06 19:14 [PATCH 1/2] [v2] blk-mq: fix plugging in blk_sq_make_request Jeff Moyer
2015-04-06 19:14 ` [PATCH 2/2] blk-plug: don't flush nested plug lists Jeff Moyer
2015-04-07  9:19   ` Ming Lei
2015-04-07 14:47     ` Jeff Moyer
2015-04-08 17:38   ` Jens Axboe [this message]
2015-04-07 16:09     ` Shaohua Li
2015-04-08 17:56     ` Jeff Moyer
2015-04-16 15:47     ` Jeff Moyer
     [not found]   ` <x49wq1nrcoe.fsf_-_@segfault.boston.devel.redhat.com>
     [not found]     ` <20150408230203.GG15810@dastard>
2015-04-09  0:54       ` [PATCH 2/2][v2] " Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55256786.8000608@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=jmoyer@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox