linux-kernel.vger.kernel.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Andreas Herrmann <aherrmann@suse.com>, Christoph Hellwig <hch@lst.de>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC] blk-mq and I/O scheduling
Date: Wed, 25 Nov 2015 12:47:21 -0700	[thread overview]
Message-ID: <56561049.7000201@kernel.dk> (raw)
In-Reply-To: <20151119120235.GA7966@suselix.suse.de>

On 11/19/2015 05:02 AM, Andreas Herrmann wrote:
> Hi,
>
> I've looked into blk-mq and possible support for I/O scheduling.
>
> The reason for this is to minimize performance degradation with
> rotational devices when scsi_mod.use_blk_mq=1 is switched on.
>
> I think that the degradation is well reflected with fio measurements.
> With an increasing number of jobs you'll encounter a significant
> performance drop for sequential reads and writes with blk-mq in
> contrast to CFQ. blk-mq ensures that requests from different processes
> (CPUs) are "perfectly shuffled" in a hardware queue. This is no
> problem for the non-rotational devices that blk-mq is aimed at, but
> it is not so nice for rotational disks.
>
>    (i) I've done some tests with patch c2ed2f2dcf92 (blk-mq: first cut
>        deadline scheduling) from branch mq-deadline of linux-block
>        repository. I've not seen a significant performance impact when
>        enabling it (neither for non-rotational nor for rotational
>        disks).
>
>   (ii) I've played with code to enable sorting/merging of requests. I
>        did this in flush_busy_ctxs. This didn't have a performance
>        impact either. On closer inspection this was due to the high
>        frequency of calls to __blk_mq_run_hw_queue: there was almost
>        nothing to sort (too few requests). I guess that's also the
>        reason why (i) did not have much impact.
>
> (iii) With CFQ I've observed performance patterns similar to those of
>        blk-mq when slice_idle was set to 0.
>
>   (iv) I thought about introducing a per software queue time slice
>        during which blk-mq will service only one software queue (one
>        CPU) and not flush all software queues. This could help to
>        enqueue multiple requests belonging to the same process (as long
>        as it runs on same CPU) into a hardware queue.  A minimal patch
>        to implement this is attached below.
>
> The latter helped to improve performance for sequential reads and
> writes. But it's not on a par with CFQ. Increasing the time slice is
> suboptimal (as shown with the 2ms results, see below). It might be
> possible to get better performance when further reducing the initial
> time slice and adapting it up to a maximum value if there are
> repeatedly pending requests for a CPU.
>
> After these observations, and assuming that non-rotational devices
> are most likely fine using blk-mq without I/O scheduling support, I
> wonder whether
>
> - it's really a good idea to re-implement scheduling support for
>    blk-mq that eventually behaves like CFQ for rotational devices.
>
> - it's technically possible to support both blk-mq and CFQ for
>    different devices on the same host adapter. This would allow using
>    "good old" code for "good old" rotational devices. (But this might
>    not be a choice if, in the long run, the goal is to get rid of
>    non-blk-mq code -- not sure what the plans are.)
>
> What do you think about this?

Sorry, I did not get around to properly looking at this this week; I'll
tend to it next week. I think the concept of tying the idling to a
specific CPU is likely fine, though I wonder if there are cases where we 
preempt more heavily and subsequently miss breaking the idling properly. 
I don't think we want/need cfq for blk-mq, but basic idling could 
potentially be enough. That's still a far cry from a full cfq 
implementation. The long term plans are still to move away from the 
legacy IO path, though with things like scheduling, that's sure to take 
some time.

That is actually where the mq-deadline work comes in. The mq-deadline 
work is missing a test patch to limit tag allocations, and a bunch of 
other little things to truly make it functional. There might be some 
options for folding it all together, with idling, as that would still be 
important on rotating storage going forward.

-- 
Jens Axboe



Thread overview: 12+ messages
2015-11-19 12:02 [RFC] blk-mq and I/O scheduling Andreas Herrmann
2015-11-24  8:19 ` Christoph Hellwig
2015-12-01  7:37   ` Andreas Herrmann
2015-11-25 19:47 ` Jens Axboe [this message]
2015-12-01  7:43   ` Andreas Herrmann
2016-02-01 22:43 ` [RFC PATCH v2] blk-mq: Introduce per sw queue time-slice Andreas Herrmann
2016-02-01 22:46   ` Andreas Herrmann
2016-02-09 17:12   ` Andreas Herrmann
2016-02-09 17:41     ` Markus Trippelsdorf
2016-02-10 19:34       ` Andreas Herrmann
2016-02-10 19:47         ` Markus Trippelsdorf
2016-02-10 22:09           ` Andreas Herrmann
