From: Andreas Herrmann <aherrmann@suse.com>
To: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	linux-kernel@vger.kernel.org,
	Johannes Thumshirn <jthumshirn@suse.de>, Jan Kara <jack@suse.cz>,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	Hannes Reinecke <hare@suse.de>
Subject: Re: [RFC PATCH v2] blk-mq: Introduce per sw queue time-slice
Date: Wed, 10 Feb 2016 23:09:19 +0100	[thread overview]
Message-ID: <20160210220919.GA16029@suselix.suse.de> (raw)
In-Reply-To: <20160210194715.GA324@x4>

On Wed, Feb 10, 2016 at 08:47:15PM +0100, Markus Trippelsdorf wrote:
> On 2016.02.10 at 20:34 +0100, Andreas Herrmann wrote:
> > On Tue, Feb 09, 2016 at 06:41:56PM +0100, Markus Trippelsdorf wrote:
> > > > Recently Johannes sent a patch to enable scsi-mq per driver, see
> > > > http://marc.info/?l=linux-scsi&m=145347009631192&w=2
> > > > 
> > > > Probably that is a good solution (at least in the short term) to allow
> > > > users to switch to blk-mq for some host adapters (with fast storage
> > > > attached) but to stick to the legacy path on other host adapters with
> > > > rotational devices.
> > > 
> > > I don't think that Johannes' patch is a good solution.
> > 
> > Why? Because it's not per device?
> 
> Yes. Like Christoph said in his reply to the patch: »The host is simply
> the wrong place to decide these things.«
> 
> > > The best solution for the user would be if blk-mq could be toggled
> > > per drive (or even automatically enabled if queue/rotational == 0).
> > 
> > Yes, I agree, but ...
> > 
> > > Is there a fundamental reason why this is not feasible?
> > 
> > ... it's not possible (*) with the current implementation.
> > 
> > Tag handling/command allocation differs. The respective functions
> > are set per host.
> > 
> > (*) Or maybe it's possible but just hard to achieve and I didn't look
> > long enough into relevant code to get an idea how to do it.
> > 
> > > Your solution is better than nothing, but it requires that the user
> > > finds out the drive <=> host mapping by hand and then runs something
> > > like: 
> > > echo "250" > /sys/devices/pci0000:00/0000:00:11.0/ata2/host1/target1:0:0/1:0:0:0/block/sdb/mq/0/time_slice_us
> > > during boot for spinning rust drives...
> > 
> > Or it could automatically be set in case of rotational device.
> > (Once we know for sure that it doesn't cause performance degradation.)
> 
> Yes, this sounds like a good idea.
> 
> But, if I understand things correctly, your patch is only an interim
> solution until proper I/O scheduler support gets implemented for blk-mq, no?

That's to be discussed (hence the RFC).

My (potentially wrong) claims are:

- I don't think that fast storage (e.g. SSDs) requires I/O scheduler
  support with blk-mq. blk-mq is very good at pushing a large number
  of requests from per-CPU sw queues to hw queue(s). Why introduce
  any I/O scheduler overhead there?

- Slow storage (e.g. spinning drives) is fine with the old code, which
  provides scheduler support, and I doubt that those devices gain any
  benefit from switching to blk-mq.

- The big hammer (scsi_mod.use_blk_mq) that decides this for the
  entire SCSI stack is suboptimal. You can't get optimal performance
  when your system contains both slow and fast storage devices; see
  the sketch below.
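
To make the contrast concrete, here is a rough sketch of the two
granularities. (The module parameter is real; the time_slice_us path
is taken from the example quoted above and only exists with this
patch, so the exact sysfs layout may differ per kernel/system.)

  # Big hammer: one switch for every SCSI host in the system,
  # typically set on the kernel command line at boot:
  #   scsi_mod.use_blk_mq=Y
  # The current value is visible under sysfs:
  cat /sys/module/scsi_mod/parameters/use_blk_mq

  # Quick check which path a device ended up on: blk-mq devices
  # report "none" here, the legacy path lists the elevators:
  cat /sys/block/sda/queue/scheduler

  # The knob from this patch, by contrast, is per device/sw queue
  # (same file as the long pci path above, via the /sys/block link):
  echo 250 > /sys/block/sdb/mq/0/time_slice_us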

I doubt that it is possible to add I/O scheduling support to blk-mq
that is on par with what CFQ currently achieves for slow devices.

Requests are scattered among per-CPU software queues (and almost
instantly passed to hardware queue(s)). Due to CPU scheduling,
requests initiated by one process may arrive via different software
queues. What is an efficient way to sort/merge requests from all the
software queues such that the result is comparable to what CFQ does
(assuming that CFQ provides optimal performance)? So far I haven't
found a solution to this problem. (I just have this patch, which adds
little overhead and improves the situation a bit.)

Maybe the solution is to avoid per-CPU queues for slow storage and
fall back to a set of queues comparable to what CFQ uses.

One way to do this is by falling back to non-blk-mq code and direct
use of CFQ.

Code that allows selecting blk-mq per host would help to some
extent. But it doesn't help either when both device types are
connected to the same host adapter.
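
Coming back to the idea quoted above of setting the value
automatically for rotational devices: once we trust the knob, a
boot-time script along these lines could do it. (A minimal sketch;
the mq/0/time_slice_us path is the one from the example above and
depends on this patch, and the value 250 is just taken from that
example.)

  #!/bin/sh
  # Set the time slice only for rotational (spinning) devices that
  # actually expose the blk-mq knob.
  for dev in /sys/block/*; do
      rot="$dev/queue/rotational"
      knob="$dev/mq/0/time_slice_us"
      # Skip non-rotational devices and devices without the knob.
      [ -r "$rot" ] && [ "$(cat "$rot")" = "1" ] || continue
      [ -w "$knob" ] || continue
      echo 250 > "$knob"
  done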


Andreas
