Re: [LSF/MM ATTEND] multipath redesign and dm blk-mq issues

From: "Benjamin Marzinski" <bmarzins@redhat.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM ATTEND] multipath redesign and dm blk-mq issues
Date: Thu, 28 Jan 2016 19:33:16 -0600
Message-ID: <20160129013316.GY24960@octiron.msp.redhat.com>
In-Reply-To: <20160128223732.GA7060@redhat.com>

On Thu, Jan 28, 2016 at 05:37:33PM -0500, Mike Snitzer wrote:
> On Thu, Jan 28 2016 at  4:23pm -0500,
> Benjamin Marzinski <bmarzins@redhat.com> wrote:
> 
> > I'd like to attend LSF/MM 2016 to participate in any discussions about 
> > redesigning how device-mapper multipath operates. I spend a significant
> > chunk of time dealing with issues around multipath and I'd like to
> > be part of any discussion about redesigning it.
> > 
> > In addition, I'd be interested in discussions that deal with how
> > device-mapper targets are dealing with blk-mq in general.  For instance,
> > it looks like the current dm-multipath blk-mq implementation is running
> > into performance bottlenecks, and changing how path selection works into
> > something that allows for more parallelism is a worthy discussion.
> 
> At this point this isn't the sexy topic we'd like it to be -- not too
> sure how a 30 minute session on this will go.  The devil is really in
> the details.  Hopefully we can have more details once LSF rolls around
> to make an in-person discussion productive.
> 
> I've spent the past few days working on this and while there are
> certainly various questions it is pretty clear that DM multipath's
> m->lock (spinlock) is really _not_ a big bottleneck.  It is an obvious
> one for sure, but I removed the spinlock entirely (debug only) and then
> the 'perf report -g' was completely benign -- no obvious bottlenecks.
> Yet DM mpath performance on a really fast null_blk device, ~1850K read
> IOPs, was still only ~950K -- as Jens rightly pointed out to me today:
> 
> "sure, it's slower but taking a step back, it's about making sure we
> have a pretty low overhead, so actual application workloads don't spend
> a lot of time in the kernel
> 
> ~1M IOPS is a _lot_".
> 
> But even still, DM mpath is dropping 50% of potential IOPs on the floor.
> There must be something inherently limiting in all the extra work done
> to: 1) stack blk-mq devices (2 completely different sw -> hw mappings)
> 2) clone top-level blk-mq requests for submission on the underlying
> blk-mq paths.
> 
> Anyway, my goal is to have my contribution to this LSF session be all
> about what was wrong and how it has been fixed ;)
> 
> But given how much harder analyzing this problem has become I'm less
> encouraged I'll be able to do so.
> 
> > But it would also be worth looking into changes about how the dm blk-mq
> > implementation deals with the mapping between its swqueues and
> > hwqueue(s). Right now all the dm mapping is done in .queue_rq, instead
> > of in .map_queue, but I'm not convinced it belongs there.
> 
> blk-mq's .queue_rq hook is the logical place to do the mpath mapping, as
> it deals with getting a request from the underlying paths.
> 
> blk-mq's .map_queue is all about mapping sw to hw queues.  It is very
> blk-mq specific and isn't something DM has a role in -- cannot yet see
> why it'd need to.

At the moment, we only have one hwqueue.  But we could have one hwqueue
per path. Then queue_rq would just be in charge of handing the request
down to the underlying device.  In that setup, instead of using a default
mapping of all swqueues to one hwqueue in .map_queue, we would be
mapping to the hardware queue for the path.  I'd have to look through
the blk-mq code more to know if one of these methods has an obvious
advantage, but it seems like this way, if different cpus were using
different paths (with the per-cpu load-balancing), you wouldn't
constantly be accessing the hwqueue from different cpus. Although I
suppose you may do better just by leaving multipath_map where it is now,
and just adjusting the number of hardware queues. Speaking of which,
have you tried fiddling around with that in your tests?

> > There's also the issue that the bio targets may scale better on blk-mq
> > devices than the blk-mq targets.
> 
> Why is that surprising?  request-based DM (and block core) has quite a
> bit more work that it does.
> 
> bio-based DM targets take a ~20% IOPs hit, whereas blk-mq request-based
> DM takes a ~50% hit.  I'd _love_ for request-based DM to get to only a
> ~20% hit.  (And for the bio-based 20% hit to be reduced further).

Right. But like I said in an earlier email, if bio-based mpath would
give us better performance on this class of devices, then all the blk-mq
performance work helps both multipath and the other targets. I realize
that bio-based multipath had issues other than simply IO performance
that caused us to switch, like a lack of good error information.  But if
the performance gap between request-based and bio-based dm persists for
blk-mq devices (even assuming both improve), then we should at least
revisit the issues with bio-based multipath to see which set of problems
looks easiest to tackle.

-Ben

Thread overview: 7+ messages
2016-01-28 21:23 [LSF/MM ATTEND] multipath redesign and dm blk-mq issues Benjamin Marzinski
2016-01-28 22:37 ` Mike Snitzer
2016-01-29  1:33   ` Benjamin Marzinski [this message]
2016-01-29  2:11     ` Benjamin Marzinski
2016-01-29  2:48       ` Mike Snitzer
2016-01-29  6:59 ` Hannes Reinecke
2016-01-29 15:34   ` Benjamin Marzinski
