From: Mike Snitzer
Subject: Re: [LSF/MM ATTEND] multipath redesign and dm blk-mq issues
Date: Thu, 28 Jan 2016 17:37:33 -0500
To: Benjamin Marzinski
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com,
 lsf-pc@lists.linux-foundation.org

On Thu, Jan 28 2016 at 4:23pm -0500,
Benjamin Marzinski wrote:

> I'd like to attend LSF/MM 2016 to participate in any discussions about
> redesigning how device-mapper multipath operates.  I spend a significant
> chunk of time dealing with issues around multipath and I'd like to be
> part of any discussion about redesigning it.
>
> In addition, I'd be interested in discussions that deal with how
> device-mapper targets are dealing with blk-mq in general.  For instance,
> it looks like the current dm-multipath blk-mq implementation is running
> into performance bottlenecks, and changing how path selection works into
> something that allows for more parallelism is a worthy discussion.

At this point this isn't the sexy topic we'd like it to be -- not too
sure how a 30 minute session on this will go.  The devil is really in
the details.  Hopefully we'll have more details by the time LSF rolls
around, to make an in-person discussion productive.

I've spent the past few days working on this, and while there are
certainly various open questions, it is pretty clear that DM multipath's
m->lock (spinlock) is really _not_ a big bottleneck.  It is an obvious
one for sure, but I removed the spinlock entirely (debug only) and the
resulting 'perf report -g' was completely benign -- no obvious
bottlenecks.  Yet on a really fast null_blk device capable of ~1850K
read IOPS, DM mpath still only managed ~950K.  As Jens rightly pointed
out to me today: "sure, it's slower, but taking a step back, it's about
making sure we have a pretty low overhead, so actual application
workloads don't spend a lot of time in the kernel.  ~1M IOPS is a
_lot_".

But even so, DM mpath is dropping ~50% of potential IOPS on the floor.
There must be something inherently limiting in all the extra work done
to:
1) stack blk-mq devices (2 completely different sw -> hw mappings)
2) clone top-level blk-mq requests for submission on the underlying
   blk-mq paths

Anyway, my goal is to have my contribution to this LSF session be all
about what was wrong and how it has been fixed ;)  But given how much
harder analyzing this problem has become, I'm less confident I'll be
able to do so.

> But it would also be worth looking into changes about how the dm blk-mq
> implementation deals with the mapping between its swqueues and
> hwqueue(s).  Right now all the dm mapping is done in .queue_rq, instead
> of in .map_queue, but I'm not convinced it belongs there.

blk-mq's .queue_rq hook is the logical place to do the mpath mapping, as
it is what deals with getting a request from the underlying paths.
blk-mq's .map_queue is all about mapping sw queues to hw queues.  It is
very blk-mq specific and isn't something DM has a role in -- I cannot
yet see why it would need to.
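
To make the .queue_rq vs .map_queue split a bit more concrete, here's a
rough sketch of where each hook fits.  This is illustrative only -- it
is _not_ the in-tree dm-mpath code; the example_* names are made up and
the signatures reflect the current blk-mq API as I understand it:

#include <linux/blk-mq.h>

/*
 * .map_queue: pure sw ctx -> hw ctx mapping for a given cpu.  Most
 * drivers just point this at blk_mq_map_queue(); no per-request policy
 * lives here.
 */
static struct blk_mq_hw_ctx *example_map_queue(struct request_queue *q,
                                               const int cpu)
{
        return blk_mq_map_queue(q, cpu);
}

/*
 * .queue_rq: per-request dispatch.  This is where per-request policy
 * (e.g. mpath path selection and cloning to the chosen path) naturally
 * lives.  This stub just completes the request immediately.
 */
static int example_queue_rq(struct blk_mq_hw_ctx *hctx,
                            const struct blk_mq_queue_data *bd)
{
        struct request *rq = bd->rq;

        blk_mq_start_request(rq);
        blk_mq_end_request(rq, 0);
        return BLK_MQ_RQ_QUEUE_OK;
}

static struct blk_mq_ops example_mq_ops = {
        .queue_rq  = example_queue_rq,
        .map_queue = example_map_queue,
};

The point being: .map_queue only decides which hw queue a sw queue
feeds, while .queue_rq sees individual requests -- so any per-request
mapping (like path selection) belongs in the latter.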
> There's also the issue that the bio targets may scale better on blk-mq
> devices than the blk-mq targets.

Why is that surprising?  request-based DM (and block core) simply does
quite a bit more work.  bio-based DM targets take a ~20% IOPS hit,
whereas blk-mq request-based DM takes a ~50% hit.  I'd _love_ for
request-based DM to get down to only a ~20% hit.  (And for the
bio-based ~20% hit to be reduced further.)
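
For anyone not steeped in the DM target interface, the difference boils
down to which hook a target implements.  Another rough sketch below --
again not the real multipath code, a real target implements one hook or
the other (never both), the example_* stubs are hypothetical, and the
hook signatures are from my reading of the current device-mapper
interface:

#include <linux/module.h>
#include <linux/err.h>
#include <linux/blkdev.h>
#include <linux/device-mapper.h>

/* bio-based: remap the bio to the underlying device, DM core resubmits. */
static int example_bio_map(struct dm_target *ti, struct bio *bio)
{
        struct dm_dev *dev = ti->private;       /* set up in .ctr */

        bio->bi_bdev = dev->bdev;
        return DM_MAPIO_REMAPPED;
}

/*
 * request-based (blk-mq): allocate a clone request from the underlying
 * device's queue -- that extra allocation plus the later completion /
 * requeue handling is part of the extra work discussed above.
 */
static int example_clone_and_map_rq(struct dm_target *ti, struct request *rq,
                                    union map_info *map_context,
                                    struct request **clone)
{
        struct dm_dev *dev = ti->private;
        struct request_queue *q = bdev_get_queue(dev->bdev);

        *clone = blk_get_request(q, rq_data_dir(rq), GFP_ATOMIC);
        if (IS_ERR(*clone))
                return DM_MAPIO_REQUEUE;
        return DM_MAPIO_REMAPPED;
}

static struct target_type example_target = {
        .name             = "example",
        .version          = {1, 0, 0},
        .module           = THIS_MODULE,
        .map              = example_bio_map,          /* bio-based hook */
        .clone_and_map_rq = example_clone_and_map_rq, /* request-based hook */
};

The bio-based hook is essentially a pointer update before resubmission,
while the request-based hook has to get a request from the underlying
queue and track it to completion -- which gives a feel for why the
request-based path carries more fixed overhead per IO.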