Re: [PATCH V3 0/5] dm-rq: improve sequential I/O performance

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mike Snitzer <snitzer@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"axboe@fb.com" <axboe@fb.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Subject: Re: [PATCH V3 0/5]  dm-rq: improve sequential I/O performance
Date: Fri, 12 Jan 2018 12:18:41 -0500	[thread overview]
Message-ID: <20180112171840.GA4541@redhat.com> (raw)
In-Reply-To: <20180112033308.GC25090@ming.t460p>

On Thu, Jan 11 2018 at 10:33pm -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> On Thu, Jan 11, 2018 at 08:57:21PM -0500, Mike Snitzer wrote:
> > On Thu, Jan 11 2018 at  8:42pm -0500,
> > Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > > On Thu, Jan 11, 2018 at 10:37:37PM +0000, Bart Van Assche wrote:
> > > > On Thu, 2018-01-11 at 17:07 -0500, Mike Snitzer wrote:
> > > > > Bart, if for some reason we regress for some workload you're able to
> > > > > more readily test we can deal with it.  But I'm too encouraged by Ming's
> > > > > performance improvements to hold these changes back any further.
> > > > 
> > > > Sorry Mike but I think Ming's measurement results are not sufficient to
> > > > motivate upstreaming of these patches. What I remember from previous versions
> > > > of this patch series is that Ming measured the performance impact of this
> > > > patch series only for the Emulex FC driver (lpfc). That driver limits queue
> > > > depth to 3. Instead of modifying the dm code, that driver needs to be fixed
> > > > such that it allows larger queue depths.
> > > > 
> > > > Additionally, some time ago I had explained in detail why I think that patch
> > > > 2/5 in this series is wrong and why it will introduce fairness regressions
> > > > in multi-LUN workloads. I think we need performance measurements for this
> > > > patch series for at least the following two configurations before this patch
> > > > series can be considered for upstream inclusion:
> > > > * The performance impact of this patch series for SCSI devices with a
> > > >   realistic queue depth (e.g. 64 or 128).
> > > 
> > > Please look at the cover letter or patch 5's commit log, it mentions that
> > > dm-mpath over virtio-scsi is tested, and the default queue depth of virito-scsi
> > > is 128.
> > > 
> > > > * The performance impact for this patch series for a SCSI host with which
> > > >   multiple LUNs are associated and for which the target system often replies
> > > >   with the SCSI "BUSY" status.
> > > 
> > > I don't think this patch is related with this case, this patch just provides the
> > > BUSY feedback from underlying queue to blk-mq via dm-rq.
> > 
> > Hi Ming,
> > 
> > Please see https://www.redhat.com/archives/dm-devel/2017-April/msg00190.html
> > 
> > Specifically:
> > "That patch [commit 6077c2d706] can be considered as a first step that
> > can be refined further, namely by modifying the dm-rq code further such
> > that dm-rq queues are only rerun after the busy condition has been
> > cleared. The patch at the start of this thread is easier to review and
> > easier to test than any patch that would only rerun dm-rq queues after
> > the busy condition has been cleared."
> > 
> > Do you have time to reason through Bart's previous proposal for how to
> > better address the specific issue that was documented to be fixed in the
> > header for commit 6077c2d706 ?
> 
> Hi Mike,
> 
> Recently we fixed one such issue in blk-mq (eb619fdb2d4cb8b3d34), and I
> highly guess that may fix Bart's case.

Wow, kind of staggering that there was no mention of this fix prior to
now.  Seems _very_ relevant (like the real fix).

> Let's see this commit 6077c2d706 again, I don't know the idea behind the
> fix, which just adds random of 100ms, and this delay degrades performance a
> lot, since no hctx can be scheduled any more during the delay.

I'd love to try to reproduce the issue documented in commit
6077c2d706 but sadly I cannot get that srp-test to work on my system.
Just fails with:

# while srp-test/run_tests -d -r 30 -t 02-mq; do :; done
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
Zero-initializing /dev/ram0 ... done
Zero-initializing /dev/ram1 ... done
Configured SRP target driver
Running test srp-test/tests/02-mq ...
Test file I/O on top of multipath concurrently with logout and login (0 min; mq)
SRP login failed
Test srp-test/tests/02-mq failed

Think the login is failing due to target not being setup properly?
Yeap, now from reading this thread, it is clear that srp-test only works
if you have an IB HCA:
https://patchwork.kernel.org/patch/10041381/

> So again it is definitely a workaround, no any reasoning, no any theory, just
> say it is a fix. Are there anyone who can explain the story?
> 
> IMO, the blk_get_request() in dm-mpath is just like a kmalloc(ATOMIC) which
> is used in other blk-mq drivers' IO path too, so don't know why dm-mq is so
> strange to require such ugly 100ms delay.

The header for commit 6077c2d706 notes that same action that Jens took
to unwedge the request stalls (in the patchwork thread I referenced
above), with:

echo run > /sys/kernel/debug/block/nullb1/state
vs what Bart noted in commit 6077c2d706:
echo run >/sys/kernel/debug/block/dm-0/state

Really does feel like the issue Jens' shared tags fix addressed _is_ the
root cause that commit 6077c2d706 worked around.

> So I suggest to remove it, let's see if there are reports, and if there
> are, I am quite confident we can root cause that and fix that.

Yeah, it really is rediculous that we're getting this kind of pushback
for something that was/is so poorly justified.  And especially given
that dm_mq_queue_rq() _still_ has this:

        if (ti->type->busy && ti->type->busy(ti))
                return BLK_STS_RESOURCE;

yet our desire to avoid blk_mq_delay_run_hw_queue(hctx, 100/*ms*/) in
response to blk_get_request() failure, just prior to returning
BLK_STS_RESOURCE in dm_mq_queue_rq(), is met with hellfire.

I refuse to submit to this highly unexpected cargo cult programming.

This is going upstream for 4.16:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=dm-4.16&id=5b18cff4baedde77e0d69bd62a13ae78f9488d89

I've lost patience with the unwillingness to reassess/justify the need
for commit 6077c2d706 ; SO it is just getting removed and we'll debug
and fix any future reported blk-mq stalls (and/or high cpu load) in a
much more precise manner -- provided the reporter is willing to work
with us!

Mike

next prev parent reply	other threads:[~2018-01-12 17:18 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-11  6:01 [PATCH V3 0/5] dm-rq: improve sequential I/O performance Ming Lei
2018-01-11  6:01 ` [PATCH V3 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
2018-01-11  6:01 ` [PATCH V3 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure Ming Lei
2018-01-12 19:04   ` Bart Van Assche
2018-01-13  1:29     ` Ming Lei
2018-01-11  6:01 ` [PATCH V3 3/5] blk-mq: move actual issue into one helper Ming Lei
2018-01-11 22:09   ` Mike Snitzer
2018-01-11  6:01 ` [PATCH V3 4/5] blk-mq: return dispatch result to caller in blk_mq_try_issue_directly Ming Lei
2018-01-11 22:10   ` Mike Snitzer
2018-01-11  6:01 ` [PATCH V3 5/5] blk-mq: issue request directly for blk_insert_cloned_request Ming Lei
2018-01-11 22:42   ` Mike Snitzer
2018-01-11 22:07 ` [PATCH V3 0/5] dm-rq: improve sequential I/O performance Mike Snitzer
2018-01-11 22:37   ` Bart Van Assche
2018-01-11 22:58     ` Mike Snitzer
2018-01-11 23:27       ` Bart Van Assche
2018-01-12  1:43         ` Mike Snitzer
2018-01-12  1:42     ` Ming Lei
2018-01-12  1:57       ` Mike Snitzer
2018-01-12  3:33         ` Ming Lei
2018-01-12 17:18           ` Mike Snitzer [this message]
2018-01-12 17:26             ` Bart Van Assche
2018-01-12 17:40               ` Mike Snitzer
2018-01-12 17:46                 ` Bart Van Assche
2018-01-12 18:06                   ` Mike Snitzer
2018-01-12 18:54                     ` Bart Van Assche
2018-01-12 19:29                       ` Mike Snitzer
2018-01-12 19:53                       ` Elliott, Robert (Persistent Memory)
2018-01-13  0:52                         ` Mike Snitzer
2018-01-13  1:00                           ` Bart Van Assche
2018-01-13  1:37                             ` Mike Snitzer
2018-01-13 15:14                               ` Mike Snitzer
2018-01-12 22:31                       ` Mike Snitzer
2018-01-13 15:04                         ` Ming Lei
2018-01-13 15:10                           ` Mike Snitzer
2018-01-12 23:17                       ` Mike Snitzer
2018-01-12 23:42                         ` Bart Van Assche
2018-01-13  0:45                           ` Mike Snitzer
2018-01-13 14:34                       ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180112171840.GA4541@redhat.com \
    --to=snitzer@redhat.com \
    --cc=Bart.VanAssche@wdc.com \
    --cc=axboe@fb.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).