From: Laurence Oberman <loberman@redhat.com>
To: Ming Lei <ming.lei@redhat.com>, Jens Axboe <axboe@fb.com>,
linux-block@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Subject: Re: [PATCH V2 00/20] blk-mq-sched: improve SCSI-MQ performance
Date: Mon, 7 Aug 2017 08:48:14 -0400
Message-ID: <df64b15d-a443-553c-a3c6-d834320648fd@redhat.com>
In-Reply-To: <20170805065705.12989-1-ming.lei@redhat.com>
On 08/05/2017 02:56 AM, Ming Lei wrote:
> In Red Hat internal storage tests of the blk-mq scheduler, we
> found that I/O performance is quite bad with mq-deadline, especially
> for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx,
> SRP...).
>
> It turns out that one big issue causes the performance regression:
> requests are still dequeued from the sw queue/scheduler queue even
> when the LLD's queue is busy, so I/O merging becomes quite difficult
> and sequential I/O degrades a lot.
>
> The first five patches improve this situation and recover
> some of the lost performance.
>
> But it looks like they are still not enough. The remaining loss is
> caused by the queue depth shared among all hw queues. For SCSI
> devices, .cmd_per_lun defines the max number of pending I/Os on one
> request queue, which is a per-request_queue depth. So during
> dispatch, if one hctx is too busy to move on, no other hctx can
> dispatch either, because of the per-request_queue depth.
>
> Patches 6 ~ 14 use a per-request_queue dispatch list to avoid
> dequeuing requests from the sw/scheduler queue when the LLD queue
> is busy.
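
As an illustration of the rule described above, here is a minimal
user-space sketch in Python. It is a toy model, not kernel code: the
class ToyQueue and all of its fields are hypothetical names, and the
single "depth" counter only stands in for the per-request_queue depth.

```python
from collections import deque


class ToyQueue:
    """Toy model of one request queue with a shared device queue depth."""

    def __init__(self, depth):
        self.depth = depth        # stands in for the per-request_queue depth
        self.in_flight = 0        # requests currently owned by the driver
        self.dispatch = deque()   # requests dequeued earlier but not yet issued
        self.sw_queue = deque()   # requests still sitting where merges can happen

    def driver_accepts(self):
        return self.in_flight < self.depth

    def run(self):
        """Issue requests until the driver is busy; return how many were issued."""
        issued = 0
        # Flush leftovers first and stop immediately if the driver is busy.
        while self.dispatch:
            if not self.driver_accepts():
                return issued
            self.dispatch.popleft()
            self.in_flight += 1
            issued += 1
        # Only then dequeue new requests, one at a time, checking before each.
        while self.sw_queue and self.driver_accepts():
            self.sw_queue.popleft()
            self.in_flight += 1
            issued += 1
        return issued
```

Requests that stay in sw_queue while the driver is busy remain
available as merge candidates, which is the point of the change.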
>
> Patches 15 ~ 20 improve bio merge via a hash table in the sw queue,
> which makes bio merge more efficient than the current approach,
> in which only the last 8 requests are checked. Since patches
> 6 ~ 14 convert to the scheduler way of dequeuing one request
> at a time from the sw queue for SCSI devices, ctx->lock is
> acquired more often; merging bios via the hash table decreases
> the holding time of ctx->lock and should eliminate the
> effect of patch 14.
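
To make the difference between the two lookups concrete, here is a
rough Python sketch. It assumes back merges only, with the table keyed
by each request's end sector; Request and the helper names are
hypothetical and do not correspond to functions in the patches.

```python
class Request:
    def __init__(self, sector, nr_sectors):
        self.sector = sector
        self.nr_sectors = nr_sectors

    @property
    def end(self):
        # First sector after this request; a bio starting here can back-merge.
        return self.sector + self.nr_sectors


def find_back_merge_scan(requests, bio_sector, window=8):
    """Check only the most recent `window` requests, newest first."""
    for rq in reversed(requests[-window:]):
        if rq.end == bio_sector:
            return rq
    return None


def build_hash(requests):
    """Index requests by end sector (a later request wins on a collision)."""
    return {rq.end: rq for rq in requests}


def find_back_merge_hash(table, bio_sector):
    """Single dictionary lookup, independent of how many requests are queued."""
    return table.get(bio_sector)
```

With 20 queued requests, a bio that continues the oldest one is missed
by the windowed scan but found by the hash lookup. A real
implementation would also have to re-index a request after each merge,
because its end sector changes.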
>
> With these changes, SCSI-MQ sequential I/O performance is much
> improved. For lpfc, it is basically brought back to the level
> of the legacy block path[1]; in particular, mq-deadline
> is improved by > 10X [1] on lpfc and by > 3X on SCSI SRP.
> For mq-none it is improved by 10% on lpfc, and write is
> improved by > 10% on SRP too.
>
> Also, Bart worried that this patchset may affect SRP, so test
> data on SCSI SRP is provided this time:
>
> - fio(libaio, bs:4k, dio, queue_depth:64, 64 jobs)
> - system(16 cores, dual sockets, mem: 96G)
>
>               | v4.13-rc3     | v4.13-rc3   | v4.13-rc3+patches |
>               | blk-legacy dd | blk-mq none | blk-mq none       |
> -----------------------------------------------------------------
> read     :iops| 587K          | 526K        | 537K              |
> randread :iops| 115K          | 140K        | 139K              |
> write    :iops| 596K          | 519K        | 602K              |
> randwrite:iops| 103K          | 122K        | 120K              |
>
>
>               | v4.13-rc3     | v4.13-rc3 | v4.13-rc3+patches |
>               | blk-legacy dd | blk-mq dd | blk-mq dd         |
> ---------------------------------------------------------------
> read     :iops| 587K          | 155K      | 522K              |
> randread :iops| 115K          | 140K      | 141K              |
> write    :iops| 596K          | 135K      | 587K              |
> randwrite:iops| 103K          | 120K      | 118K              |
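
For reference, the improvement factors implied by the SRP numbers in
the two tables can be worked out as below. The values are copied from
the blk-mq columns (IOPS in thousands); the dictionary layout and the
speedup() helper are only for illustration.

```python
# (before, after) IOPS in thousands: v4.13-rc3 vs. v4.13-rc3+patches on SRP.
results = {
    "none": {
        "read": (526, 537),
        "randread": (140, 139),
        "write": (519, 602),
        "randwrite": (122, 120),
    },
    "mq-deadline": {
        "read": (155, 522),
        "randread": (140, 141),
        "write": (135, 587),
        "randwrite": (120, 118),
    },
}


def speedup(before, after):
    """Ratio of patched IOPS to unpatched IOPS."""
    return after / before


for sched, rows in results.items():
    for workload, (before, after) in rows.items():
        print(f"{sched:12s} {workload:10s} {speedup(before, after):.2f}x")
```

Sequential read and write with mq-deadline come out above 3X and 4X
respectively, and sequential write with none improves by about 16%,
while the random workloads stay within a few percent.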
>
> V2:
> - dequeue requests from sw queues in round-robin style,
> as suggested by Bart, and introduce one helper in sbitmap
> for this purpose
> - improve bio merge via hash table from sw queue
> - add comments about using DISPATCH_BUSY state in a lockless way,
> simplifying handling of the busy state
> - hold ctx->lock when clearing ctx busy bit as suggested
> by Bart
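
The round-robin dequeue in the first item can be pictured as walking
the set bits of a bitmap from a moving start offset and wrapping
around once. The Python sketch below is only a model of that idea;
for_each_set_from() is a hypothetical name and does not reproduce the
signature of __sbitmap_for_each_set().

```python
def for_each_set_from(bits, start):
    """Yield indices of set bits, beginning at `start` and wrapping once."""
    n = len(bits)
    for off in range(n):
        idx = (start + off) % n
        if bits[idx]:
            yield idx
```

A caller that resumes from the index after the last sw queue it
served gives every busy sw queue a turn, instead of always favoring
the lowest-numbered one.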
>
>
> [1] http://marc.info/?l=linux-block&m=150151989915776&w=2
>
> Ming Lei (20):
> blk-mq-sched: fix scheduler bad performance
> sbitmap: introduce __sbitmap_for_each_set()
> blk-mq: introduce blk_mq_dispatch_rq_from_ctx()
> blk-mq-sched: move actual dispatching into one helper
> blk-mq-sched: improve dispatching from sw queue
> blk-mq-sched: don't dequeue request until all in ->dispatch are
> flushed
> blk-mq-sched: introduce blk_mq_sched_queue_depth()
> blk-mq-sched: use q->queue_depth as hint for q->nr_requests
> blk-mq: introduce BLK_MQ_F_SHARED_DEPTH
> blk-mq-sched: introduce helpers for query, change busy state
> blk-mq: introduce helpers for operating ->dispatch list
> blk-mq: introduce pointers to dispatch lock & list
> blk-mq: pass 'request_queue *' to several helpers of operating BUSY
> blk-mq-sched: improve IO scheduling on SCSI devcie
> block: introduce rqhash helpers
> block: move actual bio merge code into __elv_merge
> block: add check on elevator for supporting bio merge via hashtable
> from blk-mq sw queue
> block: introduce .last_merge and .hash to blk_mq_ctx
> blk-mq-sched: refactor blk_mq_sched_try_merge()
> blk-mq: improve bio merge from blk-mq sw queue
>
> block/blk-mq-debugfs.c | 12 ++--
> block/blk-mq-sched.c | 187 +++++++++++++++++++++++++++++-------------------
> block/blk-mq-sched.h | 23 ++++++
> block/blk-mq.c | 133 +++++++++++++++++++++++++++++++---
> block/blk-mq.h | 73 +++++++++++++++++++
> block/blk-settings.c | 2 +
> block/blk.h | 55 ++++++++++++++
> block/elevator.c | 93 ++++++++++++++----------
> include/linux/blk-mq.h | 5 ++
> include/linux/blkdev.h | 5 ++
> include/linux/sbitmap.h | 54 ++++++++++----
> 11 files changed, 504 insertions(+), 138 deletions(-)
>
Hello,

I tested this series using Ming's tests as well as my own set of tests,
which I typically run against changes to upstream code in my SRP test bed.
My tests also include very large sequential buffered and unbuffered I/O.

This series seems to be fine for me. I did uncover another issue that is
unrelated to these patches and also exists in generic 4.13-rc3, which I am
still debugging.

For what it's worth:
Tested-by: Laurence Oberman <loberman@redhat.com>
Thread overview: 84+ messages
2017-08-05 6:56 [PATCH V2 00/20] blk-mq-sched: improve SCSI-MQ performance Ming Lei
2017-08-05 6:56 ` [PATCH V2 01/20] blk-mq-sched: fix scheduler bad performance Ming Lei
2017-08-09 0:11 ` Omar Sandoval
2017-08-09 2:32 ` Ming Lei
2017-08-09 7:11 ` Omar Sandoval
2017-08-21 8:18 ` Ming Lei
2017-08-23 7:48 ` Ming Lei
2017-08-05 6:56 ` [PATCH V2 02/20] sbitmap: introduce __sbitmap_for_each_set() Ming Lei
2017-08-22 18:28 ` Bart Van Assche
2017-08-24 3:57 ` Ming Lei
2017-08-25 21:36 ` Bart Van Assche
2017-08-26 8:43 ` Ming Lei
2017-08-22 18:37 ` Bart Van Assche
2017-08-24 4:02 ` Ming Lei
2017-08-05 6:56 ` [PATCH V2 03/20] blk-mq: introduce blk_mq_dispatch_rq_from_ctx() Ming Lei
2017-08-22 18:45 ` Bart Van Assche
2017-08-24 4:52 ` Ming Lei
2017-08-25 21:41 ` Bart Van Assche
2017-08-26 8:47 ` Ming Lei
2017-08-05 6:56 ` [PATCH V2 04/20] blk-mq-sched: move actual dispatching into one helper Ming Lei
2017-08-22 19:50 ` Bart Van Assche
2017-08-05 6:56 ` [PATCH V2 05/20] blk-mq-sched: improve dispatching from sw queue Ming Lei
2017-08-22 19:55 ` Bart Van Assche
2017-08-23 19:58 ` Jens Axboe
2017-08-24 5:52 ` Ming Lei
2017-08-22 20:57 ` Bart Van Assche
2017-08-24 6:12 ` Ming Lei
2017-08-05 6:56 ` [PATCH V2 06/20] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
2017-08-22 20:09 ` Bart Van Assche
2017-08-24 6:18 ` Ming Lei
2017-08-23 19:56 ` Jens Axboe
2017-08-24 6:38 ` Ming Lei
2017-08-25 10:19 ` Ming Lei
2017-08-05 6:56 ` [PATCH V2 07/20] blk-mq-sched: introduce blk_mq_sched_queue_depth() Ming Lei
2017-08-22 20:10 ` Bart Van Assche
2017-08-05 6:56 ` [PATCH V2 08/20] blk-mq-sched: use q->queue_depth as hint for q->nr_requests Ming Lei
2017-08-22 20:20 ` Bart Van Assche
2017-08-24 6:39 ` Ming Lei
2017-08-05 6:56 ` [PATCH V2 09/20] blk-mq: introduce BLK_MQ_F_SHARED_DEPTH Ming Lei
2017-08-22 21:55 ` Bart Van Assche
2017-08-23 6:46 ` Hannes Reinecke
2017-08-24 6:52 ` Ming Lei
2017-08-25 22:23 ` Bart Van Assche
2017-08-26 8:53 ` Ming Lei
2017-08-05 6:56 ` [PATCH V2 10/20] blk-mq-sched: introduce helpers for query, change busy state Ming Lei
2017-08-22 20:41 ` Bart Van Assche
2017-08-23 20:02 ` Jens Axboe
2017-08-24 6:55 ` Ming Lei
2017-08-24 6:54 ` Ming Lei
2017-08-05 6:56 ` [PATCH V2 11/20] blk-mq: introduce helpers for operating ->dispatch list Ming Lei
2017-08-22 20:43 ` Bart Van Assche
2017-08-24 0:59 ` Damien Le Moal
2017-08-24 7:10 ` Ming Lei
2017-08-24 7:42 ` Damien Le Moal
2017-08-24 6:57 ` Ming Lei
2017-08-05 6:56 ` [PATCH V2 12/20] blk-mq: introduce pointers to dispatch lock & list Ming Lei
2017-08-05 6:56 ` [PATCH V2 13/20] blk-mq: pass 'request_queue *' to several helpers of operating BUSY Ming Lei
2017-08-05 6:56 ` [PATCH V2 14/20] blk-mq-sched: improve IO scheduling on SCSI devcie Ming Lei
2017-08-22 20:51 ` Bart Van Assche
2017-08-24 7:14 ` Ming Lei
2017-08-05 6:57 ` [PATCH V2 15/20] block: introduce rqhash helpers Ming Lei
2017-08-05 6:57 ` [PATCH V2 16/20] block: move actual bio merge code into __elv_merge Ming Lei
2017-08-05 6:57 ` [PATCH V2 17/20] block: add check on elevator for supporting bio merge via hashtable from blk-mq sw queue Ming Lei
2017-08-05 6:57 ` [PATCH V2 18/20] block: introduce .last_merge and .hash to blk_mq_ctx Ming Lei
2017-08-05 6:57 ` [PATCH V2 19/20] blk-mq-sched: refactor blk_mq_sched_try_merge() Ming Lei
2017-08-05 6:57 ` [PATCH V2 20/20] blk-mq: improve bio merge from blk-mq sw queue Ming Lei
2017-08-07 12:48 ` Laurence Oberman [this message]
2017-08-07 15:27 ` [PATCH V2 00/20] blk-mq-sched: improve SCSI-MQ performance Bart Van Assche
2017-08-07 17:29 ` Laurence Oberman
2017-08-07 18:46 ` Laurence Oberman
2017-08-07 19:46 ` Laurence Oberman
2017-08-07 23:04 ` Ming Lei
[not found] ` <CAFfF4qv3W6D-j8BSSZbwPLqhd_mmwk8CZQe7dSqud8cMMd2yPg@mail.gmail.com>
2017-08-07 22:29 ` Bart Van Assche
2017-08-07 23:17 ` Ming Lei
2017-08-08 13:41 ` Ming Lei
2017-08-08 13:58 ` Laurence Oberman
2017-08-08 8:09 ` Paolo Valente
2017-08-08 9:09 ` Ming Lei
2017-08-08 9:13 ` Paolo Valente
2017-08-11 8:11 ` Christoph Hellwig
2017-08-11 14:25 ` James Bottomley
2017-08-23 16:12 ` Bart Van Assche
2017-08-23 16:15 ` Jens Axboe
2017-08-23 16:24 ` Ming Lei