linux-block.vger.kernel.org archive mirror
From: Laurence Oberman <loberman@redhat.com>
To: Ming Lei <ming.lei@redhat.com>, Jens Axboe <axboe@fb.com>,
	linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Subject: Re: [PATCH V2 00/20] blk-mq-sched: improve SCSI-MQ performance
Date: Mon, 7 Aug 2017 08:48:14 -0400
Message-ID: <df64b15d-a443-553c-a3c6-d834320648fd@redhat.com>
In-Reply-To: <20170805065705.12989-1-ming.lei@redhat.com>



On 08/05/2017 02:56 AM, Ming Lei wrote:
> In Red Hat internal storage tests of the blk-mq scheduler, we
> found that I/O performance is quite bad with mq-deadline, especially
> for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx,
> SRP...).
> 
> It turns out one big issue causes the performance regression: requests
> are still dequeued from the sw queue/scheduler queue even when the LLD's
> queue is busy, so I/O merging becomes quite difficult, and
> sequential I/O degrades a lot.
> 
> The first five patches improve this situation and bring back
> some of the lost performance.
> 
> But it looks like they are still not enough. The remaining loss is
> caused by the queue depth shared among all hw queues. For SCSI devices,
> .cmd_per_lun defines the max number of pending I/Os on one
> request queue, which is a per-request_queue depth. So during
> dispatch, if one hctx is too busy to move on, no other hctx can
> dispatch either, because of the per-request_queue depth.
> 
> Patches 6 ~ 14 use a per-request_queue dispatch list to avoid
> dequeuing requests from the sw/scheduler queue when the LLD queue
> is busy.
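
To make the shared-depth behaviour described above concrete, here is a
minimal user-space sketch in Python. It is a toy model under stated
assumptions, not the code from the patches: the class SharedDepthQueue and
its methods are hypothetical names, "depth" stands in for the
per-request_queue limit (.cmd_per_lun), and a request is just an integer.
The point it tries to show is that once the device rejects a request, that
request is parked on a dispatch list and nothing more is dequeued from the
sw queue until the list is flushed, so the remaining requests stay where
they can still be merged.

```python
from collections import deque


class SharedDepthQueue:
    """Toy model of one request_queue with a shared depth (hypothetical)."""

    def __init__(self, depth):
        self.depth = depth          # stand-in for the per-request_queue depth
        self.in_flight = 0          # requests currently owned by the "device"
        self.dispatch = deque()     # dequeued, but rejected by a busy device
        self.sw_queue = deque()     # still mergeable, not yet dequeued

    def _queue_rq(self):
        """Model of the driver accepting or rejecting one request."""
        if self.in_flight >= self.depth:
            return False            # device busy
        self.in_flight += 1
        return True

    def complete(self):
        """Model of one request completing on the device."""
        self.in_flight -= 1

    def run(self):
        """Issue what the device will take; return the issued requests."""
        issued = []
        # Flush the dispatch list first; stop at the first rejection.
        while self.dispatch:
            if not self._queue_rq():
                return issued       # still busy: leave sw_queue untouched
            issued.append(self.dispatch.popleft())
        # Only now dequeue from the sw queue, one request at a time.
        while self.sw_queue:
            rq = self.sw_queue.popleft()
            if not self._queue_rq():
                self.dispatch.append(rq)   # park it and stop dequeuing
                break
            issued.append(rq)
        return issued
```

With depth 2 and five queued requests, the first run() issues two, parks
one on the dispatch list, and leaves the other two in the sw queue; a
second run() without a completion touches nothing.
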
> 
> Patches 15 ~ 20 improve bio merge via a hash table in the sw queue,
> which makes bio merge more efficient than the current approach
> in which only the last 8 requests are checked. Since patches
> 6 ~ 14 convert to the scheduler way of dequeuing one request
> at a time from the sw queue for SCSI devices, ctx->lock is
> acquired more often; merging bios via a hash
> table decreases the holding time of ctx->lock and should eliminate
> the effect of patch 14.
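
The difference between checking only the last 8 requests and looking up a
hash table can be sketched as follows. This is a user-space Python model,
not kernel code: Request, SwQueue, back_merge_scan and back_merge_hash are
hypothetical names, only back merges are modelled, and the table is keyed
on each request's end sector, on the assumption that a bio can be appended
to a request when the bio starts exactly where the request ends.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sector: int       # first sector of the request
    nr_sectors: int   # length in sectors

    @property
    def end(self):
        return self.sector + self.nr_sectors


class SwQueue:
    """Toy model of one software queue (hypothetical, not kernel code)."""

    def __init__(self):
        self.rq_list = []   # requests in arrival order
        self.by_end = {}    # end sector -> request, for back-merge lookup

    def add(self, rq):
        self.rq_list.append(rq)
        self.by_end[rq.end] = rq

    def _grow(self, rq, nr_sectors):
        # Re-key the request, since its end sector changes after the merge.
        if self.by_end.get(rq.end) is rq:
            del self.by_end[rq.end]
        rq.nr_sectors += nr_sectors
        self.by_end[rq.end] = rq
        return rq

    def back_merge_scan(self, sector, nr_sectors, limit=8):
        """Check only the most recent `limit` requests."""
        for rq in reversed(self.rq_list[-limit:]):
            if rq.end == sector:
                return self._grow(rq, nr_sectors)
        return None

    def back_merge_hash(self, sector, nr_sectors):
        """Find the candidate with one lookup, however old it is."""
        rq = self.by_end.get(sector)
        if rq is None:
            return None
        return self._grow(rq, nr_sectors)
```

With 20 scattered requests queued, a bio that continues the oldest request
is missed by the bounded scan but found by the hash lookup, and the lookup
cost does not grow with the queue length.
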
> 
> With these changes, SCSI-MQ sequential I/O performance is
> much improved. For lpfc, it is basically brought back to the level of
> the block legacy path[1]; in particular, mq-deadline
> is improved by > 10X [1] on lpfc and by > 3X on SCSI SRP.
> For mq-none it is improved by 10% on lpfc, and write is
> improved by > 10% on SRP too.
> 
> Also, Bart worried that this patchset may affect SRP, so
> test data on SCSI SRP is provided this time:
> 
> - fio(libaio, bs:4k, dio, queue_depth:64, 64 jobs)
> - system(16 cores, dual sockets, mem: 96G)
> 
>               | v4.13-rc3     | v4.13-rc3     | v4.13-rc3+patches |
>               | blk-legacy dd | blk-mq none   | blk-mq none       |
> --------------+---------------+---------------+-------------------|
> read     :iops|          587K |          526K |              537K |
> randread :iops|          115K |          140K |              139K |
> write    :iops|          596K |          519K |              602K |
> randwrite:iops|          103K |          122K |              120K |
> 
> 
>               | v4.13-rc3     | v4.13-rc3     | v4.13-rc3+patches |
>               | blk-legacy dd | blk-mq dd     | blk-mq dd         |
> --------------+---------------+---------------+-------------------|
> read     :iops|          587K |          155K |              522K |
> randread :iops|          115K |          140K |              141K |
> write    :iops|          596K |          135K |              587K |
> randwrite:iops|          103K |          120K |              118K |
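
For reference, the sequential-I/O ratios implied by the two SRP tables
above can be recomputed with a few lines of Python. The numbers are copied
from the tables (in thousands of IOPS); the dictionary layout and the
gain() helper are only illustrative. On these figures mq-deadline comes out
at roughly 3.4x for read and 4.3x for write, and mq-none write gains about
16%, which is consistent with the "> 3X" and "> 10%" claims.

```python
# IOPS in thousands, copied from the two SRP tables above:
# (v4.13-rc3 blk-mq, v4.13-rc3+patches blk-mq)
srp_kiops = {
    "mq-none": {"read": (526, 537), "write": (519, 602)},
    "mq-deadline": {"read": (155, 522), "write": (135, 587)},
}


def gain(before, after):
    """Ratio of patched to unpatched throughput."""
    return after / before


for sched, rows in srp_kiops.items():
    for op, (before, after) in rows.items():
        print(f"{sched:12s} {op:6s} {gain(before, after):.2f}x")
```
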
> 
> V2:
> 	- dequeue requests from sw queues in round-robin style
> 	as suggested by Bart, and introduce one helper in sbitmap
> 	for this purpose
> 	- improve bio merge via hash table from sw queue
> 	- add comments about using DISPATCH_BUSY state in a lockless way,
> 	simplifying handling of the busy state
> 	- hold ctx->lock when clearing ctx busy bit as suggested
> 	by Bart
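
The round-robin dequeue mentioned in the V2 notes can be pictured with a
small Python sketch. It is only an illustration: for_each_set_from() and
dequeue_round_robin() are hypothetical stand-ins, not the
__sbitmap_for_each_set() helper from the series, whose signature is not
shown here. The assumption is that each bit marks a sw queue with pending
requests, and that iteration starts just after the queue served last time
and wraps around once.

```python
def for_each_set_from(bits, start):
    """Yield indices of set bits, beginning at `start` and wrapping once."""
    n = len(bits)
    for i in range(n):
        idx = (start + i) % n
        if bits[idx]:
            yield idx


def dequeue_round_robin(queues, start):
    """Take one request from the next non-empty queue at or after `start`.

    Returns (request, next_start), or (None, start) if all queues are empty.
    """
    busy = [bool(q) for q in queues]      # stand-in for the busy bitmap
    for idx in for_each_set_from(busy, start):
        return queues[idx].pop(0), (idx + 1) % len(queues)
    return None, start
```

Calling dequeue_round_robin() repeatedly, feeding next_start back in, takes
one request from each busy queue in turn rather than draining the first
one.
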
> 
> 
> [1] http://marc.info/?l=linux-block&m=150151989915776&w=2
> 
> Ming Lei (20):
>    blk-mq-sched: fix scheduler bad performance
>    sbitmap: introduce __sbitmap_for_each_set()
>    blk-mq: introduce blk_mq_dispatch_rq_from_ctx()
>    blk-mq-sched: move actual dispatching into one helper
>    blk-mq-sched: improve dispatching from sw queue
>    blk-mq-sched: don't dequeue request until all in ->dispatch are
>      flushed
>    blk-mq-sched: introduce blk_mq_sched_queue_depth()
>    blk-mq-sched: use q->queue_depth as hint for q->nr_requests
>    blk-mq: introduce BLK_MQ_F_SHARED_DEPTH
>    blk-mq-sched: introduce helpers for query, change busy state
>    blk-mq: introduce helpers for operating ->dispatch list
>    blk-mq: introduce pointers to dispatch lock & list
>    blk-mq: pass 'request_queue *' to several helpers of operating BUSY
>    blk-mq-sched: improve IO scheduling on SCSI devcie
>    block: introduce rqhash helpers
>    block: move actual bio merge code into __elv_merge
>    block: add check on elevator for supporting bio merge via hashtable
>      from blk-mq sw queue
>    block: introduce .last_merge and .hash to blk_mq_ctx
>    blk-mq-sched: refactor blk_mq_sched_try_merge()
>    blk-mq: improve bio merge from blk-mq sw queue
> 
>   block/blk-mq-debugfs.c  |  12 ++--
>   block/blk-mq-sched.c    | 187 +++++++++++++++++++++++++++++-------------------
>   block/blk-mq-sched.h    |  23 ++++++
>   block/blk-mq.c          | 133 +++++++++++++++++++++++++++++++---
>   block/blk-mq.h          |  73 +++++++++++++++++++
>   block/blk-settings.c    |   2 +
>   block/blk.h             |  55 ++++++++++++++
>   block/elevator.c        |  93 ++++++++++++++----------
>   include/linux/blk-mq.h  |   5 ++
>   include/linux/blkdev.h  |   5 ++
>   include/linux/sbitmap.h |  54 ++++++++++----
>   11 files changed, 504 insertions(+), 138 deletions(-)
> 

Hello

I tested this series using Ming's tests as well as my own set of tests 
typically run against changes to upstream code in my SRP test-bed.
My tests also include very large sequential buffered and un-buffered I/O.

This series seems to be fine for me. I did uncover another issue that is
unrelated to these patches and also exists in generic 4.13-rc3; I am
still debugging it.

For what it's worth:
Tested-by: Laurence Oberman <loberman@redhat.com>
