From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Bart Van Assche <bart.vanassche@sandisk.com>,
	Laurence Oberman <loberman@redhat.com>
Subject: Re: [PATCH V2 06/20] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
Date: Fri, 25 Aug 2017 18:19:46 +0800
Message-ID: <20170825101944.GA13872@ming.t460p>
In-Reply-To: <20170824063835.GH12966@ming.t460p>

On Thu, Aug 24, 2017 at 02:38:36PM +0800, Ming Lei wrote:
> On Wed, Aug 23, 2017 at 01:56:50PM -0600, Jens Axboe wrote:
> > On Sat, Aug 05 2017, Ming Lei wrote:
> > > During dispatching, we moved all requests from hctx->dispatch to
> > > one temporary list, then dispatch them one by one from this list.
> > > > Unfortunately during this period, run queue from other contexts
> > > may think the queue is idle, then start to dequeue from sw/scheduler
> > > queue and still try to dispatch because ->dispatch is empty. This way
> > > hurts sequential I/O performance because requests are dequeued when
> > > lld queue is busy.
> > > 
> > > This patch introduces the state of BLK_MQ_S_DISPATCH_BUSY to
> > > make sure that request isn't dequeued until ->dispatch is
> > > flushed.
> > 
> > I don't like how this patch introduces a bunch of locked setting of a
> > flag under the hctx lock. Especially since I think we can easily avoid
> > it.
> 
> Actually the lock isn't needed for setting the flag, will move it out
> in V3.

My mistake: it looks like we can't move it out of the lock. Without
the lock, another context can flush the newly added rqs and clear the
bit in the window between adding the list to ->dispatch and setting
BLK_MQ_S_DISPATCH_BUSY. The bit is then set afterwards and never
cleared, which causes an I/O hang.
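
To make the interleaving concrete, here is a minimal userspace sketch
that replays it step by step. It is not kernel code: struct toy_hctx
and all function names are hypothetical, a plain counter stands in for
the ->dispatch list, and a bool stands in for BLK_MQ_S_DISPATCH_BUSY.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hctx fields discussed above. */
struct toy_hctx {
	int  nr_dispatch;	/* number of requests on ->dispatch */
	bool dispatch_busy;	/* stands in for BLK_MQ_S_DISPATCH_BUSY */
};

/* Context A, step 1: add requests to ->dispatch. */
static void a_add_list(struct toy_hctx *h, int nr)
{
	h->nr_dispatch += nr;
}

/* Context A, step 2: mark ->dispatch as busy. */
static void a_set_busy(struct toy_hctx *h)
{
	h->dispatch_busy = true;
}

/* Context B: flush ->dispatch, then clear the bit if the list is empty. */
static void b_flush_and_clear(struct toy_hctx *h)
{
	h->nr_dispatch = 0;
	if (h->nr_dispatch == 0)
		h->dispatch_busy = false;
}

/* True if the bit is left set while ->dispatch is empty. */
static bool toy_stuck(const struct toy_hctx *h)
{
	return h->dispatch_busy && h->nr_dispatch == 0;
}

/* Bit set outside the lock: B can run between A's two steps. */
static bool replay_unlocked(void)
{
	struct toy_hctx h = { 0, false };

	a_add_list(&h, 2);
	b_flush_and_clear(&h);
	a_set_busy(&h);		/* set after the flush, nobody clears it */
	return toy_stuck(&h);
}

/* Both steps under one lock: here B runs only after A has finished. */
static bool replay_locked(void)
{
	struct toy_hctx h = { 0, false };

	a_add_list(&h, 2);
	a_set_busy(&h);
	b_flush_and_clear(&h);
	return toy_stuck(&h);
}
```

In this model replay_unlocked() ends with the bit set and an empty
list, which is the state that leads to the hang, while replay_locked()
does not.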

> 
> > 
> > > -	} else if (!has_sched_dispatch & !q->queue_depth) {
> > > +		blk_mq_dispatch_rq_list(q, &rq_list);
> > > +
> > > +		/*
> > > +		 * We may clear DISPATCH_BUSY just after it
> > > +		 * is set from another context, the only cost
> > > +		 * is that one request is dequeued a bit early,
> > > +		 * we can survive that. Given the window is
> > > +		 * too small, no need to worry about performance
> > > +		 * effect.
> > > +		 */
> > > +		if (list_empty_careful(&hctx->dispatch))
> > > +			clear_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
> > 
> > This is basically the only place where we modify it without holding the
> > hctx lock. Can we move it into blk_mq_dispatch_rq_list()? The list is
> 
> The problem is that blk_mq_dispatch_rq_list() doesn't know if it is
> handling requests from hctx->dispatch or sw/scheduler queue. We only
> need to clear the bit after hctx->dispatch is flushed. So the clearing
> can't be moved into blk_mq_dispatch_rq_list().
> 
> > generally empty, unless for the case where we splice residuals back. If
> > we splice them back, we grab the lock anyway.
> > 
> > The other places it's set under the hctx lock, yet we end up using an
> > atomic operation to do it.

In theory it is better to hold the lock while clearing the bit, but
that costs one extra lock acquisition, whether or not the clearing is
moved into blk_mq_dispatch_rq_list().

We can move clear_bit() into blk_mq_dispatch_rq_list() and pass one
parameter to indicate whether it is handling requests from ->dispatch
or not. The following code would then be needed at the end of
blk_mq_dispatch_rq_list():

	if (list_empty(list)) {
		if (rq_from_dispatch_list) {
			spin_lock(&hctx->lock);
			if (list_empty_careful(&hctx->dispatch))
				clear_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
			spin_unlock(&hctx->lock);
		}
	}

If we clear the bit locklessly, the BUSY bit may be cleared early and
a request may then be dequeued early. We can accept that because the
race window is so small.
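
As a rough illustration of that tradeoff, the userspace sketch below
replays the lockless case next to the locked one. It is only a model
under stated assumptions: struct sketch_hctx and the function names
are hypothetical, a counter stands in for the ->dispatch list, and the
"lock" is represented purely by the order in which the steps run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hctx fields discussed above. */
struct sketch_hctx {
	int  nr_dispatch;	/* number of requests on ->dispatch */
	bool dispatch_busy;	/* stands in for BLK_MQ_S_DISPATCH_BUSY */
};

/* Gate: dequeue from the sw/scheduler queue only when not busy. */
static bool sketch_may_dequeue(const struct sketch_hctx *h)
{
	return !h->dispatch_busy;
}

/* True if a dequeue is allowed while ->dispatch still holds requests. */
static bool sketch_early_dequeue(const struct sketch_hctx *h)
{
	return sketch_may_dequeue(h) && h->nr_dispatch > 0;
}

/* Lockless clear: B's emptiness check goes stale before it clears. */
static bool replay_lockless_clear(void)
{
	struct sketch_hctx h = { 0, true };
	bool saw_empty;

	/* B: check ->dispatch without holding the lock. */
	saw_empty = (h.nr_dispatch == 0);

	/* A: under the lock, add one request and set the bit. */
	h.nr_dispatch += 1;
	h.dispatch_busy = true;

	/* B: clear the bit based on the stale check. */
	if (saw_empty)
		h.dispatch_busy = false;

	return sketch_early_dequeue(&h);
}

/* Locked clear: B's check and clear cannot interleave with A. */
static bool replay_locked_clear(void)
{
	struct sketch_hctx h = { 0, true };

	/* A: under the lock, add one request and set the bit. */
	h.nr_dispatch += 1;
	h.dispatch_busy = true;

	/* B: under the lock, check and clear in one step. */
	if (h.nr_dispatch == 0)
		h.dispatch_busy = false;

	return sketch_early_dequeue(&h);
}
```

In this model the lockless replay leaves the bit clear while one
request is still on ->dispatch, so a dequeue can happen a bit early.
The bit is not stuck, so this is a performance cost, not a hang.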

-- 
Ming
