Re: [PATCH v4 6/6] dm rq: Avoid that request processing stalls sporadically

From: Bart Van Assche <Bart.VanAssche@sandisk.com>
To: "ming.lei@redhat.com" <ming.lei@redhat.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"snitzer@redhat.com" <snitzer@redhat.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: [PATCH v4 6/6] dm rq: Avoid that request processing stalls sporadically
Date: Wed, 12 Apr 2017 18:38:07 +0000
Message-ID: <1492022286.2764.15.camel@sandisk.com>
In-Reply-To: <20170412034229.GA8835@ming.t460p>

On Wed, 2017-04-12 at 11:42 +0800, Ming Lei wrote:
> On Tue, Apr 11, 2017 at 06:18:36PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-04-11 at 14:03 -0400, Mike Snitzer wrote:
> > > Rather than working so hard to use DM code against me, your argument
> > > should be: "blk-mq drivers X, Y and Z rerun the hw queue; this is a well
> > > established pattern"
> > > 
> > > I see drivers/nvme/host/fc.c:nvme_fc_start_fcp_op() does.  But that is
> > > only one other driver out of ~20 BLK_MQ_RQ_QUEUE_BUSY returns
> > > tree-wide.
> > > 
> > > Could be there are some others, but hardly a well-established pattern.
> > 
> > Hello Mike,
> > 
> > Several blk-mq drivers that can return BLK_MQ_RQ_QUEUE_BUSY from their
> > .queue_rq() implementation stop the request queue (blk_mq_stop_hw_queue())
> > before returning "busy" and restart the queue after the busy condition has
> > been cleared (blk_mq_start_stopped_hw_queues()). Examples are virtio_blk and
> > xen-blkfront. However, this approach is not appropriate for the dm-mq core
> > nor for the scsi core since both drivers already use the "stopped" state for
> > another purpose than tracking whether or not a hardware queue is busy. Hence
> > the blk_mq_delay_run_hw_queue() and blk_mq_run_hw_queue() calls in these last
> > two drivers to rerun a hardware queue after the busy state has been cleared.
> 
> But looks this patch just reruns the hw queue after 100ms, which isn't
> that after the busy state has been cleared, right?

Hello Ming,

That patch can be considered a first step that can be refined further, namely
by modifying the dm-rq code such that dm-rq queues are only rerun after the
busy condition has been cleared. The patch at the start of this thread is
easier to review and easier to test than any patch that would only rerun dm-rq
queues after the busy condition has been cleared.

> Actually if BLK_MQ_RQ_QUEUE_BUSY is returned from .queue_rq(), blk-mq
> will buffer this request into hctx->dispatch and run the hw queue again,
> so looks blk_mq_delay_run_hw_queue() in this situation shouldn't have been
> needed at my 1st impression.

If the blk-mq core always reran a hardware queue whenever a block driver
returned BLK_MQ_RQ_QUEUE_BUSY, then 100% of a single CPU core would be spent
polling that hardware queue until the "busy" condition had been cleared. One
can easily see that this is not what the blk-mq core does. From
blk_mq_sched_dispatch_requests():

	if (!list_empty(&rq_list)) {
		blk_mq_sched_mark_restart_hctx(hctx);
		did_work = blk_mq_dispatch_rq_list(q, &rq_list);
	}

From the end of blk_mq_dispatch_rq_list():

	if (!list_empty(list)) {
		[ ... ]
		if (!blk_mq_sched_needs_restart(hctx) &&
		    !test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state))
			blk_mq_run_hw_queue(hctx, true);
	}

In other words, the BLK_MQ_S_SCHED_RESTART flag is set before the dispatch list
is examined. The dispatch list is rerun after a block driver returned
BLK_MQ_RQ_QUEUE_BUSY only if that flag gets cleared by a concurrent
blk_mq_sched_restart_hctx() call while blk_mq_dispatch_rq_list() is in
progress.

Bart.
