* [PATCH v2 0/4] Make SCSI transport recovery more robust
@ 2018-01-10 18:18 Bart Van Assche
2018-01-10 18:18 ` [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device Bart Van Assche
[not found] ` <20180110181817.25166-1-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
0 siblings, 2 replies; 10+ messages in thread
From: Bart Van Assche @ 2018-01-10 18:18 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block-u79uwXL29TY76Z2rM5mHXA, Martin K . Petersen,
Christoph Hellwig, Hannes Reinecke, Jason Gunthorpe, Doug Ledford,
linux-scsi-u79uwXL29TY76Z2rM5mHXA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bart Van Assche
Hello Jens,
A longstanding issue with the SCSI core is that several SCSI transport
drivers use scsi_target_block() and scsi_target_unblock() to avoid
concurrent .queuecommand() calls during e.g. transport recovery, but that
this is not sufficient to protect against such calls. Hence this patch
series. An additional benefit of this patch series is that it makes it
possible to remove an ugly hack from the SRP initiator driver. Please
consider this patch series for kernel v4.16.
Thanks,
Bart.
Changes compared to v1:
- Renamed blk_wait_if_quiesced() to blk_start_wait_if_quiesced().
- Mentioned in the comment above blk_start_wait_if_quiesced() that every call
  of this function has to be followed by a call of
  blk_finish_wait_if_quiesced().
Bart Van Assche (4):
blk-mq: Rename request_queue.mq_freeze_wq into mq_wq
block: Introduce blk_start_wait_if_quiesced() and
blk_finish_wait_if_quiesced()
scsi: Avoid that .queuecommand() gets called for a quiesced SCSI
device
IB/srp: Fix a sleep-in-invalid-context bug
block/blk-core.c | 11 +++---
block/blk-mq.c | 74 ++++++++++++++++++++++++++++++++++---
drivers/infiniband/ulp/srp/ib_srp.c | 21 +----------
drivers/scsi/scsi_error.c | 3 ++
include/linux/blk-mq.h | 2 +
include/linux/blkdev.h | 2 +-
6 files changed, 83 insertions(+), 30 deletions(-)
--
2.15.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device
  2018-01-10 18:18 [PATCH v2 0/4] Make SCSI transport recovery more robust Bart Van Assche
@ 2018-01-10 18:18 ` Bart Van Assche
  2018-01-11  2:23   ` Ming Lei
  [not found] ` <20180110181817.25166-1-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-01-10 18:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K . Petersen, Christoph Hellwig,
	Hannes Reinecke, Jason Gunthorpe, Doug Ledford, linux-scsi,
	linux-rdma, Bart Van Assche, Johannes Thumshirn, Ming Lei

Several SCSI transport and LLD drivers surround code that does not
tolerate concurrent calls of .queuecommand() with scsi_target_block() /
scsi_target_unblock(). These last two functions use
blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
queues to prevent concurrent .queuecommand() calls. However, that is
not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
Hence surround the .queuecommand() call from the SCSI error handler with
blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().

Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
code that calls blk_get_request(), e.g. scsi_execute_req(), is not an
option since scsi_send_eh_cmnd() can be called if all requests are
allocated and if no requests will make progress without aborting any
of these requests.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_error.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 62b56de38ae8..f7154ea86715 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1016,6 +1016,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 {
 	struct scsi_device *sdev = scmd->device;
 	struct Scsi_Host *shost = sdev->host;
+	struct request_queue *q = sdev->request_queue;
 	DECLARE_COMPLETION_ONSTACK(done);
 	unsigned long timeleft = timeout;
 	struct scsi_eh_save ses;
@@ -1028,7 +1029,9 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
+	blk_start_wait_if_quiesced(q);
 	rtn = shost->hostt->queuecommand(shost, scmd);
+	blk_finish_wait_if_quiesced(q);
 	if (rtn) {
 		if (timeleft > stall_for) {
 			scsi_eh_restore_cmnd(scmd, &ses);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device
  2018-01-10 18:18 ` [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device Bart Van Assche
@ 2018-01-11  2:23   ` Ming Lei
  [not found]   ` <20180111022341.GA7290-1Cy0MwPVgsaKEAUSAb389g@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2018-01-11  2:23 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Martin K . Petersen, Christoph Hellwig,
	Hannes Reinecke, Jason Gunthorpe, Doug Ledford, linux-scsi,
	linux-rdma, Johannes Thumshirn

On Wed, Jan 10, 2018 at 10:18:16AM -0800, Bart Van Assche wrote:
> Several SCSI transport and LLD drivers surround code that does not
> tolerate concurrent calls of .queuecommand() with scsi_target_block() /
> scsi_target_unblock(). These last two functions use
> blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
> queues to prevent concurrent .queuecommand() calls. However, that is

Actually blk_mq_quiesce_queue() is supposed to disable and drain dispatch,
not to prevent concurrent .queuecommand() calls.

> not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().

Given it is error handling, do we need to prevent the .queuecommand() call
in scsi_send_eh_cmnd()? Could you share with us what the actual issue
observed is from the user's view?

> Hence surround the .queuecommand() call from the SCSI error handler with
> blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().
> 
> Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
> code that calls blk_get_request(), e.g. scsi_execute_req(), is not an
> option since scsi_send_eh_cmnd() can be called if all requests are
> allocated and if no requests will make progress without aborting any
> of these requests.

If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what
do you think of the approach of requeuing the EH command via
scsi_mq_requeue_cmd()/scsi_requeue_cmd()? The EH request would then be
dispatched when the queue becomes unquiesced or STOPPED is cleared.

-- 
Ming

^ permalink raw reply	[flat|nested] 10+ messages in thread
[parent not found: <20180111022341.GA7290-1Cy0MwPVgsaKEAUSAb389g@public.gmane.org>]
* Re: [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device
  [not found]   ` <20180111022341.GA7290-1Cy0MwPVgsaKEAUSAb389g@public.gmane.org>
@ 2018-01-12 22:45     ` Bart Van Assche
  2018-01-13 15:54       ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-01-12 22:45 UTC (permalink / raw)
  To: ming.lei-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
  Cc: linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jthumshirn-l3A5Bk7waGM@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jgg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	hare-IBi9RG/b67k@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org

On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote:
> > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
> 
> Given it is error handling, do we need to prevent the .queuecommand() call
> in scsi_send_eh_cmnd()? Could you share with us what the actual issue
> observed is from the user's view?

Please have a look at the kernel bug report in the description of patch 4/4
of this series.

> > Hence surround the .queuecommand() call from the SCSI error handler with
> > blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().
> > 
> > Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
> > code that calls blk_get_request(), e.g. scsi_execute_req(), is not an
> > option since scsi_send_eh_cmnd() can be called if all requests are
> > allocated and if no requests will make progress without aborting any
> > of these requests.
> 
> If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what
> do you think of the approach of requeuing the EH command via
> scsi_mq_requeue_cmd()/scsi_requeue_cmd()? The EH request would then be
> dispatched when the queue becomes unquiesced or STOPPED is cleared.

Calling scsi_mq_requeue_cmd() would trigger scsi_mq_uninit_cmd(),
blk_mq_put_driver_tag(), blk_mq_get_driver_tag() and scsi_mq_prep_fn().
That could cause scsi_eh_scmd_add() to be called multiple times for the
same SCSI command, which would break one of the assumptions
scsi_eh_scmd_add() is based on, namely that that function gets called
only once per SCSI command that is in progress.

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device
  2018-01-12 22:45     ` Bart Van Assche
@ 2018-01-13 15:54       ` Ming Lei
  2018-01-25 17:07         ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2018-01-13 15:54 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de,
	linux-rdma@vger.kernel.org, jgg@mellanox.com, hch@lst.de,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, hare@suse.com, dledford@redhat.com

On Fri, Jan 12, 2018 at 10:45:57PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote:
> > > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
> > 
> > Given it is error handling, do we need to prevent the .queuecommand() call
> > in scsi_send_eh_cmnd()? Could you share us what the actual issue
> > observed is from user view?
> 
> Please have a look at the kernel bug report in the description of patch 4/4
> of this series.

Thanks for mentioning that. I can find the following comment in
srp_queuecommand():

	/*
	 * The SCSI EH thread is the only context from which srp_queuecommand()
	 * can get invoked for blocked devices (SDEV_BLOCK /
	 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
	 * locking the rport mutex if invoked from inside the SCSI EH.
	 */

That means an EH request is allowed to be sent to a blocked device.

I also replied in patch 4/4: there looks to be one simple one-line change
which should address the 'sleep in atomic context' issue. Please discuss
that in the patch 4/4 thread.

-- 
Ming

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device
  2018-01-13 15:54       ` Ming Lei
@ 2018-01-25 17:07         ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2018-01-25 17:07 UTC (permalink / raw)
  To: ming.lei@redhat.com
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de,
	linux-rdma@vger.kernel.org, jgg@mellanox.com, hch@lst.de,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, hare@suse.com, dledford@redhat.com

On Sat, 2018-01-13 at 23:54 +0800, Ming Lei wrote:
> Thanks for your mentioning, then I can find the following comment in
> srp_queuecommand():
> 
> 	/*
> 	 * The SCSI EH thread is the only context from which srp_queuecommand()
> 	 * can get invoked for blocked devices (SDEV_BLOCK /
> 	 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
> 	 * locking the rport mutex if invoked from inside the SCSI EH.
> 	 */
> 
> That means EH request is allowed to send to blocked device.

No way. That's a bug, and a bug that needs to be fixed.

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread
[parent not found: <20180110181817.25166-1-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>]
* [PATCH v2 1/4] blk-mq: Rename request_queue.mq_freeze_wq into mq_wq
  [not found] ` <20180110181817.25166-1-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
@ 2018-01-10 18:18   ` Bart Van Assche
  2018-01-10 18:18   ` [PATCH v2 2/4] block: Introduce blk_start_wait_if_quiesced() and blk_finish_wait_if_quiesced() Bart Van Assche
  2018-01-10 18:18   ` [PATCH v2 4/4] IB/srp: Fix a sleep-in-invalid-context bug Bart Van Assche
  2 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2018-01-10 18:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block-u79uwXL29TY76Z2rM5mHXA, Martin K . Petersen,
	Christoph Hellwig, Hannes Reinecke, Jason Gunthorpe, Doug Ledford,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bart Van Assche,
	Johannes Thumshirn, Ming Lei

Rename a waitqueue in struct request_queue since the next patch will add
code that uses this waitqueue outside the request queue freezing
implementation.

Signed-off-by: Bart Van Assche <bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
Reviewed-by: Hannes Reinecke <hare-l3A5Bk7waGM@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
Cc: Ming Lei <ming.lei-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 block/blk-core.c       | 10 +++++-----
 block/blk-mq.c         | 10 +++++-----
 include/linux/blkdev.h |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index fa1cb95f7f6a..c10b4ce95248 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -377,7 +377,7 @@ void blk_clear_preempt_only(struct request_queue *q)
 
 	spin_lock_irqsave(q->queue_lock, flags);
 	queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
-	wake_up_all(&q->mq_freeze_wq);
+	wake_up_all(&q->mq_wq);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
@@ -648,7 +648,7 @@ void blk_set_queue_dying(struct request_queue *q)
 	}
 
 	/* Make blk_queue_enter() reexamine the DYING flag. */
-	wake_up_all(&q->mq_freeze_wq);
+	wake_up_all(&q->mq_wq);
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
@@ -851,7 +851,7 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
+		ret = wait_event_interruptible(q->mq_wq,
 			       (atomic_read(&q->mq_freeze_depth) == 0 &&
 				(preempt || !blk_queue_preempt_only(q))) ||
 			       blk_queue_dying(q));
@@ -872,7 +872,7 @@ static void blk_queue_usage_counter_release(struct percpu_ref *ref)
 	struct request_queue *q =
 		container_of(ref, struct request_queue, q_usage_counter);
 
-	wake_up_all(&q->mq_freeze_wq);
+	wake_up_all(&q->mq_wq);
 }
 
 static void blk_rq_timed_out_timer(struct timer_list *t)
@@ -948,7 +948,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	q->bypass_depth = 1;
 	__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
 
-	init_waitqueue_head(&q->mq_freeze_wq);
+	init_waitqueue_head(&q->mq_wq);
 
 	/*
	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 29f140b4dbf7..a05ea7e9b415 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -137,16 +137,16 @@ EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
 
 void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
-	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
+	wait_event(q->mq_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout)
 {
-	return wait_event_timeout(q->mq_freeze_wq,
-					percpu_ref_is_zero(&q->q_usage_counter),
-					timeout);
+	return wait_event_timeout(q->mq_wq,
+				  percpu_ref_is_zero(&q->q_usage_counter),
+				  timeout);
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);
 
@@ -185,7 +185,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 	WARN_ON_ONCE(freeze_depth < 0);
 	if (!freeze_depth) {
 		percpu_ref_reinit(&q->q_usage_counter);
-		wake_up_all(&q->mq_freeze_wq);
+		wake_up_all(&q->mq_wq);
 	}
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50fb1c18ec54..2c74c03a9d5f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -638,7 +638,7 @@ struct request_queue {
 	struct throtl_data *td;
 #endif
 	struct rcu_head		rcu_head;
-	wait_queue_head_t	mq_freeze_wq;
+	wait_queue_head_t	mq_wq;
 	struct percpu_ref	q_usage_counter;
 	struct list_head	all_q_node;
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* [PATCH v2 2/4] block: Introduce blk_start_wait_if_quiesced() and blk_finish_wait_if_quiesced()
  [not found] ` <20180110181817.25166-1-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
  2018-01-10 18:18   ` [PATCH v2 1/4] blk-mq: Rename request_queue.mq_freeze_wq into mq_wq Bart Van Assche
@ 2018-01-10 18:18   ` Bart Van Assche
  2018-01-10 18:18   ` [PATCH v2 4/4] IB/srp: Fix a sleep-in-invalid-context bug Bart Van Assche
  2 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2018-01-10 18:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block-u79uwXL29TY76Z2rM5mHXA, Martin K . Petersen,
	Christoph Hellwig, Hannes Reinecke, Jason Gunthorpe, Doug Ledford,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Ming Lei

Introduce functions that allow block drivers to wait while a request
queue is in the quiesced state (blk-mq) or in the stopped state (legacy
block layer). The next patch will add calls to these functions in the
SCSI core.

Signed-off-by: Bart Van Assche <bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
Cc: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Hannes Reinecke <hare-l3A5Bk7waGM@public.gmane.org>
Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
Cc: Ming Lei <ming.lei-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 block/blk-core.c       |  1 +
 block/blk-mq.c         | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  2 ++
 3 files changed, 67 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index c10b4ce95248..06eaea15bae9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -287,6 +287,7 @@ void blk_start_queue(struct request_queue *q)
 	WARN_ON_ONCE(q->mq_ops);
 
 	queue_flag_clear(QUEUE_FLAG_STOPPED, q);
+	wake_up_all(&q->mq_wq);
 	__blk_run_queue(q);
 }
 EXPORT_SYMBOL(blk_start_queue);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a05ea7e9b415..87455977ad34 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -247,11 +247,75 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
+	wake_up_all(&q->mq_wq);
+
 	/* dispatch requests which are inserted during quiescing */
 	blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
+/**
+ * blk_start_wait_if_quiesced() - wait if a queue is quiesced (blk-mq) or stopped (legacy block layer)
+ * @q: Request queue pointer.
+ *
+ * Some block drivers, e.g. the SCSI core, can bypass the block layer core
+ * request submission mechanism. Surround such code with
+ * blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced() to avoid that
+ * request submission can happen while a queue is quiesced or stopped.
+ *
+ * Returns with the RCU read lock held (blk-mq) or with q->queue_lock held
+ * (legacy block layer).
+ *
+ * Notes:
+ * - Every call of this function must be followed by a call of
+ *   blk_finish_wait_if_quiesced().
+ * - This function does not support block drivers whose .queue_rq()
+ *   implementation can sleep (BLK_MQ_F_BLOCKING).
+ */
+int blk_start_wait_if_quiesced(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+
+	might_sleep();
+
+	if (q->mq_ops) {
+		queue_for_each_hw_ctx(q, hctx, i)
+			WARN_ON(hctx->flags & BLK_MQ_F_BLOCKING);
+
+		rcu_read_lock();
+		while (!blk_queue_dying(q) && blk_queue_quiesced(q)) {
+			rcu_read_unlock();
+			wait_event(q->mq_wq, blk_queue_dying(q) ||
+				   !blk_queue_quiesced(q));
+			rcu_read_lock();
+		}
+	} else {
+		spin_lock_irq(q->queue_lock);
+		wait_event_lock_irq(q->mq_wq,
+				    blk_queue_dying(q) || !blk_queue_stopped(q),
+				    *q->queue_lock);
+		q->request_fn_active++;
+	}
+	return blk_queue_dying(q) ? -ENODEV : 0;
+}
+EXPORT_SYMBOL(blk_start_wait_if_quiesced);
+
+/**
+ * blk_finish_wait_if_quiesced() - counterpart of blk_start_wait_if_quiesced()
+ * @q: Request queue pointer.
+ */
+void blk_finish_wait_if_quiesced(struct request_queue *q)
+{
+	if (q->mq_ops) {
+		rcu_read_unlock();
+	} else {
+		q->request_fn_active--;
+		spin_unlock_irq(q->queue_lock);
+	}
+}
+EXPORT_SYMBOL(blk_finish_wait_if_quiesced);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49796a3..15912cd348b5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -267,6 +267,8 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
 void blk_mq_quiesce_queue(struct request_queue *q);
 void blk_mq_unquiesce_queue(struct request_queue *q);
+int blk_start_wait_if_quiesced(struct request_queue *q);
+void blk_finish_wait_if_quiesced(struct request_queue *q);
 void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* [PATCH v2 4/4] IB/srp: Fix a sleep-in-invalid-context bug
  [not found] ` <20180110181817.25166-1-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
  2018-01-10 18:18   ` [PATCH v2 1/4] blk-mq: Rename request_queue.mq_freeze_wq into mq_wq Bart Van Assche
  2018-01-10 18:18   ` [PATCH v2 2/4] block: Introduce blk_start_wait_if_quiesced() and blk_finish_wait_if_quiesced() Bart Van Assche
@ 2018-01-10 18:18   ` Bart Van Assche
  [not found]     ` <20180110181817.25166-5-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
  2 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-01-10 18:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block-u79uwXL29TY76Z2rM5mHXA, Martin K . Petersen,
	Christoph Hellwig, Hannes Reinecke, Jason Gunthorpe, Doug Ledford,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bart Van Assche

The previous two patches guarantee that srp_queuecommand() does not get
invoked while reconnecting occurs. Hence remove the code from
srp_queuecommand() that prevents command queueing while reconnecting.
This patch avoids that the following can appear in the kernel log:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 5600, name: scsi_eh_9
1 lock held by scsi_eh_9/5600:
 #0:  (rcu_read_lock){....}, at: [<00000000cbb798c7>] __blk_mq_run_hw_queue+0xf1/0x1e0
Preemption disabled at:
[<00000000139badf2>] __blk_mq_delay_run_hw_queue+0x78/0xf0
CPU: 9 PID: 5600 Comm: scsi_eh_9 Tainted: G        W        4.15.0-rc4-dbg+ #1
Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 2.5.4 01/22/2016
Call Trace:
 dump_stack+0x67/0x99
 ___might_sleep+0x16a/0x250 [ib_srp]
 __mutex_lock+0x46/0x9d0
 srp_queuecommand+0x356/0x420 [ib_srp]
 scsi_dispatch_cmd+0xf6/0x3f0
 scsi_queue_rq+0x4a8/0x5f0
 blk_mq_dispatch_rq_list+0x73/0x440
 blk_mq_sched_dispatch_requests+0x109/0x1a0
 __blk_mq_run_hw_queue+0x131/0x1e0
 __blk_mq_delay_run_hw_queue+0x9a/0xf0
 blk_mq_run_hw_queue+0xc0/0x1e0
 blk_mq_start_hw_queues+0x2c/0x40
 scsi_run_queue+0x18e/0x2d0
 scsi_run_host_queues+0x22/0x40
 scsi_error_handler+0x18d/0x5f0
 kthread+0x11c/0x140
 ret_from_fork+0x24/0x30

Signed-off-by: Bart Van Assche <bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
Reviewed-by: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
Cc: Jason Gunthorpe <jgg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 21 ++-------------------
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 972d4b3c5223..670f187ecb91 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2149,7 +2149,6 @@ static void srp_handle_qp_err(struct ib_cq *cq, struct ib_wc *wc,
 static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(shost);
-	struct srp_rport *rport = target->rport;
 	struct srp_rdma_ch *ch;
 	struct srp_request *req;
 	struct srp_iu *iu;
@@ -2159,16 +2158,6 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	u32 tag;
 	u16 idx;
 	int len, ret;
-	const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
-
-	/*
-	 * The SCSI EH thread is the only context from which srp_queuecommand()
-	 * can get invoked for blocked devices (SDEV_BLOCK /
-	 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
-	 * locking the rport mutex if invoked from inside the SCSI EH.
-	 */
-	if (in_scsi_eh)
-		mutex_lock(&rport->mutex);
 
 	scmnd->result = srp_chkready(target->rport);
 	if (unlikely(scmnd->result))
@@ -2230,13 +2219,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 		goto err_unmap;
 	}
 
-	ret = 0;
-
-unlock_rport:
-	if (in_scsi_eh)
-		mutex_unlock(&rport->mutex);
-
-	return ret;
+	return 0;
 
 err_unmap:
 	srp_unmap_data(scmnd, ch, req);
@@ -2258,7 +2241,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 		ret = SCSI_MLQUEUE_HOST_BUSY;
 	}
 
-	goto unlock_rport;
+	return ret;
 }
 
 /*
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread
[parent not found: <20180110181817.25166-5-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>]
* Re: [PATCH v2 4/4] IB/srp: Fix a sleep-in-invalid-context bug
  [not found]     ` <20180110181817.25166-5-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
@ 2018-01-13 15:42       ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2018-01-13 15:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block-u79uwXL29TY76Z2rM5mHXA,
	Martin K . Petersen, Christoph Hellwig, Hannes Reinecke,
	Jason Gunthorpe, Doug Ledford, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Jan 10, 2018 at 10:18:17AM -0800, Bart Van Assche wrote:
> The previous two patches guarantee that srp_queuecommand() does not get
> invoked while reconnecting occurs. Hence remove the code from
> srp_queuecommand() that prevents command queueing while reconnecting.
> This patch avoids that the following can appear in the kernel log:
> 
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
> in_atomic(): 1, irqs_disabled(): 0, pid: 5600, name: scsi_eh_9
> 1 lock held by scsi_eh_9/5600:
>  #0:  (rcu_read_lock){....}, at: [<00000000cbb798c7>] __blk_mq_run_hw_queue+0xf1/0x1e0
> Preemption disabled at:
> [<00000000139badf2>] __blk_mq_delay_run_hw_queue+0x78/0xf0
> CPU: 9 PID: 5600 Comm: scsi_eh_9 Tainted: G        W        4.15.0-rc4-dbg+ #1
> Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 2.5.4 01/22/2016
> Call Trace:
>  dump_stack+0x67/0x99
>  ___might_sleep+0x16a/0x250 [ib_srp]
>  __mutex_lock+0x46/0x9d0
>  srp_queuecommand+0x356/0x420 [ib_srp]
>  scsi_dispatch_cmd+0xf6/0x3f0
>  scsi_queue_rq+0x4a8/0x5f0
>  blk_mq_dispatch_rq_list+0x73/0x440
>  blk_mq_sched_dispatch_requests+0x109/0x1a0
>  __blk_mq_run_hw_queue+0x131/0x1e0
>  __blk_mq_delay_run_hw_queue+0x9a/0xf0
>  blk_mq_run_hw_queue+0xc0/0x1e0
>  blk_mq_start_hw_queues+0x2c/0x40
>  scsi_run_queue+0x18e/0x2d0
>  scsi_run_host_queues+0x22/0x40
>  scsi_error_handler+0x18d/0x5f0
>  kthread+0x11c/0x140
>  ret_from_fork+0x24/0x30
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
> Reviewed-by: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
> Cc: Jason Gunthorpe <jgg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/infiniband/ulp/srp/ib_srp.c | 21 ++-------------------
>  1 file changed, 2 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 972d4b3c5223..670f187ecb91 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -2149,7 +2149,6 @@ static void srp_handle_qp_err(struct ib_cq *cq, struct ib_wc *wc,
>  static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>  {
>  	struct srp_target_port *target = host_to_target(shost);
> -	struct srp_rport *rport = target->rport;
>  	struct srp_rdma_ch *ch;
>  	struct srp_request *req;
>  	struct srp_iu *iu;
> @@ -2159,16 +2158,6 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>  	u32 tag;
>  	u16 idx;
>  	int len, ret;
> -	const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
> -
> -	/*
> -	 * The SCSI EH thread is the only context from which srp_queuecommand()
> -	 * can get invoked for blocked devices (SDEV_BLOCK /
> -	 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
> -	 * locking the rport mutex if invoked from inside the SCSI EH.
> -	 */
> -	if (in_scsi_eh)
> -		mutex_lock(&rport->mutex);

This issue is triggered because the above EH handler context detection is
wrong: the scsi_run_host_queues() call in scsi_error_handler() is for
handling normal requests, see the comment below:

	/*
	 * finally we need to re-initiate requests that may be pending. we will
	 * have had everything blocked while error handling is taking place, and
	 * now that error recovery is done, we will need to ensure that these
	 * requests are started.
	 */
	scsi_run_host_queues(shost);

This issue should be fixable with the following simple one-line change:

	in_scsi_eh = !!scmd->eh_action;

-- 
Ming

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads: [~2018-01-25 17:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-10 18:18 [PATCH v2 0/4] Make SCSI transport recovery more robust Bart Van Assche
2018-01-10 18:18 ` [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device Bart Van Assche
2018-01-11 2:23 ` Ming Lei
[not found] ` <20180111022341.GA7290-1Cy0MwPVgsaKEAUSAb389g@public.gmane.org>
2018-01-12 22:45 ` Bart Van Assche
2018-01-13 15:54 ` Ming Lei
2018-01-25 17:07 ` Bart Van Assche
[not found] ` <20180110181817.25166-1-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
2018-01-10 18:18 ` [PATCH v2 1/4] blk-mq: Rename request_queue.mq_freeze_wq into mq_wq Bart Van Assche
2018-01-10 18:18 ` [PATCH v2 2/4] block: Introduce blk_start_wait_if_quiesced() and blk_finish_wait_if_quiesced() Bart Van Assche
2018-01-10 18:18 ` [PATCH v2 4/4] IB/srp: Fix a sleep-in-invalid-context bug Bart Van Assche
[not found] ` <20180110181817.25166-5-bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
2018-01-13 15:42 ` Ming Lei