public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi_sysfs: fix hang when removing scsi device
@ 2017-03-02 14:45 Israel Rukshin
  2017-03-02 16:27 ` Bart Van Assche
  0 siblings, 1 reply; 2+ messages in thread
From: Israel Rukshin @ 2017-03-02 14:45 UTC (permalink / raw)
  To: linux-scsi; +Cc: Bart Van Assche, Max Gurtovoy, Israel Rukshin

The bug reproduce when unloading srp module with one port down.
device_del() hangs when __scsi_remove_device() get scsi_device with
state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE.
It hangs because device_del() is trying to send sync cache command
when the device is offline but with SDEV_CANCEL status.
The status was changed to SDEV_CANCEL by __scsi_remove_device()
before it calls to device_del().

This commit doesn't accept new commands if the original state was offline.

sysrq: SysRq : sysrq: Show Blocked State
task PC stack pid father
kworker/2:0 D ffff88046fa95c00 0 21178 2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
Call Trace:
[<ffffffff815dd985>] schedule+0x35/0x80
[<ffffffff815e02c7>] schedule_timeout+0x237/0x2d0
[<ffffffff815dcf46>] io_schedule_timeout+0xa6/0x110
[<ffffffff815de2f3>] wait_for_completion_io+0xa3/0x110
[<ffffffff812e66ff>] blk_execute_rq+0xdf/0x120
[<ffffffffa00135de>] scsi_execute+0xce/0x150 [scsi_mod]
[<ffffffffa001548f>] scsi_execute_req_flags+0x8f/0xf0 [scsi_mod]
[<ffffffffa0154849>] sd_sync_cache+0xa9/0x190 [sd_mod]
[<ffffffffa0154c3a>] sd_shutdown+0x6a/0x100 [sd_mod]
[<ffffffffa0154d34>] sd_remove+0x64/0xc0 [sd_mod]
[<ffffffff8144d3fd>] __device_release_driver+0x8d/0x120
[<ffffffff8144d4ae>] device_release_driver+0x1e/0x30
[<ffffffff8144c239>] bus_remove_device+0xf9/0x170
[<ffffffff81448bc7>] device_del+0x127/0x240
[<ffffffffa001c0f1>] __scsi_remove_device+0xc1/0xd0 [scsi_mod]
[<ffffffffa001a5d7>] scsi_forget_host+0x57/0x60 [scsi_mod]
[<ffffffffa000e3d2>] scsi_remove_host+0x72/0x110 [scsi_mod]
[<ffffffffa06f95ab>] srp_remove_work+0x8b/0x200 [ib_srp]
...

Fixes: e619e6cbe (Revert "scsi: Fix a bdi reregistration race")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/scsi/scsi_sysfs.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82dfe07..ed55aac 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1282,6 +1282,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		return;
 
 	if (sdev->is_visible) {
+		enum scsi_device_state oldstate = sdev->sdev_state;
+
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
 			return;
 
@@ -1289,6 +1291,15 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		device_unregister(&sdev->sdev_dev);
 		transport_remove_device(dev);
 		scsi_dh_remove_device(sdev);
+
+		/*
+		 * Fail new requests if the old state was offline.
+		 * This avoids from device_del() to hang.
+		 */
+		if (oldstate == SDEV_TRANSPORT_OFFLINE ||
+		    oldstate == SDEV_OFFLINE)
+			blk_set_queue_dying(sdev->request_queue);
+
 		device_del(dev);
 	} else
 		put_device(&sdev->sdev_dev);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] scsi_sysfs: fix hang when removing scsi device
  2017-03-02 14:45 [PATCH] scsi_sysfs: fix hang when removing scsi device Israel Rukshin
@ 2017-03-02 16:27 ` Bart Van Assche
  0 siblings, 0 replies; 2+ messages in thread
From: Bart Van Assche @ 2017-03-02 16:27 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org, israelr@mellanox.com; +Cc: maxg@mellanox.com

On Thu, 2017-03-02 at 16:45 +0200, Israel Rukshin wrote:
> The bug reproduce when unloading srp module with one port down.
> device_del() hangs when __scsi_remove_device() get scsi_device with
> state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE.
> It hangs because device_del() is trying to send sync cache command
> when the device is offline but with SDEV_CANCEL status.
> The status was changed to SDEV_CANCEL by __scsi_remove_device()
> before it calls to device_del().

It is not device_del() but sd_shutdown() that submits the SYNCHRONIZE
CACHE command. It should also be explained in the commit message why
the block layer timeout mechanism did not cause the SYNCHRONIZE CACHE
commands to fail after the timeout expired.

> Fixes: e619e6cbe (Revert "scsi: Fix a bdi reregistration race")

This does not make sense. A revert is never the original introduction
of a bug but at most a reintroduction. Please refer to the commit that
originally introduced this hang.

> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1282,6 +1282,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  		return;
>  
>  	if (sdev->is_visible) {
> +		enum scsi_device_state oldstate = sdev->sdev_state;
> +
>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>  			return;
>  
> @@ -1289,6 +1291,15 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  		device_unregister(&sdev->sdev_dev);
>  		transport_remove_device(dev);
>  		scsi_dh_remove_device(sdev);
> +
> +		/*
> +		 * Fail new requests if the old state was offline.
> +		 * This avoids from device_del() to hang.
> +		 */
> +		if (oldstate == SDEV_TRANSPORT_OFFLINE ||
> +		    oldstate == SDEV_OFFLINE)
> +			blk_set_queue_dying(sdev->request_queue);
> +
>  		device_del(dev);
>  	} else
>  		put_device(&sdev->sdev_dev);

Please refer to sd_shutdown() instead of device_del() in the above comment,
and also explain why the block layer timeout mechanism did not cause the
SYNCHRONIZE CACHE commands to fail.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2017-03-02 16:29 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-03-02 14:45 [PATCH] scsi_sysfs: fix hang when removing scsi device Israel Rukshin
2017-03-02 16:27 ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox