public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Bart Van Assche <Bart.VanAssche@sandisk.com>
To: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"israelr@mellanox.com" <israelr@mellanox.com>
Cc: "maxg@mellanox.com" <maxg@mellanox.com>
Subject: Re: [PATCH] scsi_sysfs: fix hang when removing scsi device
Date: Thu, 2 Mar 2017 16:27:21 +0000	[thread overview]
Message-ID: <1488472028.2659.1.camel@sandisk.com> (raw)
In-Reply-To: <1488465953-29832-1-git-send-email-israelr@mellanox.com>

On Thu, 2017-03-02 at 16:45 +0200, Israel Rukshin wrote:
> The bug reproduce when unloading srp module with one port down.
> device_del() hangs when __scsi_remove_device() get scsi_device with
> state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE.
> It hangs because device_del() is trying to send sync cache command
> when the device is offline but with SDEV_CANCEL status.
> The status was changed to SDEV_CANCEL by __scsi_remove_device()
> before it calls to device_del().

It is not device_del() but sd_shutdown() that submits the SYNCHRONIZE
CACHE command. It should also be explained in the commit message why
the block layer timeout mechanism did not cause the SYNCHRONIZE CACHE
commands to fail after the timeout expired.

> Fixes: e619e6cbe (Revert "scsi: Fix a bdi reregistration race")

This does not make sense. A revert is never the original introduction
of a bug but at most a reintroduction. Please refer to the commit that
originally introduced this hang.

> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1282,6 +1282,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  		return;
>  
>  	if (sdev->is_visible) {
> +		enum scsi_device_state oldstate = sdev->sdev_state;
> +
>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>  			return;
>  
> @@ -1289,6 +1291,15 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  		device_unregister(&sdev->sdev_dev);
>  		transport_remove_device(dev);
>  		scsi_dh_remove_device(sdev);
> +
> +		/*
> +		 * Fail new requests if the old state was offline.
> +		 * This avoids from device_del() to hang.
> +		 */
> +		if (oldstate == SDEV_TRANSPORT_OFFLINE ||
> +		    oldstate == SDEV_OFFLINE)
> +			blk_set_queue_dying(sdev->request_queue);
> +
>  		device_del(dev);
>  	} else
>  		put_device(&sdev->sdev_dev);

Please refer to sd_shutdown() instead of device_del() in the above comment,
and also explain why the block layer timeout mechanism did not cause the
SYNCHRONIZE CACHE commands to fail.

Thanks,

Bart.

      reply	other threads:[~2017-03-02 16:29 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-02 14:45 [PATCH] scsi_sysfs: fix hang when removing scsi device Israel Rukshin
2017-03-02 16:27 ` Bart Van Assche [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1488472028.2659.1.camel@sandisk.com \
    --to=bart.vanassche@sandisk.com \
    --cc=israelr@mellanox.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=maxg@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox