From: James Bottomley <jejb@linux.vnet.ibm.com>
To: Bart Van Assche <bart.vanassche@wdc.com>,
"Martin K . Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
Hannes Reinecke <hare@suse.com>,
Johannes Thumshirn <jthumshirn@suse.de>
Subject: Re: [PATCH] sd: Fix a disk probing hang
Date: Tue, 07 Nov 2017 10:09:00 -0800 [thread overview]
Message-ID: <1510078140.3118.18.camel@linux.vnet.ibm.com> (raw)
In-Reply-To: <20171107173807.14396-1-bart.vanassche@wdc.com>
On Tue, 2017-11-07 at 09:38 -0800, Bart Van Assche wrote:
> Avoid that disk probing hangs as follows if a SCSI host is removed
> after disk scanning started and before it completed:
>
> Call Trace:
> __schedule+0x2fa/0xbb0
> schedule+0x36/0x90
> schedule_timeout+0x22c/0x570
> io_schedule_timeout+0x1e/0x50
> wait_for_completion_io_timeout+0x11f/0x180
> blk_execute_rq+0x86/0xc0
> scsi_execute+0xdb/0x1f0
> sd_revalidate_disk+0xed/0x1c70 [sd_mod]
> sd_probe_async+0xc3/0x1d0 [sd_mod]
> async_run_entry_fn+0x38/0x160
> process_one_work+0x20a/0x660
> worker_thread+0x3d/0x3b0
> kthread+0x13a/0x150
> ret_from_fork+0x27/0x40
>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
> drivers/scsi/sd.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 0313486d85c8..d5e2b73c02ea 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3225,11 +3225,13 @@ static void sd_probe_async(void *data,
> async_cookie_t cookie)
> {
> struct scsi_disk *sdkp = data;
> struct scsi_device *sdp;
> + struct Scsi_Host *host;
> struct gendisk *gd;
> u32 index;
> struct device *dev;
>
> sdp = sdkp->device;
> + host = sdp->host;
> gd = sdkp->disk;
> index = sdkp->index;
> dev = &sdp->sdev_gendev;
> @@ -3253,6 +3255,13 @@ static void sd_probe_async(void *data,
> async_cookie_t cookie)
> sdkp->first_scan = 1;
> sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
>
> + mutex_lock(&host->scan_mutex);
I really don't like this: by taking the scan mutex here, you
synchronize this with everything else and make this routine single
threaded with every other host scan operation. That would make the
name sd_probe_async() a complete lie.
Additionally, any reference to the disk should *automatically* hold the
host, because the last reference to the host is in the disk release
routine, so this explicit taking of a reference should be completely
unnecessary (and if it isn't, we need to fix the bug at source, not
hide it like this).
The whole point about our async routines is that they're supposed to
rely on refcounting. So, the host cannot be freed until the last
device reference is gone. However, the host and its devices can go
into DEL state, which means the mid-layer replies error for them and
the async scan is supposed to take that error and pass it up. The hang
you're getting may be the result of a missing scsi_device_online()
check, or it could be some premature failure of the underlying device
driver (going into SHOST_DEL with outstanding commands causes them to
get frozen) but can you investigate the root cause rather than trying
this bandaid?
Thanks,
James
next prev parent reply other threads:[~2017-11-07 18:09 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-07 17:38 [PATCH] sd: Fix a disk probing hang Bart Van Assche
2017-11-07 18:09 ` James Bottomley [this message]
2017-11-07 22:42 ` Bart Van Assche
2017-11-07 22:57 ` James Bottomley
2017-11-08 8:12 ` Hannes Reinecke
2017-11-08 16:31 ` Bart Van Assche
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1510078140.3118.18.camel@linux.vnet.ibm.com \
--to=jejb@linux.vnet.ibm.com \
--cc=bart.vanassche@wdc.com \
--cc=hare@suse.com \
--cc=hch@lst.de \
--cc=jthumshirn@suse.de \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).