* [PATCH] scsi: ioctl reset should wait for IOs to complete
@ 2017-09-26 17:22 Lee Duncan
2017-09-26 18:24 ` Bart Van Assche
2017-09-26 18:33 ` James Bottomley
0 siblings, 2 replies; 5+ messages in thread
From: Lee Duncan @ 2017-09-26 17:22 UTC (permalink / raw)
To: linux-scsi
Cc: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
Lee Duncan
The SCSI ioctl reset path is smart enough to set the
flag tmf_in_progress when a user-requested reset is
processed, but it does not wait for IO that is in
flight. This can result in lost IOs and hung
processes. We should wait for a reasonable amount
of time for either the IOs to complete or to fail
the request.
Signed-off-by: Lee Duncan <lduncan@suse.com>
---
drivers/scsi/scsi_error.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 38942050b265..b964152611c3 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -57,6 +57,14 @@
#define BUS_RESET_SETTLE_TIME (10)
#define HOST_RESET_SETTLE_TIME (10)
+/*
+ * Time to wait for outstanding IOs when about to send
+ * a device reset, e.g. sg_reset. The msecs to wait must
+ * be an multiple of the msecs to wait per try.
+ */
+#define MSECS_PER_TRY_FOR_IO_ON_RESET 500
+#define MSECS_TO_WAIT_FOR_IO_ON_RESET (MSECS_PER_TRY_FOR_IO_ON_RESET * 10)
+
static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
static int scsi_try_to_abort_cmd(struct scsi_host_template *,
struct scsi_cmnd *);
@@ -2269,6 +2277,7 @@ void scsi_report_device_reset(struct Scsi_Host *shost, int channel, int target)
struct request *rq;
unsigned long flags;
int error = 0, rtn, val;
+ unsigned int msecs_to_wait = MSECS_TO_WAIT_FOR_IO_ON_RESET;
if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
return -EACCES;
@@ -2301,6 +2310,22 @@ void scsi_report_device_reset(struct Scsi_Host *shost, int channel, int target)
spin_lock_irqsave(shost->host_lock, flags);
shost->tmf_in_progress = 1;
+
+ /* if any IOs in progress wait for them a while */
+ while ((atomic_read(&shost->host_busy) > 0) && (msecs_to_wait > 0)) {
+ spin_unlock_irqrestore(shost->host_lock, flags);
+ msleep(MSECS_PER_TRY_FOR_IO_ON_RESET);
+ msecs_to_wait -= MSECS_PER_TRY_FOR_IO_ON_RESET;
+ spin_lock_irqsave(shost->host_lock, flags);
+ }
+ if (atomic_read(&shost->host_busy)) {
+ shost->tmf_in_progress = 0;
+ spin_unlock_irqrestore(shost->host_lock, flags);
+ SCSI_LOG_ERROR_RECOVERY(3,
+ printk("%s: device reset failed: outstanding IO\n", __func__));
+ goto out_put_scmd_and_free;
+ }
+
spin_unlock_irqrestore(shost->host_lock, flags);
switch (val & ~SG_SCSI_RESET_NO_ESCALATE) {
@@ -2349,6 +2374,7 @@ void scsi_report_device_reset(struct Scsi_Host *shost, int channel, int target)
wake_up(&shost->host_wait);
scsi_run_host_queues(shost);
+out_put_scmd_and_free:
scsi_put_command(scmd);
kfree(rq);
--
1.8.5.6
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] scsi: ioctl reset should wait for IOs to complete
2017-09-26 17:22 [PATCH] scsi: ioctl reset should wait for IOs to complete Lee Duncan
@ 2017-09-26 18:24 ` Bart Van Assche
2017-09-27 6:58 ` Hannes Reinecke
2017-09-26 18:33 ` James Bottomley
1 sibling, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2017-09-26 18:24 UTC (permalink / raw)
To: linux-scsi@vger.kernel.org, lduncan@suse.com
Cc: jejb@linux.vnet.ibm.com, hare@suse.com,
martin.petersen@oracle.com
On Tue, 2017-09-26 at 10:22 -0700, Lee Duncan wrote:
> The SCSI ioctl reset path is smart enough to set the
> flag tmf_in_progress when a user-requested reset is
> processed, but it does not wait for IO that is in
> flight. This can result in lost IOs and hung
> processes. We should wait for a reasonable amount
> of time for either the IOs to complete or to fail
> the request.
Hello Lee,
I'm using this functionality all the time to test how SCSI target code handles
TMFs while SCSI commands are in progress. So I would regret if the SCSI reset
ioctl code would be modified such that it waits for outstanding requests.
Isn't the behavior you described a SCSI LLD bug? Shouldn't such bugs be fixed
instead of implementing a work-around in the SCSI core?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] scsi: ioctl reset should wait for IOs to complete
2017-09-26 17:22 [PATCH] scsi: ioctl reset should wait for IOs to complete Lee Duncan
2017-09-26 18:24 ` Bart Van Assche
@ 2017-09-26 18:33 ` James Bottomley
1 sibling, 0 replies; 5+ messages in thread
From: James Bottomley @ 2017-09-26 18:33 UTC (permalink / raw)
To: Lee Duncan, linux-scsi; +Cc: Martin K . Petersen, Hannes Reinecke
On Tue, 2017-09-26 at 10:22 -0700, Lee Duncan wrote:
> The SCSI ioctl reset path is smart enough to set the
> flag tmf_in_progress when a user-requested reset is
> processed, but it does not wait for IO that is in
> flight. This can result in lost IOs and hung
> processes. We should wait for a reasonable amount
> of time for either the IOs to complete or to fail
> the request.
The idea of the SCSI ioctl was to be async: issue reset and return. Do
we have nothing in userspace that expects async behaviour? (if we have,
you could still add a flag or a new ioctl).
However:
> Signed-off-by: Lee Duncan <lduncan@suse.com>
> ---
> drivers/scsi/scsi_error.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 38942050b265..b964152611c3 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -57,6 +57,14 @@
> #define BUS_RESET_SETTLE_TIME (10)
> #define HOST_RESET_SETTLE_TIME (10)
>
> +/*
> + * Time to wait for outstanding IOs when about to send
> + * a device reset, e.g. sg_reset. The msecs to wait must
> + * be an multiple of the msecs to wait per try.
> + */
> +#define MSECS_PER_TRY_FOR_IO_ON_RESET 500
> +#define MSECS_TO_WAIT_FOR_IO_ON_RESET (MSECS_PER_TRY_FOR_IO_O
> N_RESET * 10)
> +
> static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
> static int scsi_try_to_abort_cmd(struct scsi_host_template *,
> struct scsi_cmnd *);
> @@ -2269,6 +2277,7 @@ void scsi_report_device_reset(struct Scsi_Host
> *shost, int channel, int target)
> struct request *rq;
> unsigned long flags;
> int error = 0, rtn, val;
> + unsigned int msecs_to_wait = MSECS_TO_WAIT_FOR_IO_ON_RESET;
scsi_report_device_reset is the wrong place to do the wait: that's used
by low level device drivers to report that they've detected a bus reset
and is expected to return immediately.
James
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] scsi: ioctl reset should wait for IOs to complete
2017-09-26 18:24 ` Bart Van Assche
@ 2017-09-27 6:58 ` Hannes Reinecke
2017-09-27 16:00 ` Lee Duncan
0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2017-09-27 6:58 UTC (permalink / raw)
To: Bart Van Assche, linux-scsi@vger.kernel.org, lduncan@suse.com
Cc: jejb@linux.vnet.ibm.com, martin.petersen@oracle.com
On 09/26/2017 08:24 PM, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 10:22 -0700, Lee Duncan wrote:
>> The SCSI ioctl reset path is smart enough to set the
>> flag tmf_in_progress when a user-requested reset is
>> processed, but it does not wait for IO that is in
>> flight. This can result in lost IOs and hung
>> processes. We should wait for a reasonable amount
>> of time for either the IOs to complete or to fail
>> the request.
>
> Hello Lee,
>
> I'm using this functionality all the time to test how SCSI target code handles
> TMFs while SCSI commands are in progress. So I would regret if the SCSI reset
> ioctl code would be modified such that it waits for outstanding requests.
> Isn't the behavior you described a SCSI LLD bug? Shouldn't such bugs be fixed
> instead of implementing a work-around in the SCSI core?
>
Well, thing is that there is an asymmetry here; originally all SCSI EH
functions were supposed to run with no I/O in flight.
(I've modified that with the asynchronous ABORT TASK TMF, but still).
But when called with sg_reset this is no longer true, we're disallowing
new requests, but do not wait for the in-flight I/O to complete.
And we've had a customer report where calling sg_reset -t on an iSCSI
device caused I/O to become stuck as the in-flight I/O was terminated by
the target reset, but the iSCSI stack never sent a completion for that I/O.
However, we could also defer this problem until my SCSI EH rework goes
in; that clears up the sg_reset path and might clarify things a bit.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.com +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] scsi: ioctl reset should wait for IOs to complete
2017-09-27 6:58 ` Hannes Reinecke
@ 2017-09-27 16:00 ` Lee Duncan
0 siblings, 0 replies; 5+ messages in thread
From: Lee Duncan @ 2017-09-27 16:00 UTC (permalink / raw)
To: Hannes Reinecke, Bart Van Assche, linux-scsi@vger.kernel.org
Cc: jejb@linux.vnet.ibm.com, martin.petersen@oracle.com
On 09/26/2017 11:58 PM, Hannes Reinecke wrote:
> On 09/26/2017 08:24 PM, Bart Van Assche wrote:
>> On Tue, 2017-09-26 at 10:22 -0700, Lee Duncan wrote:
>>> The SCSI ioctl reset path is smart enough to set the
>>> flag tmf_in_progress when a user-requested reset is
>>> processed, but it does not wait for IO that is in
>>> flight. This can result in lost IOs and hung
>>> processes. We should wait for a reasonable amount
>>> of time for either the IOs to complete or to fail
>>> the request.
>>
>> Hello Lee,
>>
>> I'm using this functionality all the time to test how SCSI target code handles
>> TMFs while SCSI commands are in progress. So I would regret if the SCSI reset
>> ioctl code would be modified such that it waits for outstanding requests.
>> Isn't the behavior you described a SCSI LLD bug? Shouldn't such bugs be fixed
>> instead of implementing a work-around in the SCSI core?
>>
> Well, thing is that there is an asymmetry here; originally all SCSI EH
> functions were supposed to run with no I/O in flight.
> (I've modified that with the asynchronous ABORT TASK TMF, but still).
> But when called with sg_reset this is no longer true, we're disallowing
> new requests, but do not wait for the in-flight I/O to complete.
> And we've had a customer report where calling sg_reset -t on an iSCSI
> device caused I/O to become stuck as the in-flight I/O was terminated by
> the target reset, but the iSCSI stack never sent a completion for that I/O.
>
> However, we could also defer this problem until my SCSI EH rework goes
> in; that clears up the sg_reset path and might clarify things a bit.
>
> Cheers,
>
> Hannes
>
I will wait and see if the problem still exists there, and address it if
it does.
Thank you.
--
Lee Duncan
SUSE Labs
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-09-27 16:00 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-26 17:22 [PATCH] scsi: ioctl reset should wait for IOs to complete Lee Duncan
2017-09-26 18:24 ` Bart Van Assche
2017-09-27 6:58 ` Hannes Reinecke
2017-09-27 16:00 ` Lee Duncan
2017-09-26 18:33 ` James Bottomley
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox