[PATCH] Protect against overflow in dev_loss

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] Protect against overflow in dev_loss_tmo
@ 2010-03-09  9:18 Hannes Reinecke
  2010-03-09 14:14 ` James Smart
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Reinecke @ 2010-03-09  9:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi


The rport structure defines dev_loss_tmo as u32, which is
later multiplied with HZ to get the actual timeout value.
This might overflow for large dev_loss_tmo values. So we
should be better using u64 as intermediate variables here
to protect against overflow.

Signed-off-by: Hannes Reinecke <hare@suse.de>

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 79660ee..9860322 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -833,7 +833,7 @@ static ssize_t
 store_fc_rport_dev_loss_tmo(struct device *dev, struct device_attribute *attr,
 			    const char *buf, size_t count)
 {
-	int val;
+	unsigned long val;
 	struct fc_rport *rport = transport_class_to_rport(dev);
 	struct Scsi_Host *shost = rport_to_shost(rport);
 	struct fc_internal *i = to_fc_internal(shost->transportt);
@@ -847,6 +847,12 @@ store_fc_rport_dev_loss_tmo(struct device *dev, struct device_attribute *attr,
 		return -EINVAL;
 
 	/*
+	 * Check for overflow; dev_loss_tmo is u32
+	 */
+	if (val > UINT_MAX)
+		return -EINVAL;
+
+	/*
 	 * If fast_io_fail is off we have to cap
 	 * dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT
 	 */
@@ -2852,7 +2858,7 @@ void
 fc_remote_port_delete(struct fc_rport  *rport)
 {
 	struct Scsi_Host *shost = rport_to_shost(rport);
-	int timeout = rport->dev_loss_tmo;
+	unsigned long timeout = rport->dev_loss_tmo;
 	unsigned long flags;
 
 	/*

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] Protect against overflow in dev_loss_tmo
  2010-03-09  9:18 [PATCH] Protect against overflow in dev_loss_tmo Hannes Reinecke
@ 2010-03-09 14:14 ` James Smart
  2010-03-16 13:04   ` Christof Schmitt
  0 siblings, 1 reply; 9+ messages in thread
From: James Smart @ 2010-03-09 14:14 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: James Bottomley, linux-scsi@vger.kernel.org

I don't ever expect to see large dev_loss_tmo values, but the patch is fine.

Acked-by: James Smart <james.smart@emulex.com>

-- james s


Hannes Reinecke wrote:
> The rport structure defines dev_loss_tmo as u32, which is
> later multiplied with HZ to get the actual timeout value.
> This might overflow for large dev_loss_tmo values. So we
> should be better using u64 as intermediate variables here
> to protect against overflow.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
>
> diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
> index 79660ee..9860322 100644
> --- a/drivers/scsi/scsi_transport_fc.c
> +++ b/drivers/scsi/scsi_transport_fc.c
> @@ -833,7 +833,7 @@ static ssize_t
>  store_fc_rport_dev_loss_tmo(struct device *dev, struct device_attribute *attr,
>  			    const char *buf, size_t count)
>  {
> -	int val;
> +	unsigned long val;
>  	struct fc_rport *rport = transport_class_to_rport(dev);
>  	struct Scsi_Host *shost = rport_to_shost(rport);
>  	struct fc_internal *i = to_fc_internal(shost->transportt);
> @@ -847,6 +847,12 @@ store_fc_rport_dev_loss_tmo(struct device *dev, struct device_attribute *attr,
>  		return -EINVAL;
>  
>  	/*
> +	 * Check for overflow; dev_loss_tmo is u32
> +	 */
> +	if (val > UINT_MAX)
> +		return -EINVAL;
> +
> +	/*
>  	 * If fast_io_fail is off we have to cap
>  	 * dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT
>  	 */
> @@ -2852,7 +2858,7 @@ void
>  fc_remote_port_delete(struct fc_rport  *rport)
>  {
>  	struct Scsi_Host *shost = rport_to_shost(rport);
> -	int timeout = rport->dev_loss_tmo;
> +	unsigned long timeout = rport->dev_loss_tmo;
>  	unsigned long flags;
>  
>  	/*
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>   

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Protect against overflow in dev_loss_tmo
  2010-03-09 14:14 ` James Smart
@ 2010-03-16 13:04   ` Christof Schmitt
  2010-03-17  9:40     ` Hannes Reinecke
  2010-03-18  4:24     ` Mike Christie
  0 siblings, 2 replies; 9+ messages in thread
From: Christof Schmitt @ 2010-03-16 13:04 UTC (permalink / raw)
  To: James Smart; +Cc: Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org

On Tue, Mar 09, 2010 at 09:14:37AM -0500, James Smart wrote:
> I don't ever expect to see large dev_loss_tmo values, but the patch is fine.
> 
> Acked-by: James Smart <james.smart@emulex.com>
> 
> -- james s
> 
> 
> Hannes Reinecke wrote:
> >The rport structure defines dev_loss_tmo as u32, which is
> >later multiplied with HZ to get the actual timeout value.
> >This might overflow for large dev_loss_tmo values. So we
> >should be better using u64 as intermediate variables here
> >to protect against overflow.
> >
> >Signed-off-by: Hannes Reinecke <hare@suse.de>
[...]

I guess this is the intended use to prevent the dev_loss_tmo from
removing the SCSI devices:
http://git.kernel.org/?p=linux/kernel/git/hare/multipath-tools.git;a=commitdiff;h=b9903e2e8a6cdc5042897719fbae6c9346820bbf;hp=ed1dc6164fe530d146cfe65d4f99e44ec9b54b95

But does this raise the question again how to run SCSI EH with remote
port failures?

The SCSI FC LLDs call fc_block_scsi_eh to wait until the fc_rport
leaves the state FC_PORTSTATE_BLOCKED. This effectively prevents SCSI
devices from being taken offline when there is a command timeout and
the fc_rport is BLOCKED. With the large dev_loss_tmo, the dev_loss_tmo
never expires and a problem with a single remote port can block the
host error handler.

--
Christof

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Protect against overflow in dev_loss_tmo
  2010-03-16 13:04   ` Christof Schmitt
@ 2010-03-17  9:40     ` Hannes Reinecke
  2010-03-18  4:33       ` Mike Christie
  2010-03-18  4:24     ` Mike Christie
  1 sibling, 1 reply; 9+ messages in thread
From: Hannes Reinecke @ 2010-03-17  9:40 UTC (permalink / raw)
  To: Christof Schmitt; +Cc: James Smart, James Bottomley, linux-scsi@vger.kernel.org

Christof Schmitt wrote:
> On Tue, Mar 09, 2010 at 09:14:37AM -0500, James Smart wrote:
>> I don't ever expect to see large dev_loss_tmo values, but the patch is fine.
>>
>> Acked-by: James Smart <james.smart@emulex.com>
>>
>> -- james s
>>
>>
>> Hannes Reinecke wrote:
>>> The rport structure defines dev_loss_tmo as u32, which is
>>> later multiplied with HZ to get the actual timeout value.
>>> This might overflow for large dev_loss_tmo values. So we
>>> should be better using u64 as intermediate variables here
>>> to protect against overflow.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
> [...]
> 
> I guess this is the intended use to prevent the dev_loss_tmo from
> removing the SCSI devices:
> http://git.kernel.org/?p=linux/kernel/git/hare/multipath-tools.git;a=commitdiff;h=b9903e2e8a6cdc5042897719fbae6c9346820bbf;hp=ed1dc6164fe530d146cfe65d4f99e44ec9b54b95
> 
> But does this raise the question again how to run SCSI EH with remote
> port failures?
> 
> The SCSI FC LLDs call fc_block_scsi_eh to wait until the fc_rport
> leaves the state FC_PORTSTATE_BLOCKED. This effectively prevents SCSI
> devices from being taken offline when there is a command timeout and
> the fc_rport is BLOCKED. With the large dev_loss_tmo, the dev_loss_tmo
> never expires and a problem with a single remote port can block the
> host error handler.
> 
I was under the impression that terminate_rport_io() would cancel/terminate
all outstanding I/O, while any new I/O would be blocked due to FC_PORTSTATE_BLOCKED.

A device would only be taken offline if the full error recovery is run,
something we cannot do (reliably) if the path is down, so from that
point of view it totally reasonable to defer the error recovery here.

However, I'm curious as how one could get into that state while
the port is blocked.
The only way I can imagine is that an I/O has started before the
port entered FC_PORTSTATE_BLOCKED, and would return (with error)
before terminate_rport_io has been called.
So eh would be delayed due to fc_block_scsi_eh().
But I would have assumed that terminate_rport_io() would terminate
even this failing I/O with DID_TRANSPORT_FAILFAST (or somesuch),
thus avoiding proper eh altogether.

Am I wrong here?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Protect against overflow in dev_loss_tmo
  2010-03-16 13:04   ` Christof Schmitt
  2010-03-17  9:40     ` Hannes Reinecke
@ 2010-03-18  4:24     ` Mike Christie
  2010-03-22 14:12       ` Christof Schmitt
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Christie @ 2010-03-18  4:24 UTC (permalink / raw)
  To: Christof Schmitt
  Cc: James Smart, Hannes Reinecke, James Bottomley,
	linux-scsi@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2697 bytes --]

On 03/16/2010 08:04 AM, Christof Schmitt wrote:
> On Tue, Mar 09, 2010 at 09:14:37AM -0500, James Smart wrote:
>> I don't ever expect to see large dev_loss_tmo values, but the patch is fine.
>>
>> Acked-by: James Smart<james.smart@emulex.com>
>>
>> -- james s
>>
>>
>> Hannes Reinecke wrote:
>>> The rport structure defines dev_loss_tmo as u32, which is
>>> later multiplied with HZ to get the actual timeout value.
>>> This might overflow for large dev_loss_tmo values. So we
>>> should be better using u64 as intermediate variables here
>>> to protect against overflow.
>>>
>>> Signed-off-by: Hannes Reinecke<hare@suse.de>
> [...]
>
> I guess this is the intended use to prevent the dev_loss_tmo from
> removing the SCSI devices:
> http://git.kernel.org/?p=linux/kernel/git/hare/multipath-tools.git;a=commitdiff;h=b9903e2e8a6cdc5042897719fbae6c9346820bbf;hp=ed1dc6164fe530d146cfe65d4f99e44ec9b54b95
>
> But does this raise the question again how to run SCSI EH with remote
> port failures?
>
> The SCSI FC LLDs call fc_block_scsi_eh to wait until the fc_rport
> leaves the state FC_PORTSTATE_BLOCKED. This effectively prevents SCSI
> devices from being taken offline when there is a command timeout and
> the fc_rport is BLOCKED. With the large dev_loss_tmo, the dev_loss_tmo
> never expires and a problem with a single remote port can block the
> host error handler.
>

I did the attached patch for a problem where we were supposed to be 
getting a RSCN but we were not, so the scsi eh ended up running. Then we 
sat in the scsi eh for a long time. multipath was being used, so we 
wanted to fast fail, but the the fc block scsi eh helper only waits for 
the rport state to change.

It ended up the switch was busted and should have given a RSCN, but I 
think the patch is helpful for your problem where we fall into the long 
wait for legitimate reasons.

This patch works nicely when using multipath to get IO failed and hosts 
unblocked, but the problem is that it could offline the devices more 
quickly than what we are prepared for. We have something similar for 
iscsi, but the iscsi userspace daemon has a way to set the device back 
to running when it recovers. FC does not, and dm-multipath does not do 
this, so I am not pushing the patch.

I think if we can figure something out for that offline issue the patch 
could be helpful. I was thinking that if the the scsi eh failed because 
the fast io fail timer fired, then we could set the devices to a new 
state. The scsi_transport_fc and scsi_transport_iscsi code could use 
this new device state to indicate that the fast io fail timer has fired 
and we are failing IO, but the device is not in offline state and is not 
blocked.

[-- Attachment #2: fc-fast-fail-block-scsi.patch --]
[-- Type: text/plain, Size: 2444 bytes --]

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 46720b2..ba69b7b 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -717,7 +717,8 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd)
 	struct req_que *req = vha->req;
 	srb_t *spt;
 
-	fc_block_scsi_eh(cmd);
+	if (fc_block_scsi_eh(cmd) == DID_TRANSPORT_FAILFAST << 16)
+		return ret;
 
 	if (!CMD_SP(cmd))
 		return SUCCESS;
@@ -848,7 +849,8 @@ __qla2xxx_eh_generic_reset(char *name, enum nexus_wait_type type,
 	fc_port_t *fcport = (struct fc_port *) cmd->device->hostdata;
 	int err;
 
-	fc_block_scsi_eh(cmd);
+	if (fc_block_scsi_eh(cmd) == DID_TRANSPORT_FAILFAST << 16)
+		return FAILED;
 
 	if (!fcport)
 		return FAILED;
@@ -928,7 +930,8 @@ qla2xxx_eh_bus_reset(struct scsi_cmnd *cmd)
 	unsigned long serial;
 	srb_t *sp = (srb_t *) CMD_SP(cmd);
 
-	fc_block_scsi_eh(cmd);
+	if (fc_block_scsi_eh(cmd) == DID_TRANSPORT_FAILFAST << 16)
+		return ret;
 
 	id = cmd->device->id;
 	lun = cmd->device->lun;
@@ -991,7 +994,8 @@ qla2xxx_eh_host_reset(struct scsi_cmnd *cmd)
 	srb_t *sp = (srb_t *) CMD_SP(cmd);
 	scsi_qla_host_t *base_vha = pci_get_drvdata(ha->pdev);
 
-	fc_block_scsi_eh(cmd);
+	if (fc_block_scsi_eh(cmd) == DID_TRANSPORT_FAILFAST << 16)
+		return ret;
 
 	id = cmd->device->id;
 	lun = cmd->device->lun;
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 1d5b721..6239efe 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3194,19 +3194,24 @@ fc_scsi_scan_rport(struct work_struct *work)
  * failing recovery actions for blocked rports which would lead to
  * offlined SCSI devices.
  */
-void fc_block_scsi_eh(struct scsi_cmnd *cmnd)
+int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
 {
 	struct Scsi_Host *shost = cmnd->device->host;
 	struct fc_rport *rport = starget_to_rport(scsi_target(cmnd->device));
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	while (rport->port_state == FC_PORTSTATE_BLOCKED) {
+	while (rport->port_state == FC_PORTSTATE_BLOCKED &&
+		!(rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)) {
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		msleep(1000);
 		spin_lock_irqsave(shost->host_lock, flags);
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
+		return DID_TRANSPORT_FAILFAST << 16;
+	return 0;
 }
 EXPORT_SYMBOL(fc_block_scsi_eh);
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] Protect against overflow in dev_loss_tmo
  2010-03-17  9:40     ` Hannes Reinecke
@ 2010-03-18  4:33       ` Mike Christie
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Christie @ 2010-03-18  4:33 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christof Schmitt, James Smart, James Bottomley,
	linux-scsi@vger.kernel.org

On 03/17/2010 04:40 AM, Hannes Reinecke wrote:
> Christof Schmitt wrote:
>> On Tue, Mar 09, 2010 at 09:14:37AM -0500, James Smart wrote:
>>> I don't ever expect to see large dev_loss_tmo values, but the patch is fine.
>>>
>>> Acked-by: James Smart<james.smart@emulex.com>
>>>
>>> -- james s
>>>
>>>
>>> Hannes Reinecke wrote:
>>>> The rport structure defines dev_loss_tmo as u32, which is
>>>> later multiplied with HZ to get the actual timeout value.
>>>> This might overflow for large dev_loss_tmo values. So we
>>>> should be better using u64 as intermediate variables here
>>>> to protect against overflow.
>>>>
>>>> Signed-off-by: Hannes Reinecke<hare@suse.de>
>> [...]
>>
>> I guess this is the intended use to prevent the dev_loss_tmo from
>> removing the SCSI devices:
>> http://git.kernel.org/?p=linux/kernel/git/hare/multipath-tools.git;a=commitdiff;h=b9903e2e8a6cdc5042897719fbae6c9346820bbf;hp=ed1dc6164fe530d146cfe65d4f99e44ec9b54b95
>>
>> But does this raise the question again how to run SCSI EH with remote
>> port failures?
>>
>> The SCSI FC LLDs call fc_block_scsi_eh to wait until the fc_rport
>> leaves the state FC_PORTSTATE_BLOCKED. This effectively prevents SCSI
>> devices from being taken offline when there is a command timeout and
>> the fc_rport is BLOCKED. With the large dev_loss_tmo, the dev_loss_tmo
>> never expires and a problem with a single remote port can block the
>> host error handler.
>>
> I was under the impression that terminate_rport_io() would cancel/terminate
> all outstanding I/O, while any new I/O would be blocked due to FC_PORTSTATE_BLOCKED.
>

terminate_rport_io kills outstanding IO when the fast io fail or dev 
loss tmo fires. If the fast io fail tmo fires, then oustanding io is 
killed by terminate_rport_io and new IO is fast failed because 
fc_terminate_rport_io  will unblock the rport devices (before getting 
called the ~FC_RPORT_FAST_FAIL_TIMEDOUT bit is set) and when IO is sent 
to the driver fc_remote_port_chkready will see the bit is set and return 
DID_TRANSPORT_FASTFAILED.

In the other mail, I was saying we should add a new device state for 
when the fast io fail timer fires (iscsi has something similar in the 
replacement/recovery timeout) that we could set the device to indicate 
that IO is going to get failed, but the device is not exactly offlined 
by the scsi eh, not blocked, and not deleted or canceled.

> A device would only be taken offline if the full error recovery is run,
> something we cannot do (reliably) if the path is down, so from that
> point of view it totally reasonable to defer the error recovery here.
>
> However, I'm curious as how one could get into that state while
> the port is blocked.
> The only way I can imagine is that an I/O has started before the
> port entered FC_PORTSTATE_BLOCKED, and would return (with error)
> before terminate_rport_io has been called.
> So eh would be delayed due to fc_block_scsi_eh().
> But I would have assumed that terminate_rport_io() would terminate
> even this failing I/O with DID_TRANSPORT_FAILFAST (or somesuch),
> thus avoiding proper eh altogether.
>

If the link goes down while the scsi eh is running we hit the problem 
Chrisof is describing. If you pulled a cable while the the abort handler 
was being called, then when you hit the drivers abort handler, and you 
would hit the block. The terminate_rport_io call from the fast io fail 
handling just fails the IO. There is nothing in the scsi eh to see that 
all cmds are now accounted for and that it can stop trying to unjam the 
host. You have to wait until dev loss tmo fires and that changes the 
rport port_state out of blocked.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Protect against overflow in dev_loss_tmo
  2010-03-18  4:24     ` Mike Christie
@ 2010-03-22 14:12       ` Christof Schmitt
  2010-03-24 16:02         ` Christof Schmitt
  0 siblings, 1 reply; 9+ messages in thread
From: Christof Schmitt @ 2010-03-22 14:12 UTC (permalink / raw)
  To: Mike Christie
  Cc: James Smart, Hannes Reinecke, James Bottomley,
	linux-scsi@vger.kernel.org

On Wed, Mar 17, 2010 at 11:24:52PM -0500, Mike Christie wrote:
> On 03/16/2010 08:04 AM, Christof Schmitt wrote:
> >On Tue, Mar 09, 2010 at 09:14:37AM -0500, James Smart wrote:
> >>I don't ever expect to see large dev_loss_tmo values, but the patch is fine.
> >>
> >>Acked-by: James Smart<james.smart@emulex.com>
> >>
> >>-- james s
> >>
> >>
> >>Hannes Reinecke wrote:
> >>>The rport structure defines dev_loss_tmo as u32, which is
> >>>later multiplied with HZ to get the actual timeout value.
> >>>This might overflow for large dev_loss_tmo values. So we
> >>>should be better using u64 as intermediate variables here
> >>>to protect against overflow.
> >>>
> >>>Signed-off-by: Hannes Reinecke<hare@suse.de>
> >[...]
> >
> >I guess this is the intended use to prevent the dev_loss_tmo from
> >removing the SCSI devices:
> >http://git.kernel.org/?p=linux/kernel/git/hare/multipath-tools.git;a=commitdiff;h=b9903e2e8a6cdc5042897719fbae6c9346820bbf;hp=ed1dc6164fe530d146cfe65d4f99e44ec9b54b95
> >
> >But does this raise the question again how to run SCSI EH with remote
> >port failures?
> >
> >The SCSI FC LLDs call fc_block_scsi_eh to wait until the fc_rport
> >leaves the state FC_PORTSTATE_BLOCKED. This effectively prevents SCSI
> >devices from being taken offline when there is a command timeout and
> >the fc_rport is BLOCKED. With the large dev_loss_tmo, the dev_loss_tmo
> >never expires and a problem with a single remote port can block the
> >host error handler.
> >
> 
> I did the attached patch for a problem where we were supposed to be
> getting a RSCN but we were not, so the scsi eh ended up running.
> Then we sat in the scsi eh for a long time. multipath was being
> used, so we wanted to fast fail, but the the fc block scsi eh helper
> only waits for the rport state to change.
> 
> It ended up the switch was busted and should have given a RSCN, but
> I think the patch is helpful for your problem where we fall into the
> long wait for legitimate reasons.
> 
> This patch works nicely when using multipath to get IO failed and
> hosts unblocked, but the problem is that it could offline the
> devices more quickly than what we are prepared for. We have
> something similar for iscsi, but the iscsi userspace daemon has a
> way to set the device back to running when it recovers. FC does not,
> and dm-multipath does not do this, so I am not pushing the patch.
> 
> I think if we can figure something out for that offline issue the
> patch could be helpful. I was thinking that if the the scsi eh
> failed because the fast io fail timer fired, then we could set the
> devices to a new state. The scsi_transport_fc and
> scsi_transport_iscsi code could use this new device state to
> indicate that the fast io fail timer has fired and we are failing
> IO, but the device is not in offline state and is not blocked.

Thanks for providing the information and ideas. I think the main point
is to get the information that I/O has been failed to the scsi eh.
What about introducing a return value to indicate that I/O has been
failed with terminate_rport_io? This is a first shot to implement this
for the abort callback, but it needs to be extended for the other
callbacks as well. Or should a new SCSI device state be preferred?

---
 drivers/scsi/scsi_error.c        |    5 +++--
 drivers/scsi/scsi_transport_fc.c |   20 +++++++++++++++-----
 include/scsi/scsi.h              |    1 +
 include/scsi/scsi_transport_fc.h |    2 +-
 4 files changed, 20 insertions(+), 8 deletions(-)

--- a/drivers/scsi/scsi_error.c	2009-12-03 17:41:27.000000000 +0100
+++ b/drivers/scsi/scsi_error.c	2010-03-22 14:58:07.000000000 +0100
@@ -956,10 +956,11 @@ static int scsi_eh_abort_cmds(struct lis
 						  "0x%p\n", current->comm,
 						  scmd));
 		rtn = scsi_try_to_abort_cmd(scmd);
-		if (rtn == SUCCESS) {
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
 			if (!scsi_device_online(scmd->device) ||
-			    !scsi_eh_tur(scmd)) {
+			    !scsi_eh_tur(scmd) ||
+			    rtn == FAST_IO_FAIL) {
 				scsi_eh_finish_cmd(scmd, done_q);
 			}
 				
--- a/drivers/scsi/scsi_transport_fc.c	2010-02-26 12:59:27.000000000 +0100
+++ b/drivers/scsi/scsi_transport_fc.c	2010-03-22 14:56:28.000000000 +0100
@@ -3162,23 +3162,33 @@ fc_scsi_scan_rport(struct work_struct *w
  *
  * This routine can be called from a FC LLD scsi_eh callback. It
  * blocks the scsi_eh thread until the fc_rport leaves the
- * FC_PORTSTATE_BLOCKED. This is necessary to avoid the scsi_eh
- * failing recovery actions for blocked rports which would lead to
- * offlined SCSI devices.
+ * FC_PORTSTATE_BLOCKED, or the fast_io_fail_tmo fires. This is
+ * necessary to avoid the scsi_eh failing recovery actions for blocked
+ * rports which would lead to offlined SCSI devices.
+ *
+ * Returns: 0 if the fc_rport left the state FC_PORTSTATE_BLOCKED.
+ * 	    FAST_IO_FAIL if the fast_io_fail_tmo fired, this should be
+ * 	    passed back to scsi_eh.
  */
-void fc_block_scsi_eh(struct scsi_cmnd *cmnd)
+int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
 {
 	struct Scsi_Host *shost = cmnd->device->host;
 	struct fc_rport *rport = starget_to_rport(scsi_target(cmnd->device));
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	while (rport->port_state == FC_PORTSTATE_BLOCKED) {
+	while (rport->port_state == FC_PORTSTATE_BLOCKED &&
+	       !(rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)) {
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		msleep(1000);
 		spin_lock_irqsave(shost->host_lock, flags);
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
+		return FAST_IO_FAIL;
+
+	return 0;
 }
 EXPORT_SYMBOL(fc_block_scsi_eh);
 
--- a/include/scsi/scsi.h	2010-02-26 12:59:46.000000000 +0100
+++ b/include/scsi/scsi.h	2010-03-22 14:56:48.000000000 +0100
@@ -423,6 +423,7 @@ static inline int scsi_is_wlun(unsigned 
 #define ADD_TO_MLQUEUE  0x2006
 #define TIMEOUT_ERROR   0x2007
 #define SCSI_RETURN_NOT_HANDLED   0x2008
+#define FAST_IO_FAIL	0x2009
 
 /*
  * Midlevel queue return values.
--- a/include/scsi/scsi_transport_fc.h	2009-11-05 16:59:33.000000000 +0100
+++ b/include/scsi/scsi_transport_fc.h	2010-03-22 14:41:36.000000000 +0100
@@ -807,6 +807,6 @@ void fc_host_post_vendor_event(struct Sc
 struct fc_vport *fc_vport_create(struct Scsi_Host *shost, int channel,
 		struct fc_vport_identifiers *);
 int fc_vport_terminate(struct fc_vport *vport);
-void fc_block_scsi_eh(struct scsi_cmnd *cmnd);
+int fc_block_scsi_eh(struct scsi_cmnd *cmnd);
 
 #endif /* SCSI_TRANSPORT_FC_H */

The low-level driver simply passes the FAST_IO_FAIL from the eh callback
back to scsi eh:

---
 drivers/s390/scsi/zfcp_scsi.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/s390/scsi/zfcp_scsi.c	2010-03-02 14:02:37.000000000 +0100
+++ b/drivers/s390/scsi/zfcp_scsi.c	2010-03-22 14:50:31.000000000 +0100
@@ -174,7 +174,7 @@ static int zfcp_scsi_eh_abort_handler(st
 	struct zfcp_fsf_req *old_req, *abrt_req;
 	unsigned long flags;
 	unsigned long old_reqid = (unsigned long) scpnt->host_scribble;
-	int retval = SUCCESS;
+	int retval = SUCCESS, ret;
 	int retry = 3;
 	char *dbf_tag;
 
@@ -199,7 +199,9 @@ static int zfcp_scsi_eh_abort_handler(st
 			break;
 
 		zfcp_erp_wait(adapter);
-		fc_block_scsi_eh(scpnt);
+		ret = fc_block_scsi_eh(scpnt);
+		if (ret)
+			return ret;
 		if (!(atomic_read(&adapter->status) &
 		      ZFCP_STATUS_COMMON_RUNNING)) {
 			zfcp_dbf_scsi_abort("nres", adapter->dbf, scpnt, NULL,

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Protect against overflow in dev_loss_tmo
  2010-03-22 14:12       ` Christof Schmitt
@ 2010-03-24 16:02         ` Christof Schmitt
  2010-03-26  9:50           ` Hannes Reinecke
  0 siblings, 1 reply; 9+ messages in thread
From: Christof Schmitt @ 2010-03-24 16:02 UTC (permalink / raw)
  To: Mike Christie
  Cc: James Smart, Hannes Reinecke, James Bottomley,
	linux-scsi@vger.kernel.org

On Mon, Mar 22, 2010 at 03:12:40PM +0100, Christof Schmitt wrote:
> On Wed, Mar 17, 2010 at 11:24:52PM -0500, Mike Christie wrote:
> > On 03/16/2010 08:04 AM, Christof Schmitt wrote:
> > >On Tue, Mar 09, 2010 at 09:14:37AM -0500, James Smart wrote:
> > >>I don't ever expect to see large dev_loss_tmo values, but the patch is fine.
> > >>
> > >>Acked-by: James Smart<james.smart@emulex.com>
> > >>
> > >>-- james s
> > >>
> > >>
> > >>Hannes Reinecke wrote:
> > >>>The rport structure defines dev_loss_tmo as u32, which is
> > >>>later multiplied with HZ to get the actual timeout value.
> > >>>This might overflow for large dev_loss_tmo values. So we
> > >>>should be better using u64 as intermediate variables here
> > >>>to protect against overflow.
> > >>>
> > >>>Signed-off-by: Hannes Reinecke<hare@suse.de>
> > >[...]
> > >
> > >I guess this is the intended use to prevent the dev_loss_tmo from
> > >removing the SCSI devices:
> > >http://git.kernel.org/?p=linux/kernel/git/hare/multipath-tools.git;a=commitdiff;h=b9903e2e8a6cdc5042897719fbae6c9346820bbf;hp=ed1dc6164fe530d146cfe65d4f99e44ec9b54b95
> > >
> > >But does this raise the question again how to run SCSI EH with remote
> > >port failures?
> > >
> > >The SCSI FC LLDs call fc_block_scsi_eh to wait until the fc_rport
> > >leaves the state FC_PORTSTATE_BLOCKED. This effectively prevents SCSI
> > >devices from being taken offline when there is a command timeout and
> > >the fc_rport is BLOCKED. With the large dev_loss_tmo, the dev_loss_tmo
> > >never expires and a problem with a single remote port can block the
> > >host error handler.
> > >
> > 
> > I did the attached patch for a problem where we were supposed to be
> > getting a RSCN but we were not, so the scsi eh ended up running.
> > Then we sat in the scsi eh for a long time. multipath was being
> > used, so we wanted to fast fail, but the the fc block scsi eh helper
> > only waits for the rport state to change.
> > 
> > It ended up the switch was busted and should have given a RSCN, but
> > I think the patch is helpful for your problem where we fall into the
> > long wait for legitimate reasons.
> > 
> > This patch works nicely when using multipath to get IO failed and
> > hosts unblocked, but the problem is that it could offline the
> > devices more quickly than what we are prepared for. We have
> > something similar for iscsi, but the iscsi userspace daemon has a
> > way to set the device back to running when it recovers. FC does not,
> > and dm-multipath does not do this, so I am not pushing the patch.
> > 
> > I think if we can figure something out for that offline issue the
> > patch could be helpful. I was thinking that if the the scsi eh
> > failed because the fast io fail timer fired, then we could set the
> > devices to a new state. The scsi_transport_fc and
> > scsi_transport_iscsi code could use this new device state to
> > indicate that the fast io fail timer has fired and we are failing
> > IO, but the device is not in offline state and is not blocked.
> 
> Thanks for providing the information and ideas. I think the main point
> is to get the information that I/O has been failed to the scsi eh.
> What about introducing a return value to indicate that I/O has been
> failed with terminate_rport_io? This is a first shot to implement this
> for the abort callback, but it needs to be extended for the other
> callbacks as well. Or should a new SCSI device state be preferred?
> 
> ---
>  drivers/scsi/scsi_error.c        |    5 +++--
>  drivers/scsi/scsi_transport_fc.c |   20 +++++++++++++++-----
>  include/scsi/scsi.h              |    1 +
>  include/scsi/scsi_transport_fc.h |    2 +-
>  4 files changed, 20 insertions(+), 8 deletions(-)
> 
> --- a/drivers/scsi/scsi_error.c	2009-12-03 17:41:27.000000000 +0100
> +++ b/drivers/scsi/scsi_error.c	2010-03-22 14:58:07.000000000 +0100
> @@ -956,10 +956,11 @@ static int scsi_eh_abort_cmds(struct lis
>  						  "0x%p\n", current->comm,
>  						  scmd));
>  		rtn = scsi_try_to_abort_cmd(scmd);
> -		if (rtn == SUCCESS) {
> +		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
>  			scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
>  			if (!scsi_device_online(scmd->device) ||
> -			    !scsi_eh_tur(scmd)) {
> +			    !scsi_eh_tur(scmd) ||
> +			    rtn == FAST_IO_FAIL) {
>  				scsi_eh_finish_cmd(scmd, done_q);
>  			}
>  				
> --- a/drivers/scsi/scsi_transport_fc.c	2010-02-26 12:59:27.000000000 +0100
> +++ b/drivers/scsi/scsi_transport_fc.c	2010-03-22 14:56:28.000000000 +0100
> @@ -3162,23 +3162,33 @@ fc_scsi_scan_rport(struct work_struct *w
>   *
>   * This routine can be called from a FC LLD scsi_eh callback. It
>   * blocks the scsi_eh thread until the fc_rport leaves the
> - * FC_PORTSTATE_BLOCKED. This is necessary to avoid the scsi_eh
> - * failing recovery actions for blocked rports which would lead to
> - * offlined SCSI devices.
> + * FC_PORTSTATE_BLOCKED, or the fast_io_fail_tmo fires. This is
> + * necessary to avoid the scsi_eh failing recovery actions for blocked
> + * rports which would lead to offlined SCSI devices.
> + *
> + * Returns: 0 if the fc_rport left the state FC_PORTSTATE_BLOCKED.
> + * 	    FAST_IO_FAIL if the fast_io_fail_tmo fired, this should be
> + * 	    passed back to scsi_eh.
>   */
> -void fc_block_scsi_eh(struct scsi_cmnd *cmnd)
> +int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
>  {
>  	struct Scsi_Host *shost = cmnd->device->host;
>  	struct fc_rport *rport = starget_to_rport(scsi_target(cmnd->device));
>  	unsigned long flags;
> 
>  	spin_lock_irqsave(shost->host_lock, flags);
> -	while (rport->port_state == FC_PORTSTATE_BLOCKED) {
> +	while (rport->port_state == FC_PORTSTATE_BLOCKED &&
> +	       !(rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)) {
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		msleep(1000);
>  		spin_lock_irqsave(shost->host_lock, flags);
>  	}
>  	spin_unlock_irqrestore(shost->host_lock, flags);
> +
> +	if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
> +		return FAST_IO_FAIL;
> +
> +	return 0;
>  }
>  EXPORT_SYMBOL(fc_block_scsi_eh);
> 
> --- a/include/scsi/scsi.h	2010-02-26 12:59:46.000000000 +0100
> +++ b/include/scsi/scsi.h	2010-03-22 14:56:48.000000000 +0100
> @@ -423,6 +423,7 @@ static inline int scsi_is_wlun(unsigned 
>  #define ADD_TO_MLQUEUE  0x2006
>  #define TIMEOUT_ERROR   0x2007
>  #define SCSI_RETURN_NOT_HANDLED   0x2008
> +#define FAST_IO_FAIL	0x2009
> 
>  /*
>   * Midlevel queue return values.
> --- a/include/scsi/scsi_transport_fc.h	2009-11-05 16:59:33.000000000 +0100
> +++ b/include/scsi/scsi_transport_fc.h	2010-03-22 14:41:36.000000000 +0100
> @@ -807,6 +807,6 @@ void fc_host_post_vendor_event(struct Sc
>  struct fc_vport *fc_vport_create(struct Scsi_Host *shost, int channel,
>  		struct fc_vport_identifiers *);
>  int fc_vport_terminate(struct fc_vport *vport);
> -void fc_block_scsi_eh(struct scsi_cmnd *cmnd);
> +int fc_block_scsi_eh(struct scsi_cmnd *cmnd);
> 
>  #endif /* SCSI_TRANSPORT_FC_H */
> 
> The low-level driver simply passes the FAST_IO_FAIL from the eh callback
> back to scsi eh:
> 
> ---
>  drivers/s390/scsi/zfcp_scsi.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> --- a/drivers/s390/scsi/zfcp_scsi.c	2010-03-02 14:02:37.000000000 +0100
> +++ b/drivers/s390/scsi/zfcp_scsi.c	2010-03-22 14:50:31.000000000 +0100
> @@ -174,7 +174,7 @@ static int zfcp_scsi_eh_abort_handler(st
>  	struct zfcp_fsf_req *old_req, *abrt_req;
>  	unsigned long flags;
>  	unsigned long old_reqid = (unsigned long) scpnt->host_scribble;
> -	int retval = SUCCESS;
> +	int retval = SUCCESS, ret;
>  	int retry = 3;
>  	char *dbf_tag;
> 
> @@ -199,7 +199,9 @@ static int zfcp_scsi_eh_abort_handler(st
>  			break;
> 
>  		zfcp_erp_wait(adapter);
> -		fc_block_scsi_eh(scpnt);
> +		ret = fc_block_scsi_eh(scpnt);
> +		if (ret)
> +			return ret;
>  		if (!(atomic_read(&adapter->status) &
>  		      ZFCP_STATUS_COMMON_RUNNING)) {
>  			zfcp_dbf_scsi_abort("nres", adapter->dbf, scpnt, NULL,

I just posted the complete version of the patch and the adaption of
zfcp:
http://marc.info/?l=linux-scsi&m=126944631211269&w=2
http://marc.info/?l=linux-scsi&m=126944631211272&w=2
http://marc.info/?l=linux-scsi&m=126944631611276&w=2

Any comments about this approach for unblocking the scsi eh thread?

--
Christof

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Protect against overflow in dev_loss_tmo
  2010-03-24 16:02         ` Christof Schmitt
@ 2010-03-26  9:50           ` Hannes Reinecke
  0 siblings, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2010-03-26  9:50 UTC (permalink / raw)
  To: Christof Schmitt
  Cc: Mike Christie, James Smart, James Bottomley,
	linux-scsi@vger.kernel.org

Christof Schmitt wrote:
[ .. ]
> 
> I just posted the complete version of the patch and the adaption of
> zfcp:
> http://marc.info/?l=linux-scsi&m=126944631211269&w=2
> http://marc.info/?l=linux-scsi&m=126944631211272&w=2
> http://marc.info/?l=linux-scsi&m=126944631611276&w=2
> 
> Any comments about this approach for unblocking the scsi eh thread?
> 
That looks like a really good idea. And it might even solve the problem
we had with path flapping on a multipathed raid device.
I'll give it a go here locally.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-03-26  9:50 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-09  9:18 [PATCH] Protect against overflow in dev_loss_tmo Hannes Reinecke
2010-03-09 14:14 ` James Smart
2010-03-16 13:04   ` Christof Schmitt
2010-03-17  9:40     ` Hannes Reinecke
2010-03-18  4:33       ` Mike Christie
2010-03-18  4:24     ` Mike Christie
2010-03-22 14:12       ` Christof Schmitt
2010-03-24 16:02         ` Christof Schmitt
2010-03-26  9:50           ` Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).