All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
@ 2012-12-07 14:51 Hannes Reinecke
  2012-12-07 18:58 ` Mike Christie
  2012-12-10  2:27 ` Michael Christie
  0 siblings, 2 replies; 11+ messages in thread
From: Hannes Reinecke @ 2012-12-07 14:51 UTC (permalink / raw)
  To: linux-scsi
  Cc: Hannes Reinecke, Mike Christie, James Smart, Andrew Vasquez,
	Chad Dupuis, James Bottomley

'Bus reset' is not really applicable to FibreChannel, as
the concept of a bus doesn't apply. Hence all FC LLDD
simulate a 'bus reset' by sending a target reset to each
attached remote port, causing error handling to spill
over to unaffected devices.

This patch implements a 'I_T nexus reset' handler,
which attempts to reset the I_T nexus to the remote
port. This way only the affected remote ports are
reset; other ports are left untouched.

I_T nexus reset is done by invoking the dev_loss_tmo
mechanism with a '0' fast fail timeout. This causes
any outstanding I/O to be aborted immediately.
The port is then set to 'blocked' to indicate that
no further I/O should be issued to this port.
The standard dev_loss_tmo mechanism is then
invoked to clear up any outstanding resources.

In my test this patch cuts down the total time
for recovery from 100 secs to 60 secs. And,
of course, with no interruption to the other
remote ports.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Smart <james.smart@emulex.com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: James Bottomley <jbottomley@parallels.com>
---
 drivers/scsi/bfa/bfad_im.c       |    4 +-
 drivers/scsi/lpfc/lpfc_scsi.c    |    4 +-
 drivers/scsi/qla2xxx/qla_os.c    |    2 +-
 drivers/scsi/scsi_transport_fc.c |  146 ++++++++++++++++++++++++--------------
 include/scsi/scsi_transport_fc.h |    1 +
 5 files changed, 99 insertions(+), 58 deletions(-)

diff --git a/drivers/scsi/bfa/bfad_im.c b/drivers/scsi/bfa/bfad_im.c
index 8f92732..d6555aa 100644
--- a/drivers/scsi/bfa/bfad_im.c
+++ b/drivers/scsi/bfa/bfad_im.c
@@ -793,7 +793,7 @@ struct scsi_host_template bfad_im_scsi_host_template = {
 	.queuecommand = bfad_im_queuecommand,
 	.eh_abort_handler = bfad_im_abort_handler,
 	.eh_device_reset_handler = bfad_im_reset_lun_handler,
-	.eh_bus_reset_handler = bfad_im_reset_bus_handler,
+	.eh_bus_reset_handler = fc_eh_reset_it_nexus_handler,
 
 	.slave_alloc = bfad_im_slave_alloc,
 	.slave_configure = bfad_im_slave_configure,
@@ -815,7 +815,7 @@ struct scsi_host_template bfad_im_vport_template = {
 	.queuecommand = bfad_im_queuecommand,
 	.eh_abort_handler = bfad_im_abort_handler,
 	.eh_device_reset_handler = bfad_im_reset_lun_handler,
-	.eh_bus_reset_handler = bfad_im_reset_bus_handler,
+	.eh_bus_reset_handler = fc_eh_reset_it_nexus_handler,
 
 	.slave_alloc = bfad_im_slave_alloc,
 	.slave_configure = bfad_im_slave_configure,
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 60e5a17..2fd67c1 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -5136,7 +5136,7 @@ struct scsi_host_template lpfc_template = {
 	.eh_abort_handler	= lpfc_abort_handler,
 	.eh_device_reset_handler = lpfc_device_reset_handler,
 	.eh_target_reset_handler = lpfc_target_reset_handler,
-	.eh_bus_reset_handler	= lpfc_bus_reset_handler,
+	.eh_bus_reset_handler	= fc_eh_reset_it_nexus_handler,
 	.eh_host_reset_handler  = lpfc_host_reset_handler,
 	.slave_alloc		= lpfc_slave_alloc,
 	.slave_configure	= lpfc_slave_configure,
@@ -5160,7 +5160,7 @@ struct scsi_host_template lpfc_vport_template = {
 	.eh_abort_handler	= lpfc_abort_handler,
 	.eh_device_reset_handler = lpfc_device_reset_handler,
 	.eh_target_reset_handler = lpfc_target_reset_handler,
-	.eh_bus_reset_handler	= lpfc_bus_reset_handler,
+	.eh_bus_reset_handler	= fc_eh_reset_it_nexus_handler,
 	.slave_alloc		= lpfc_slave_alloc,
 	.slave_configure	= lpfc_slave_configure,
 	.slave_destroy		= lpfc_slave_destroy,
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 3a1661c..5d59284 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -246,7 +246,7 @@ struct scsi_host_template qla2xxx_driver_template = {
 	.eh_abort_handler	= qla2xxx_eh_abort,
 	.eh_device_reset_handler = qla2xxx_eh_device_reset,
 	.eh_target_reset_handler = qla2xxx_eh_target_reset,
-	.eh_bus_reset_handler	= qla2xxx_eh_bus_reset,
+	.eh_bus_reset_handler	= fc_eh_reset_it_nexus_handler,
 	.eh_host_reset_handler	= qla2xxx_eh_host_reset,
 
 	.slave_configure	= qla2xxx_slave_configure,
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index e894ca7..e1da601 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2920,6 +2920,62 @@ fc_remote_port_add(struct Scsi_Host *shost, int channel,
 EXPORT_SYMBOL(fc_remote_port_add);
 
 
+void
+__fc_remote_port_delete(struct fc_rport *rport, int fast_io_fail_tmo)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	unsigned long timeout = rport->dev_loss_tmo;
+	unsigned long flags;
+
+	/*
+	 * No need to flush the fc_host work_q's, as all adds are synchronous.
+	 *
+	 * We do need to reclaim the rport scan work element, so eventually
+	 * (in fc_rport_final_delete()) we'll flush the scsi host work_q if
+	 * there's still a scan pending.
+	 */
+
+	spin_lock_irqsave(shost->host_lock, flags);
+
+	if (rport->port_state != FC_PORTSTATE_ONLINE) {
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		return;
+	}
+
+	/*
+	 * In the past, we if this was not an FCP-Target, we would
+	 * unconditionally just jump to deleting the rport.
+	 * However, rports can be used as node containers by the LLDD,
+	 * and its not appropriate to just terminate the rport at the
+	 * first sign of a loss in connectivity. The LLDD may want to
+	 * send ELS traffic to re-validate the login. If the rport is
+	 * immediately deleted, it makes it inappropriate for a node
+	 * container.
+	 * So... we now unconditionally wait dev_loss_tmo before
+	 * destroying an rport.
+	 */
+
+	rport->port_state = FC_PORTSTATE_BLOCKED;
+
+	rport->flags |= FC_RPORT_DEVLOSS_PENDING;
+
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	if (rport->roles & FC_PORT_ROLE_FCP_INITIATOR &&
+	    shost->active_mode & MODE_TARGET)
+		fc_tgt_it_nexus_destroy(shost, (unsigned long)rport);
+
+	scsi_target_block(&rport->dev);
+
+	/* see if we need to kill io faster than waiting for device loss */
+	if ((fast_io_fail_tmo != -1) && (fast_io_fail_tmo < timeout))
+		fc_queue_devloss_work(shost, &rport->fail_io_work,
+					fast_io_fail_tmo * HZ);
+
+	/* cap the length the devices can be blocked until they are deleted */
+	fc_queue_devloss_work(shost, &rport->dev_loss_work, timeout * HZ);
+}
+
 /**
  * fc_remote_port_delete - notifies the fc transport that a remote port is no longer in existence.
  * @rport:	The remote port that no longer exists
@@ -2973,58 +3029,7 @@ EXPORT_SYMBOL(fc_remote_port_add);
 void
 fc_remote_port_delete(struct fc_rport  *rport)
 {
-	struct Scsi_Host *shost = rport_to_shost(rport);
-	unsigned long timeout = rport->dev_loss_tmo;
-	unsigned long flags;
-
-	/*
-	 * No need to flush the fc_host work_q's, as all adds are synchronous.
-	 *
-	 * We do need to reclaim the rport scan work element, so eventually
-	 * (in fc_rport_final_delete()) we'll flush the scsi host work_q if
-	 * there's still a scan pending.
-	 */
-
-	spin_lock_irqsave(shost->host_lock, flags);
-
-	if (rport->port_state != FC_PORTSTATE_ONLINE) {
-		spin_unlock_irqrestore(shost->host_lock, flags);
-		return;
-	}
-
-	/*
-	 * In the past, we if this was not an FCP-Target, we would
-	 * unconditionally just jump to deleting the rport.
-	 * However, rports can be used as node containers by the LLDD,
-	 * and its not appropriate to just terminate the rport at the
-	 * first sign of a loss in connectivity. The LLDD may want to
-	 * send ELS traffic to re-validate the login. If the rport is
-	 * immediately deleted, it makes it inappropriate for a node
-	 * container.
-	 * So... we now unconditionally wait dev_loss_tmo before
-	 * destroying an rport.
-	 */
-
-	rport->port_state = FC_PORTSTATE_BLOCKED;
-
-	rport->flags |= FC_RPORT_DEVLOSS_PENDING;
-
-	spin_unlock_irqrestore(shost->host_lock, flags);
-
-	if (rport->roles & FC_PORT_ROLE_FCP_INITIATOR &&
-	    shost->active_mode & MODE_TARGET)
-		fc_tgt_it_nexus_destroy(shost, (unsigned long)rport);
-
-	scsi_target_block(&rport->dev);
-
-	/* see if we need to kill io faster than waiting for device loss */
-	if ((rport->fast_io_fail_tmo != -1) &&
-	    (rport->fast_io_fail_tmo < timeout))
-		fc_queue_devloss_work(shost, &rport->fail_io_work,
-					rport->fast_io_fail_tmo * HZ);
-
-	/* cap the length the devices can be blocked until they are deleted */
-	fc_queue_devloss_work(shost, &rport->dev_loss_work, timeout * HZ);
+	__fc_remote_port_delete(rport, rport->fast_io_fail_tmo);
 }
 EXPORT_SYMBOL(fc_remote_port_delete);
 
@@ -3266,8 +3271,8 @@ fc_timeout_fail_rport_io(struct work_struct *work)
 	if (rport->port_state != FC_PORTSTATE_BLOCKED)
 		return;
 
-	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
 	fc_terminate_rport_io(rport);
+	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
 }
 
 /**
@@ -3332,6 +3337,41 @@ int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
 EXPORT_SYMBOL(fc_block_scsi_eh);
 
 /**
+ * fc_eh_reset_it_nexus_handler - Reset I_T nexus
+ * @cmnd: SCSI command that scsi_eh is trying to recover
+ *
+ * This routine can be called from a FC LLD scsi_eh callback. It
+ * attempts to perform an REMOVE I_T NEXUS transport management
+ * function by failing all outstanding commands and invoke
+ * dev_loss_tmo() on the affected port.
+ *
+ * Returns: SUCCESS if all commands on the remote port have been
+ *	    terminated or the port is in PORTSTATE_ONLINE again
+ *	    FAST_IO_FAIL if the fast_io_fail_tmo fired and there
+ *	    is still I/O in flight
+ *	    FAILED otherwise.
+ */
+int
+fc_eh_reset_it_nexus_handler(struct scsi_cmnd *cmnd)
+{
+	struct scsi_target *starget = scsi_target(cmnd->device);
+	struct fc_rport *rport = starget_to_rport(starget);
+	int ret;
+
+	__fc_remote_port_delete(rport, 0);
+	ret = fc_block_scsi_eh(cmnd);
+	if (ret != FAST_IO_FAIL) {
+		if (rport->port_state == FC_PORTSTATE_ONLINE)
+			ret = SUCCESS;
+		else
+			ret = FAILED;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(fc_eh_reset_it_nexus_handler);
+
+/**
  * fc_vport_setup - allocates and creates a FC virtual port.
  * @shost:	scsi host the virtual port is connected to.
  * @channel:	Channel on shost port connected to.
diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index b797e8f..f884305 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -851,5 +851,6 @@ struct fc_vport *fc_vport_create(struct Scsi_Host *shost, int channel,
 		struct fc_vport_identifiers *);
 int fc_vport_terminate(struct fc_vport *vport);
 int fc_block_scsi_eh(struct scsi_cmnd *cmnd);
+int fc_eh_reset_it_nexus_handler(struct scsi_cmnd *cmnd);
 
 #endif /* SCSI_TRANSPORT_FC_H */
-- 
1.7.4.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-07 14:51 [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset Hannes Reinecke
@ 2012-12-07 18:58 ` Mike Christie
  2012-12-07 19:58   ` Chad Dupuis
  2012-12-09 15:40   ` Hannes Reinecke
  2012-12-10  2:27 ` Michael Christie
  1 sibling, 2 replies; 11+ messages in thread
From: Mike Christie @ 2012-12-07 18:58 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, James Smart, Andrew Vasquez, Chad Dupuis,
	James Bottomley

On 12/07/2012 08:51 AM, Hannes Reinecke wrote:
> 'Bus reset' is not really applicable to FibreChannel, as
> the concept of a bus doesn't apply. Hence all FC LLDD
> simulate a 'bus reset' by sending a target reset to each
> attached remote port, causing error handling to spill
> over to unaffected devices.
> 
> This patch implements a 'I_T nexus reset' handler,
> which attempts to reset the I_T nexus to the remote
> port. This way only the affected remote ports are
> reset; other ports are left untouched.

Is the I_T nexus reset we are doing in this patch supposed to be the
same one defined in SAM? Was the I_T nexus reset TMF added to SAM at the
same time the target reset one was removed? In SAM 4 and 5 there is no
target reset anymore is there?

I think we should just kill the bus reset use from the FC drivers. Add a
new I_T nexus reset callout to the scsi_host_template or to the
scsi_transport_template. Then have scsi-ml call just either target reset
eh callout or I_T nexus eh reset callout depending on what the target
supports.

To figure out what the target supports could we do a REPORTED SUPPORTED
TASK MANAGEMENT FUNCTION command. If the target supports that command
and reports that the target supports the I_T nexus reset TMF then call
that eh callback, else drop down to older target reset eh callback.

It seems that if we do I_T nexus reset we do not need to also do a
target reset do we?



> @@ -3266,8 +3271,8 @@ fc_timeout_fail_rport_io(struct work_struct *work)
>  	if (rport->port_state != FC_PORTSTATE_BLOCKED)
>  		return;
>  
> -	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>  	fc_terminate_rport_io(rport);
> +	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>  }
>  

What was the reason for moving this? For the eh case in this patch was
it causing IO to be failed with DID_TRANSPORT_FAILFAST when we wanted it
failed with some other error.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-07 18:58 ` Mike Christie
@ 2012-12-07 19:58   ` Chad Dupuis
  2012-12-07 21:05     ` Jeremy Linton
  2012-12-10 10:18     ` Hannes Reinecke
  2012-12-09 15:40   ` Hannes Reinecke
  1 sibling, 2 replies; 11+ messages in thread
From: Chad Dupuis @ 2012-12-07 19:58 UTC (permalink / raw)
  To: Mike Christie
  Cc: Hannes Reinecke, linux-scsi@vger.kernel.org, James Smart,
	Andrew Vasquez, James Bottomley



On Fri, 7 Dec 2012, Mike Christie wrote:

> On 12/07/2012 08:51 AM, Hannes Reinecke wrote:
>> 'Bus reset' is not really applicable to FibreChannel, as
>> the concept of a bus doesn't apply. Hence all FC LLDD
>> simulate a 'bus reset' by sending a target reset to each
>> attached remote port, causing error handling to spill
>> over to unaffected devices.
>>
>> This patch implements a 'I_T nexus reset' handler,
>> which attempts to reset the I_T nexus to the remote
>> port. This way only the affected remote ports are
>> reset; other ports are left untouched.
>
> Is the I_T nexus reset we are doing in this patch supposed to be the
> same one defined in SAM? Was the I_T nexus reset TMF added to SAM at the
> same time the target reset one was removed? In SAM 4 and 5 there is no
> target reset anymore is there?
>
> I think we should just kill the bus reset use from the FC drivers. Add a
> new I_T nexus reset callout to the scsi_host_template or to the
> scsi_transport_template. Then have scsi-ml call just either target reset
> eh callout or I_T nexus eh reset callout depending on what the target
> supports.
>
> To figure out what the target supports could we do a REPORTED SUPPORTED
> TASK MANAGEMENT FUNCTION command. If the target supports that command
> and reports that the target supports the I_T nexus reset TMF then call
> that eh callback, else drop down to older target reset eh callback.
>
> It seems that if we do I_T nexus reset we do not need to also do a
> target reset do we?
>

It would be good to have a specific I_T nexus reset callout in either the
host or transport so we can clean things up.

Currently in the bus reset handler after we perform all the
target resets we wait for all the associated DMA activity to clear up
before we return from the bus_reset handler.  It would be good to have
the same capability in a new I_T nexus reset handler as well.

Also, are there certain port types we wouldn't want to send this reset to
such as tape?

>
>
>> @@ -3266,8 +3271,8 @@ fc_timeout_fail_rport_io(struct work_struct *work)
>>      if (rport->port_state != FC_PORTSTATE_BLOCKED)
>>              return;
>>
>> -    rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>>      fc_terminate_rport_io(rport);
>> +    rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>>  }
>>
>
> What was the reason for moving this? For the eh case in this patch was
> it causing IO to be failed with DID_TRANSPORT_FAILFAST when we wanted it
> failed with some other error.
>
>
>

________________________________

This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-07 19:58   ` Chad Dupuis
@ 2012-12-07 21:05     ` Jeremy Linton
  2012-12-07 21:20       ` Mike Christie
  2012-12-10 10:18     ` Hannes Reinecke
  1 sibling, 1 reply; 11+ messages in thread
From: Jeremy Linton @ 2012-12-07 21:05 UTC (permalink / raw)
  Cc: chad.dupuis, Hannes Reinecke, linux-scsi@vger.kernel.org

On 12/7/2012 1:58 PM, Chad Dupuis wrote:
> Also, are there certain port types we wouldn't want to send this reset to 
> such as tape?

	AFAIK, for tape it shouldn't cause any more harm than the target reset which
happens immediately before it. This patch is infinitely better than the
previous "bus" reset behavior.

	That said, its far from perfect. The code (as I understand it) isn't
differentiating between isolating the failure, or bringing out the big
hammer in an attempt to correct problems on a specific I_T_L. If you
drop/reset the I_T because one of the LUN's is misbehaving before verifying
the status of other LUN's on the target, you risk interrupting operations to
functional devices.







^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-07 21:05     ` Jeremy Linton
@ 2012-12-07 21:20       ` Mike Christie
  2012-12-07 22:33         ` Jeremy Linton
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Christie @ 2012-12-07 21:20 UTC (permalink / raw)
  To: Jeremy Linton; +Cc: chad.dupuis, Hannes Reinecke, linux-scsi@vger.kernel.org

On 12/07/2012 03:05 PM, Jeremy Linton wrote:
> On 12/7/2012 1:58 PM, Chad Dupuis wrote:
>> Also, are there certain port types we wouldn't want to send this reset to 
>> such as tape?
> 
> 	AFAIK, for tape it shouldn't cause any more harm than the target reset which
> happens immediately before it. This patch is infinitely better than the
> previous "bus" reset behavior.
> 
> 	That said, its far from perfect. The code (as I understand it) isn't
> differentiating between isolating the failure, or bringing out the big
> hammer in an attempt to correct problems on a specific I_T_L. If you
> drop/reset the I_T because one of the LUN's is misbehaving before verifying
> the status of other LUN's on the target, you risk interrupting operations to
> functional devices.
> 

When this code is called the scsi eh has run the abort handler for each
outstanding command and that has failed, and it has run the lun/device
reset handler and that has failed (or the eh operations succeeded but
the TUR checkup the scsi eh does failed).


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-07 21:20       ` Mike Christie
@ 2012-12-07 22:33         ` Jeremy Linton
  0 siblings, 0 replies; 11+ messages in thread
From: Jeremy Linton @ 2012-12-07 22:33 UTC (permalink / raw)
  To: Mike Christie; +Cc: linux-scsi@vger.kernel.org

On 12/7/2012 3:20 PM, Mike Christie wrote:
> On 12/07/2012 03:05 PM, Jeremy Linton wrote:
>> That said, its far from perfect. The code (as I understand it) isn't 
>> differentiating between isolating the failure, or bringing out the big 
>> hammer in an attempt to correct problems on a specific I_T_L. If you 
>> drop/reset the I_T because one of the LUN's is misbehaving before
>> verifying the status of other LUN's on the target, you risk interrupting
>> operations to functional devices.
> 
> When this code is called the scsi eh has run the abort handler for each 
> outstanding command and that has failed, and it has run the lun/device 
> reset handler and that has failed (or the eh operations succeeded but the
> TUR checkup the scsi eh does failed).

	I think my issue with the error handler (rather than this patch in
particular) surrounds the fact that when scsi_eh_bus_device_reset (which maps
to lun reset) fails, it falls to scsi_eh_target_reset which issues a TARGET
RESET which then broadens the problem to devices which may be working fine,
and just happen to be on the same I_T.

	I think there should be some attempt to determine if there are other devices on
the I_T, and whether they have failed before going into target_reset. It looks
like there may have been a plan to do that in bus_device_reset, but it doesn't
appear to be complete.


	Now, all that said, I have a few things I wonder about in the
eh_bus_device_reset code. For one the use of TUR rather than a command with a
more straightforward return status like INQUIRY which also preserves the check
conditions.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-07 18:58 ` Mike Christie
  2012-12-07 19:58   ` Chad Dupuis
@ 2012-12-09 15:40   ` Hannes Reinecke
  2012-12-09 23:19     ` Mike Christie
  1 sibling, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2012-12-09 15:40 UTC (permalink / raw)
  To: Mike Christie
  Cc: linux-scsi, James Smart, Andrew Vasquez, Chad Dupuis,
	James Bottomley

Am 12/07/2012 07:58 PM, schrieb Mike Christie:
> On 12/07/2012 08:51 AM, Hannes Reinecke wrote:
>> 'Bus reset' is not really applicable to FibreChannel, as
>> the concept of a bus doesn't apply. Hence all FC LLDD
>> simulate a 'bus reset' by sending a target reset to each
>> attached remote port, causing error handling to spill
>> over to unaffected devices.
>>
>> This patch implements a 'I_T nexus reset' handler,
>> which attempts to reset the I_T nexus to the remote
>> port. This way only the affected remote ports are
>> reset; other ports are left untouched.
>
> Is the I_T nexus reset we are doing in this patch supposed to be the
> same one defined in SAM? Was the I_T nexus reset TMF added to SAM at the
> same time the target reset one was removed? In SAM 4 and 5 there is no
> target reset anymore is there?
>
> I think we should just kill the bus reset use from the FC drivers. Add a
> new I_T nexus reset callout to the scsi_host_template or to the
> scsi_transport_template. Then have scsi-ml call just either target reset
> eh callout or I_T nexus eh reset callout depending on what the target
> supports.
>
> To figure out what the target supports could we do a REPORTED SUPPORTED
> TASK MANAGEMENT FUNCTION command. If the target supports that command
> and reports that the target supports the I_T nexus reset TMF then call
> that eh callback, else drop down to older target reset eh callback.
>
> It seems that if we do I_T nexus reset we do not need to also do a
> target reset do we?
>
Hmm. I would rather check the actual LLDDs if they do anything sensible 
for target reset.
If not we sure can remove it.

>
>
>> @@ -3266,8 +3271,8 @@ fc_timeout_fail_rport_io(struct work_struct *work)
>>   	if (rport->port_state != FC_PORTSTATE_BLOCKED)
>>   		return;
>>
>> -	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>>   	fc_terminate_rport_io(rport);
>> +	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>>   }
>>
>
> What was the reason for moving this? For the eh case in this patch was
> it causing IO to be failed with DID_TRANSPORT_FAILFAST when we wanted it
> failed with some other error.
>
I wanted to ensure that fc_terminate_rport_io() was run when checking 
FC_RPORT_FAST_FAIL_TIMEOUT.
Without the move there is a race window between clearing the flag and
calling fc_terminate_rport_io(), which one might trigger by just 
checking the flag.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-09 15:40   ` Hannes Reinecke
@ 2012-12-09 23:19     ` Mike Christie
  2012-12-09 23:31       ` Mike Christie
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Christie @ 2012-12-09 23:19 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, James Smart, Andrew Vasquez, Chad Dupuis,
	James Bottomley

On 12/09/2012 09:40 AM, Hannes Reinecke wrote:
> Am 12/07/2012 07:58 PM, schrieb Mike Christie:
>> On 12/07/2012 08:51 AM, Hannes Reinecke wrote:
>>> 'Bus reset' is not really applicable to FibreChannel, as
>>> the concept of a bus doesn't apply. Hence all FC LLDD
>>> simulate a 'bus reset' by sending a target reset to each
>>> attached remote port, causing error handling to spill
>>> over to unaffected devices.
>>>
>>> This patch implements a 'I_T nexus reset' handler,
>>> which attempts to reset the I_T nexus to the remote
>>> port. This way only the affected remote ports are
>>> reset; other ports are left untouched.
>>
>> Is the I_T nexus reset we are doing in this patch supposed to be the
>> same one defined in SAM? Was the I_T nexus reset TMF added to SAM at the
>> same time the target reset one was removed? In SAM 4 and 5 there is no
>> target reset anymore is there?
>>
>> I think we should just kill the bus reset use from the FC drivers. Add a
>> new I_T nexus reset callout to the scsi_host_template or to the
>> scsi_transport_template. Then have scsi-ml call just either target reset
>> eh callout or I_T nexus eh reset callout depending on what the target
>> supports.
>>
>> To figure out what the target supports could we do a REPORTED SUPPORTED
>> TASK MANAGEMENT FUNCTION command. If the target supports that command
>> and reports that the target supports the I_T nexus reset TMF then call
>> that eh callback, else drop down to older target reset eh callback.
>>
>> It seems that if we do I_T nexus reset we do not need to also do a
>> target reset do we?
>>
> Hmm. I would rather check the actual LLDDs if they do anything sensible
> for target reset.
> If not we sure can remove it.


I am not suggesting to remove the target reset support completely. I am
just asking if we want to only do one or the other depending on what the
target supports. I was saying it is not clear to me why we need to do
both when in SAM 4 and 5 target reset TMF support looks like it is removed.



> 
>>
>>
>>> @@ -3266,8 +3271,8 @@ fc_timeout_fail_rport_io(struct work_struct *work)
>>>       if (rport->port_state != FC_PORTSTATE_BLOCKED)
>>>           return;
>>>
>>> -    rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>>>       fc_terminate_rport_io(rport);
>>> +    rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>>>   }
>>>
>>
>> What was the reason for moving this? For the eh case in this patch was
>> it causing IO to be failed with DID_TRANSPORT_FAILFAST when we wanted it
>> failed with some other error.
>>
> I wanted to ensure that fc_terminate_rport_io() was run when checking
> FC_RPORT_FAST_FAIL_TIMEOUT.
> Without the move there is a race window between clearing the flag and
> calling fc_terminate_rport_io(), which one might trigger by just
> checking the flag.
> 

What code is this? I am not sure what you mean. fc_terminate_rport_io is
always going to get run. There does not seem to be checks in it for
FC_RPORT_FAST_FAIL_TIMEOUT.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-09 23:19     ` Mike Christie
@ 2012-12-09 23:31       ` Mike Christie
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Christie @ 2012-12-09 23:31 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, James Smart, Andrew Vasquez, Chad Dupuis,
	James Bottomley

On 12/09/2012 05:19 PM, Mike Christie wrote:
> 
>> > 
>>> >>
>>> >>
>>>> >>> @@ -3266,8 +3271,8 @@ fc_timeout_fail_rport_io(struct work_struct *work)
>>>> >>>       if (rport->port_state != FC_PORTSTATE_BLOCKED)
>>>> >>>           return;
>>>> >>>
>>>> >>> -    rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>>>> >>>       fc_terminate_rport_io(rport);
>>>> >>> +    rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
>>>> >>>   }
>>>> >>>
>>> >>
>>> >> What was the reason for moving this? For the eh case in this patch was
>>> >> it causing IO to be failed with DID_TRANSPORT_FAILFAST when we wanted it
>>> >> failed with some other error.
>>> >>
>> > I wanted to ensure that fc_terminate_rport_io() was run when checking
>> > FC_RPORT_FAST_FAIL_TIMEOUT.
>> > Without the move there is a race window between clearing the flag and
>> > calling fc_terminate_rport_io(), which one might trigger by just
>> > checking the flag.
>> > 
> What code is this? I am not sure what you mean. fc_terminate_rport_io is
> always going to get run. There does not seem to be checks in it for
> FC_RPORT_FAST_FAIL_TIMEOUT.

Ignore this. I misread your mail. I see what you meant.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-07 14:51 [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset Hannes Reinecke
  2012-12-07 18:58 ` Mike Christie
@ 2012-12-10  2:27 ` Michael Christie
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Christie @ 2012-12-10  2:27 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, James Smart, Andrew Vasquez, Chad Dupuis,
	James Bottomley


On Dec 7, 2012, at 8:51 AM, Hannes Reinecke <hare@suse.de> wrote:
> 
> diff --git a/drivers/scsi/bfa/bfad_im.c b/drivers/scsi/bfa/bfad_im.c
> index 8f92732..d6555aa 100644
> --- a/drivers/scsi/bfa/bfad_im.c
> +++ b/drivers/scsi/bfa/bfad_im.c
> @@ -793,7 +793,7 @@ struct scsi_host_template bfad_im_scsi_host_template = {
> 	.queuecommand = bfad_im_queuecommand,
> 	.eh_abort_handler = bfad_im_abort_handler,
> 	.eh_device_reset_handler = bfad_im_reset_lun_handler,
> -	.eh_bus_reset_handler = bfad_im_reset_bus_handler,
> +	.eh_bus_reset_handler = fc_eh_reset_it_nexus_handler,
> 
> 	.slave_alloc = bfad_im_slave_alloc,
> 	.slave_configure = bfad_im_slave_configure,
> @@ -815,7 +815,7 @@ struct scsi_host_template bfad_im_vport_template = {
> 	.queuecommand = bfad_im_queuecommand,
> 	.eh_abort_handler = bfad_im_abort_handler,
> 	.eh_device_reset_handler = bfad_im_reset_lun_handler,
> -	.eh_bus_reset_handler = bfad_im_reset_bus_handler,
> +	.eh_bus_reset_handler = fc_eh_reset_it_nexus_handler,
> 
> 	.slave_alloc = bfad_im_slave_alloc,
> 	.slave_configure = bfad_im_slave_configure,
> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
> index 60e5a17..2fd67c1 100644
> --- a/drivers/scsi/lpfc/lpfc_scsi.c
> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
> @@ -5136,7 +5136,7 @@ struct scsi_host_template lpfc_template = {
> 	.eh_abort_handler	= lpfc_abort_handler,
> 	.eh_device_reset_handler = lpfc_device_reset_handler,
> 	.eh_target_reset_handler = lpfc_target_reset_handler,
> -	.eh_bus_reset_handler	= lpfc_bus_reset_handler,
> +	.eh_bus_reset_handler	= fc_eh_reset_it_nexus_handler,
> 	.eh_host_reset_handler  = lpfc_host_reset_handler,
> 	.slave_alloc		= lpfc_slave_alloc,
> 	.slave_configure	= lpfc_slave_configure,
> @@ -5160,7 +5160,7 @@ struct scsi_host_template lpfc_vport_template = {
> 	.eh_abort_handler	= lpfc_abort_handler,
> 	.eh_device_reset_handler = lpfc_device_reset_handler,
> 	.eh_target_reset_handler = lpfc_target_reset_handler,
> -	.eh_bus_reset_handler	= lpfc_bus_reset_handler,
> +	.eh_bus_reset_handler	= fc_eh_reset_it_nexus_handler,
> 	.slave_alloc		= lpfc_slave_alloc,
> 	.slave_configure	= lpfc_slave_configure,
> 	.slave_destroy		= lpfc_slave_destroy,
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 3a1661c..5d59284 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -246,7 +246,7 @@ struct scsi_host_template qla2xxx_driver_template = {
> 	.eh_abort_handler	= qla2xxx_eh_abort,
> 	.eh_device_reset_handler = qla2xxx_eh_device_reset,
> 	.eh_target_reset_handler = qla2xxx_eh_target_reset,
> -	.eh_bus_reset_handler	= qla2xxx_eh_bus_reset,
> +	.eh_bus_reset_handler	= fc_eh_reset_it_nexus_handler,
> 	.eh_host_reset_handler	= qla2xxx_eh_host_reset,


Hey,

One other comment. I do not think we can use the bus reset callout as is. I think for all fc drivers but mptfc, the channel is always 0. All targets on those hosts have the same channel value, and it seems  this code in scsi_error.c:scsi_eh_bus_reset():

                rtn = scsi_try_bus_reset(chan_scmd);
                if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
                        list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
                                if (channel == scmd_channel(scmd)) {

would end up matching all the commands on the host instead of just the commands on the specific target that was passed into scsi_try_bus_reset if SUCCESS or FAST_IO_FAIL was returned by scsi_try_bus_reset. If there were 2 faulty targets on a host, you would end up calling fc_eh_reset_it_nexus_handler on only 1.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-07 19:58   ` Chad Dupuis
  2012-12-07 21:05     ` Jeremy Linton
@ 2012-12-10 10:18     ` Hannes Reinecke
  1 sibling, 0 replies; 11+ messages in thread
From: Hannes Reinecke @ 2012-12-10 10:18 UTC (permalink / raw)
  To: Chad Dupuis
  Cc: Mike Christie, linux-scsi@vger.kernel.org, James Smart,
	Andrew Vasquez, James Bottomley

Am 12/07/2012 08:58 PM, schrieb Chad Dupuis:
>
>
> On Fri, 7 Dec 2012, Mike Christie wrote:
>
>> On 12/07/2012 08:51 AM, Hannes Reinecke wrote:
>>> 'Bus reset' is not really applicable to FibreChannel, as
>>> the concept of a bus doesn't apply. Hence all FC LLDD
>>> simulate a 'bus reset' by sending a target reset to each
>>> attached remote port, causing error handling to spill
>>> over to unaffected devices.
>>>
>>> This patch implements a 'I_T nexus reset' handler,
>>> which attempts to reset the I_T nexus to the remote
>>> port. This way only the affected remote ports are
>>> reset; other ports are left untouched.
>>
>> Is the I_T nexus reset we are doing in this patch supposed to be the
>> same one defined in SAM? Was the I_T nexus reset TMF added to SAM at the
>> same time the target reset one was removed? In SAM 4 and 5 there is no
>> target reset anymore is there?
>>
>> I think we should just kill the bus reset use from the FC drivers. Add a
>> new I_T nexus reset callout to the scsi_host_template or to the
>> scsi_transport_template. Then have scsi-ml call just either target reset
>> eh callout or I_T nexus eh reset callout depending on what the target
>> supports.
>>
>> To figure out what the target supports could we do a REPORTED SUPPORTED
>> TASK MANAGEMENT FUNCTION command. If the target supports that command
>> and reports that the target supports the I_T nexus reset TMF then call
>> that eh callback, else drop down to older target reset eh callback.
>>
>> It seems that if we do I_T nexus reset we do not need to also do a
>> target reset do we?
>>
>
> It would be good to have a specific I_T nexus reset callout in either the
> host or transport so we can clean things up.
>
> Currently in the bus reset handler after we perform all the
> target resets we wait for all the associated DMA activity to clear up
> before we return from the bus_reset handler.  It would be good to have
> the same capability in a new I_T nexus reset handler as well.
>
Hmm. Thanks for bringing this up, as this is exactly one of the issues
I wanted to get feedback from the hardware guys.

When doing an I_T Nexus reset I'd need:
- Set the port type to BLOCKED
- Abort all outstanding I/O
- invoke dev_loss_tmo to reset the port
- Return status 'GOOD' if we could abort all I/O or FAILED if not

I'm not sure about the third item; I _think_ we might be okay by just 
setting the port state to 'BLOCKED' and invoke the dev_loss_tmo
mechanism here. That should be resetting the port eventually.

However, the tricky bit is the second item.
fc_terminate_rport_io() is more often than not invoked asynchronously,
so the only thing I can be sure of is that we have _started_ to
abort all I/O, not neccessarily that all I/O is actually aborted.

So hooking up eh_target_reset there instead of fc_terminate_rport_io()
would not only clear up any outstanding I/O but also give me a nice
status back.

Hmm. That actually makes sense.
I think I'll be drafting out a new version of the patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-12-10  8:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-07 14:51 [PATCH][RFC] scsi_transport_fc: Implement I_T nexus reset Hannes Reinecke
2012-12-07 18:58 ` Mike Christie
2012-12-07 19:58   ` Chad Dupuis
2012-12-07 21:05     ` Jeremy Linton
2012-12-07 21:20       ` Mike Christie
2012-12-07 22:33         ` Jeremy Linton
2012-12-10 10:18     ` Hannes Reinecke
2012-12-09 15:40   ` Hannes Reinecke
2012-12-09 23:19     ` Mike Christie
2012-12-09 23:31       ` Mike Christie
2012-12-10  2:27 ` Michael Christie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.