* [RFC] fc transport: extensions for fast fail and dev loss
@ 2006-06-20 18:45 James Smart
2006-07-25 17:12 ` Mike Christie
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: James Smart @ 2006-06-20 18:45 UTC (permalink / raw)
To: linux-scsi
Folks,
The following addresses some long standing todo items I've had in the
FC transport. They primarily arise when considering multipathing, or
trying to marry driver internal state to transport state. It is intended
that this same type of functionality would be usable in other transports
as well.
Here's what is contained:
- dev_loss_tmo LLDD callback :
Currently, there is no notification to the LLDD of when the transport
gives up on the device returning and starts to return DID_NO_CONNECT
in the queuecommand helper function. This callback notifies the LLDD
that the transport has now given up on the rport, thereby acknowledging
the prior fc_remote_port_delete() call. The callback also expects the
LLDD to initiate the termination of any outstanding i/o on the rport.
- fast_io_fail_tmo and LLD callback:
There are some cases where it may take a long while to truly determine
device loss, but the system is in a multipathing configuration where, if
the i/o was failed quickly (faster than dev_loss_tmo), it could be
redirected to a different path and completed sooner (assuming the
multipath thing knew that the sdev was blocked).
iSCSI is one of the transports that may vary dev_loss_tmo values
per session, and you would still like fast i/o failure.
- fast_loss_time recommendation:
In discussing how an admin should set dev_loss_tmo in a multipathing
environment, it became apparent that we expected the admin to know
a lot. They had to know the transport type, what the minimum setting
can be that still survives normal link bouncing, and they may even
have to know about device specifics. For iSCSI, the proper loss time
may vary widely from session to session.
This attribute is an exported "recommendation" by the LLDD and transport
on what the lowest setting for dev_loss_tmo should be for a multipathing
environment. Thus, the admin only needs to cat this attribute to obtain
the value to echo into dev_loss_tmo.
I have one criticism of these changes: the callbacks call into the
LLDD with an rport after the driver's rport_delete call. What this means
is that we are essentially extending the lifetime of an rport until
the dev_loss_tmo timeout occurs.
-- james s
diff -upNr a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
--- a/include/scsi/scsi_transport_fc.h 2006-06-14 11:37:54.000000000 -0400
+++ b/include/scsi/scsi_transport_fc.h 2006-06-16 10:29:22.000000000 -0400
@@ -187,6 +187,8 @@ struct fc_rport { /* aka fc_starget_attr
/* Dynamic Attributes */
u32 dev_loss_tmo; /* Remote Port loss timeout in seconds. */
+ u32 fast_loss_time; /* Fastest setting for dev_loss_tmo to
+ * detect a path failure. */
/* Private (Transport-managed) Attributes */
u64 node_name;
@@ -195,6 +197,7 @@ struct fc_rport { /* aka fc_starget_attr
u32 roles;
enum fc_port_state port_state; /* Will only be ONLINE or UNKNOWN */
u32 scsi_target_id;
+ u32 fast_io_fail_tmo;
/* exported data */
void *dd_data; /* Used for driver-specific storage */
@@ -399,6 +402,7 @@ struct fc_host_attrs {
struct fc_function_template {
void (*get_rport_dev_loss_tmo)(struct fc_rport *);
void (*set_rport_dev_loss_tmo)(struct fc_rport *, u32);
+ void (*get_rport_fast_loss_time)(struct fc_rport *);
void (*get_starget_node_name)(struct scsi_target *);
void (*get_starget_port_name)(struct scsi_target *);
@@ -416,6 +420,9 @@ struct fc_function_template {
int (*issue_fc_host_lip)(struct Scsi_Host *);
+ void (*dev_loss_tmo_callbk)(struct fc_rport *);
+ void (*terminate_rport_io)(struct fc_rport *);
+
/* allocation lengths for host-specific data */
u32 dd_fcrport_size;
diff -upNr a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
--- a/drivers/scsi/scsi_transport_fc.c 2006-06-15 15:10:47.000000000 -0400
+++ b/drivers/scsi/scsi_transport_fc.c 2006-06-16 10:41:05.000000000 -0400
@@ -216,6 +216,7 @@ fc_bitfield_name_search(remote_port_role
static void fc_timeout_deleted_rport(void *data);
+static void fc_timeout_fail_rport_io(void *data);
static void fc_scsi_scan_rport(void *data);
/*
@@ -223,7 +224,7 @@ static void fc_scsi_scan_rport(void *dat
* Increase these values if you add attributes
*/
#define FC_STARGET_NUM_ATTRS 3
-#define FC_RPORT_NUM_ATTRS 9
+#define FC_RPORT_NUM_ATTRS 11
#define FC_HOST_NUM_ATTRS 17
struct fc_internal {
@@ -377,6 +378,19 @@ MODULE_PARM_DESC(dev_loss_tmo,
" exceeded, the scsi target is removed. Value should be"
" between 1 and SCSI_DEVICE_BLOCK_MAX_TIMEOUT.");
+/*
+ * fast_loss_time: the minimum number of seconds at which the FC transport
+ * can detect a real device loss. The user can set dev_loss_tmo to this
+ * value in multipath configurations that want fast path-loss detection.
+ */
+static unsigned int fc_fast_loss_time = 5; /* seconds */
+
+module_param_named(fast_loss_time, fc_fast_loss_time, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(fast_loss_time,
+ "Minimum number of seconds at which the FC transport can"
+ " detect the loss of a remote port. To be meaningful, the"
+ " value should be less than the dev_loss_tmo parameter.");
+
static __init int fc_transport_init(void)
{
@@ -510,6 +524,13 @@ static FC_CLASS_DEVICE_ATTR(rport, title
if (i->f->show_rport_##field) \
count++
+#define SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(field) \
+{ \
+ i->private_rport_attrs[count] = class_device_attr_rport_##field; \
+ i->rport_attrs[count] = &i->private_rport_attrs[count]; \
+ count++; \
+}
+
/* The FC Transport Remote Port Attributes: */
@@ -555,6 +576,28 @@ store_fc_rport_dev_loss_tmo(struct class
static FC_CLASS_DEVICE_ATTR(rport, dev_loss_tmo, S_IRUGO | S_IWUSR,
show_fc_rport_dev_loss_tmo, store_fc_rport_dev_loss_tmo);
+/*
+ * fast_loss_time attribute
+ */
+static ssize_t
+show_fc_rport_fast_loss_time(struct class_device *cdev, char *buf)
+{
+ struct fc_rport *rport = transport_class_to_rport(cdev);
+ struct Scsi_Host *shost = rport_to_shost(rport);
+ struct fc_internal *i = to_fc_internal(shost->transportt);
+ if ((i->f->get_rport_fast_loss_time) &&
+ !((rport->port_state == FC_PORTSTATE_BLOCKED) ||
+ (rport->port_state == FC_PORTSTATE_DELETED) ||
+ (rport->port_state == FC_PORTSTATE_NOTPRESENT)))
+ i->f->get_rport_fast_loss_time(rport);
+ /* never return a value greater than dev_loss_tmo */
+ return snprintf(buf, 20, "%d\n",
+ ((rport->fast_loss_time < rport->dev_loss_tmo) ?
+ rport->fast_loss_time : rport->dev_loss_tmo));
+}
+static FC_CLASS_DEVICE_ATTR(rport, fast_loss_time, S_IRUGO,
+ show_fc_rport_fast_loss_time, NULL);
+
/* Private Remote Port Attributes */
@@ -597,6 +640,40 @@ static FC_CLASS_DEVICE_ATTR(rport, roles
fc_private_rport_rd_enum_attr(port_state, FC_PORTSTATE_MAX_NAMELEN);
fc_private_rport_rd_attr(scsi_target_id, "%d\n", 20);
+/*
+ * fast_io_fail_tmo attribute
+ */
+static ssize_t
+show_fc_rport_fast_io_fail_tmo(struct class_device *cdev, char *buf)
+{
+ struct fc_rport *rport = transport_class_to_rport(cdev);
+ if (rport->fast_io_fail_tmo == -1)
+ return snprintf(buf, 5, "off\n");
+ return snprintf(buf, 20, "%d\n", rport->fast_io_fail_tmo);
+}
+static ssize_t
+store_fc_rport_fast_io_fail_tmo(struct class_device *cdev, const char *buf,
+ size_t count)
+{
+ int val;
+ struct fc_rport *rport = transport_class_to_rport(cdev);
+ struct Scsi_Host *shost = rport_to_shost(rport);
+ if ((rport->port_state == FC_PORTSTATE_BLOCKED) ||
+ (rport->port_state == FC_PORTSTATE_DELETED) ||
+ (rport->port_state == FC_PORTSTATE_NOTPRESENT))
+ return -EBUSY;
+ if (strncmp(buf, "off", 3) == 0)
+ rport->fast_io_fail_tmo = -1;
+ else {
+ val = simple_strtoul(buf, NULL, 0);
+ if ((val < 0) || (val >= rport->dev_loss_tmo))
+ return -EINVAL;
+ rport->fast_io_fail_tmo = val;
+ }
+ return count;
+}
+static FC_CLASS_DEVICE_ATTR(rport, fast_io_fail_tmo, S_IRUGO | S_IWUSR,
+ show_fc_rport_fast_io_fail_tmo, store_fc_rport_fast_io_fail_tmo);
/*
@@ -1251,12 +1328,15 @@ fc_attach_transport(struct fc_function_t
SETUP_RPORT_ATTRIBUTE_RD(maxframe_size);
SETUP_RPORT_ATTRIBUTE_RD(supported_classes);
SETUP_RPORT_ATTRIBUTE_RW(dev_loss_tmo);
+ SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(fast_loss_time);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(node_name);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_name);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_id);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(roles);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_state);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(scsi_target_id);
+ if (ft->terminate_rport_io)
+ SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(fast_io_fail_tmo);
BUG_ON(count > FC_RPORT_NUM_ATTRS);
@@ -1461,10 +1541,17 @@ fc_rport_final_delete(void *data)
struct fc_rport *rport = (struct fc_rport *)data;
struct device *dev = &rport->dev;
struct Scsi_Host *shost = rport_to_shost(rport);
+ struct fc_internal *i = to_fc_internal(shost->transportt);
- /* Delete SCSI target and sdevs */
- if (rport->scsi_target_id != -1)
- fc_starget_delete(data);
+ /*
+ * Involve the LLDD if possible. All io on the rport is to
+ * be terminated, either as part of the dev_loss_tmo callback
+ * processing, or via the terminate_rport_io function.
+ */
+ if (i->f->dev_loss_tmo_callbk)
+ i->f->dev_loss_tmo_callbk(rport);
+ else if (i->f->terminate_rport_io)
+ i->f->terminate_rport_io(rport);
/*
* if a scan is pending, flush the SCSI Host work_q so that
@@ -1473,6 +1560,10 @@ fc_rport_final_delete(void *data)
if (rport->flags & FC_RPORT_SCAN_PENDING)
scsi_flush_work(shost);
+ /* Delete SCSI target and sdevs */
+ if (rport->scsi_target_id != -1)
+ fc_starget_delete(data);
+
transport_remove_device(dev);
device_del(dev);
transport_destroy_device(dev);
@@ -1515,6 +1606,7 @@ fc_rport_create(struct Scsi_Host *shost,
rport->maxframe_size = -1;
rport->supported_classes = FC_COS_UNSPECIFIED;
rport->dev_loss_tmo = fc_dev_loss_tmo;
+ rport->fast_loss_time = fc_fast_loss_time;
memcpy(&rport->node_name, &ids->node_name, sizeof(rport->node_name));
memcpy(&rport->port_name, &ids->port_name, sizeof(rport->port_name));
rport->port_id = ids->port_id;
@@ -1523,8 +1615,10 @@ fc_rport_create(struct Scsi_Host *shost,
if (fci->f->dd_fcrport_size)
rport->dd_data = &rport[1];
rport->channel = channel;
+ rport->fast_io_fail_tmo = -1;
INIT_WORK(&rport->dev_loss_work, fc_timeout_deleted_rport, rport);
+ INIT_WORK(&rport->fail_io_work, fc_timeout_fail_rport_io, rport);
INIT_WORK(&rport->scan_work, fc_scsi_scan_rport, rport);
INIT_WORK(&rport->stgt_delete_work, fc_starget_delete, rport);
INIT_WORK(&rport->rport_delete_work, fc_rport_final_delete, rport);
@@ -1837,6 +1931,7 @@ void
fc_remote_port_delete(struct fc_rport *rport)
{
struct Scsi_Host *shost = rport_to_shost(rport);
+ struct fc_internal *i = to_fc_internal(shost->transportt);
int timeout = rport->dev_loss_tmo;
unsigned long flags;
@@ -1869,6 +1964,12 @@ fc_remote_port_delete(struct fc_rport *
/* cap the length the devices can be blocked until they are deleted */
fc_queue_devloss_work(shost, &rport->dev_loss_work, timeout * HZ);
+
+ /* see if we need to kill io faster than waiting for device loss */
+ if ((rport->fast_io_fail_tmo != -1) &&
+ (rport->fast_io_fail_tmo < timeout) && (i->f->terminate_rport_io))
+ fc_queue_devloss_work(shost, &rport->fail_io_work,
+ rport->fast_io_fail_tmo * HZ);
}
EXPORT_SYMBOL(fc_remote_port_delete);
@@ -2047,6 +2148,28 @@ fc_timeout_deleted_rport(void *data)
}
/**
+ * fc_timeout_fail_rport_io - Timeout handler for a fast io failing on a
+ * disconnected SCSI target.
+ *
+ * @data: rport to terminate io on.
+ *
+ * Notes: Only requests the failure of the io, not that all are flushed
+ * prior to returning.
+ **/
+static void
+fc_timeout_fail_rport_io(void *data)
+{
+ struct fc_rport *rport = (struct fc_rport *)data;
+ struct Scsi_Host *shost = rport_to_shost(rport);
+ struct fc_internal *i = to_fc_internal(shost->transportt);
+
+ if (rport->port_state != FC_PORTSTATE_BLOCKED)
+ return;
+
+ i->f->terminate_rport_io(rport);
+}
+
+/**
* fc_scsi_scan_rport - called to perform a scsi scan on a remote port.
*
* @data: remote port to be scanned.
^ permalink raw reply [flat|nested] 15+ messages in thread

* Re: [RFC] fc transport: extensions for fast fail and dev loss
2006-06-20 18:45 [RFC] fc transport: extensions for fast fail and dev loss James Smart
@ 2006-07-25 17:12 ` Mike Christie
2006-07-25 18:49 ` James Smart
2006-07-26 9:20 ` Christoph Hellwig
2006-08-08 17:54 ` [RFC] [Last Rites] " James Smart
2 siblings, 1 reply; 15+ messages in thread
From: Mike Christie @ 2006-07-25 17:12 UTC (permalink / raw)
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
> Folks,
>
> The following addresses some long standing todo items I've had in the
> FC transport. They primarily arise when considering multipathing, or
> trying to marry driver internal state to transport state. It is intended
> that this same type of functionality would be usable in other transports
> as well.
>
I agree we need something like this. iSCSI is going to move to something
closer to FC in 2.6.19 to better integrate qla4xxx and give FC and iSCSI
a similar interface when it makes sense.
> Here's what is contained:
>
> - dev_loss_tmo LLDD callback :
> Currently, there is no notification to the LLDD of when the transport
> gives up on the device returning and starts to return DID_NO_CONNECT
> in the queuecommand helper function. This callback notifies the LLDD
> that the transport has now given up on the rport, thereby acknowledging
> the prior fc_remote_port_delete() call. The callback also expects the
> LLDD to initiate the termination of any outstanding i/o on the rport.
>
iSCSI does something like this at the lower level right now. For the
common lower level iscsi layer that software drivers share we have a
callback that allows drivers to do the same thing as your dev_loss_tmo
callback. When we move to the new model we will need something like this.
> - fast_io_fail_tmo and LLD callback:
> There are some cases where it may take a long while to truly determine
> device loss, but the system is in a multipathing configuration where, if
> the i/o was failed quickly (faster than dev_loss_tmo), it could be
> redirected to a different path and completed sooner (assuming the
> multipath thing knew that the sdev was blocked).
>
> iSCSI is one of the transports that may vary dev_loss_tmo values
> per session, and you would still like fast i/o failure.
>
Agree. Currently we are sort of doing this in userspace, but since
qla4xxx does more in the kernel we would like to move it there so
qla4xxx and other HW iSCSI cards do not have to jump through so many hoops
to use the functionality.
> - fast_loss_time recommendation:
> In discussing how an admin should set dev_loss_tmo in a multipathing
> environment, it became apparent that we expected the admin to know
> a lot. They had to know the transport type, what the minimum setting
> can be that still survives normal link bouncing, and they may even
> have to know about device specifics. For iSCSI, the proper loss time
> may vary widely from session to session.
>
> This attribute is an exported "recommendation" by the LLDD and transport
> on what the lowest setting for dev_loss_tmo should be for a multipathing
> environment. Thus, the admin only needs to cat this attribute to obtain
> the value to echo into dev_loss_tmo.
>
> I have one criticism of these changes. The callbacks are calling into
> the LLDD with an rport post the driver's rport_delete call. What it means
> is that we are essentially extending the lifetime of an rport until the
> dev_loss_tmo call occurs.
>
So is the fast_io_fail_tmo callback the terminate_rport_io callback? If
so, are we supposed to unblock the rport/session/target from
fc_timeout_fail_rport_io and call into the LLD and the LLD will set some
bit (or maybe check some rport/session/target/scsi_device bit) so that
incoming IO and IO sitting in the driver will be failed with something
like DID_BUS_BUSY so it goes to the upper layers? I think the only
unblock happens on success or fc_starget_delete, so IO in the driver
looks like it can get failed upwards, but IO sitting in the queue sits
there until fc_rport_final_delete or success.
If that is correct, what about a new device state? When the fail fast
tmo expires we can set the device to the new state, run the queue and
incoming IO or IO in the request_queue marked with FAILFAST can be
failed upwards by scsi-ml.
I just woke up though :)
* Re: [RFC] fc transport: extensions for fast fail and dev loss
2006-07-25 17:12 ` Mike Christie
@ 2006-07-25 18:49 ` James Smart
2006-07-25 21:15 ` Michael Reed
0 siblings, 1 reply; 15+ messages in thread
From: James Smart @ 2006-07-25 18:49 UTC (permalink / raw)
To: Mike Christie; +Cc: linux-scsi
Mike Christie wrote:
>
> So is the fast_io_fail_tmo callback the terminate_rport_io callback?
Yes.
When fast_io_fail_tmo expires, it calls the terminate_rport_io() callback.
> If
> so, are we supposed to unblock the rport/session/target from
> fc_timeout_fail_rport_io
No... don't unblock.
> and call into the LLD and the LLD will set some
> bit (or maybe check some rport/session/target/scsi_device bit) so that
> incoming IO and IO sitting in the driver will be failed with something
> like DID_BUS_BUSY so it goes to the upper layers?
The way this is managed in the fc transport is - the LLD calls the
transport when it establishes connectivity (an "add" call), and when it
loses connectivity (a "delete" call). When the transport receives the
delete call, it changes the rport state, blocks the rport, and starts the
dev_loss timeout (and potentially the fast_io_fail_tmo if < dev_loss).
If the LLD makes the add call prior to dev_loss expiring, the transport
updates the state and unblocks the rport. If dev_loss expires, it updates
the state again (essentially the true deleted state) and tears down the
target tree.
To deal with requests being received while blocked, etc - the LLD's use
a helper routine (fc_remote_port_chkready()), which validates the rport
state, and if not valid (e.g. blocked or removed) returns the appropriate
status to return to the midlayer. If blocked, it returns DID_IMM_RETRY.
If deleted, it returns DID_NO_CONNECT.
What the above never dealt with was the i/o already in the driver. The
driver always had the option to terminate the active i/o when the loss
of connectivity occurred, or it could just wait for it to time out, etc
and be killed that way. This patch added the callback at dev_loss_tmo
to guarantee i/o is killed, and added the fast_io_fail_tmo if you
wanted a faster guarantee. If fast_io_fail_tmo expires and the callback
is called - it just kills the outstanding i/o and does nothing to the
rport's blocked state.
> I think the only
> unblock happens on success or fc_starget_delete, so IO in the driver
> looks like it can get failed upwards, but IO sitting in the queue sits
> there until fc_rport_final_delete or success.
Yeah - essentially this is correct. I hope the above read that way.
I'm also hoping the iSER folks are reading this to get the general
feel of what's happening with block, dev_loss, etc.
>
> If that is correct, what about a new device state? When the fail fast
> tmo expires we can set the device to the new state, run the queue and
> incoming IO or IO in the request_queue marked with FAILFAST can be
> failed upwards by scsi-ml.
>
> I just woke up though :)
Sounds reasonable. It is adding a new semantic to what was meant by
fast_fail - but it's in line with our goal. The goal was to terminate
i/o so that they could be quickly rescheduled on a different path rather
than wait (what may be a long time) for the dev_loss connectivity
timer to fire. Makes sense you would want to make new i/o requests
bound by that same window.
-- james s
* Re: [RFC] fc transport: extensions for fast fail and dev loss
2006-07-25 18:49 ` James Smart
@ 2006-07-25 21:15 ` Michael Reed
2006-07-26 3:33 ` James Smart
0 siblings, 1 reply; 15+ messages in thread
From: Michael Reed @ 2006-07-25 21:15 UTC (permalink / raw)
To: James.Smart
Cc: Mike Christie, linux-scsi, Andrew Vasquez, aherrman,
Christoph Hellwig, duane.grigsby, Moore, Eric Dean,
Shirron, Stephen, Jeremy Higdon
James Smart wrote:
>
>
> Mike Christie wrote:
>>
>> So is the fast_io_fail_tmo callback the terminate_rport_io callback?
>
> Yes.
> When fast_io_fail_tmo expires, it calls the terminate_rport_io() callback.
>
>> If
>> so, are we supposed to unblock the rport/session/target from
>> fc_timeout_fail_rport_io
>
> No... don't unblock.
>
>> and call into the LLD and the LLD will set some
>> bit (or maybe check some rport/session/target/scsi_device bit) so that
>> incoming IO and IO sitting in the driver will be failed with something
>> like DID_BUS_BUSY so it goes to the upper layers?
>
> The way this is managed in the fc transport is - the LLD calls the
> transport when it establishes connectivity (an "add" call), and when it
> loses connectivity (a "delete" call). When the transport receives the
> delete call, it changes the rport state, blocks the rport, and starts the
> dev_loss timeout (and potentially the fast_io_fail_tmo if < dev_loss).
> If the LLD makes the add call prior to dev_loss expiring, the transport
> updates the state and unblocks the rport. If dev_loss expires, it updates
> the state again (essentially the true deleted state) and tears down the
> target tree.
>
> To deal with requests being received while blocked, etc - the LLD's use
> a helper routine (fc_remote_port_chkready()), which validates the rport
> state, and if not valid (e.g. blocked or removed) returns the appropriate
> status to return to the midlayer. If blocked, it returns DID_IMM_RETRY.
> If deleted, it returns DID_NO_CONNECT.
>
> What the above never dealt with was the i/o already in the driver. The
> driver always had the option to terminate the active i/o when the loss
> of connectivity occurred, or it could just wait for it to timeout, etc
> and be killed that way. This patch added the callback at dev_loss_tmo
> to guarantee i/o is killed, and added the fast_io_fail_tmo if you
> wanted a faster guarantee. If fast_io_fail_tmo expires and the callback
> is called - it just kills the outstanding i/o and does nothing to the
> rport's blocked state.
Haven't most drivers / board firmware generally cleaned up any outstanding
i/o at the time of (or shortly after) the fc_remote_port_delete()
call? I would think it reasonable to just require that the driver clean
up the i/o after calling fc_remote_port_delete(). Is there a significant
reason to keep the i/o alive in the driver? The rport has just been
deleted.... Would this eliminate the need for the callback? If the
driver implements this, could it just have a NULL callback routine?
Mike
>
>> I think the only
>> unblock happens on success or fc_starget_delete, so IO in the driver
>> looks like it can get failed upwards, but IO sitting in the queue sits
>> there until fc_rport_final_delete or success.
>
> Yeah - essentially this is correct. I hope the above read that way.
> I'm also hoping the iSER folks are reading this to get the general
> feel of what's happening with block, dev_loss, etc.
>
>>
>> If that is correct, what about a new device state? When the fail fast
>> tmo expires we can set the device to the new state, run the queue and
>> incoming IO or IO in the request_queue marked with FAILFAST can be
>> failed upwards by scsi-ml.
>>
>> I just woke up though :)
>
> Sounds reasonable. It is adding a new semantic to what was meant by
> fast_fail - but it's in line with our goal. The goal was to terminate
>> i/o so that they could be quickly rescheduled on a different path rather
> than wait (what may be a long time) for the dev_loss connectivity
> timer to fire. Makes sense you would want to make new i/o requests
> bound by that same window.
>
> -- james s
* Re: [RFC] fc transport: extensions for fast fail and dev loss
2006-07-25 21:15 ` Michael Reed
@ 2006-07-26 3:33 ` James Smart
0 siblings, 0 replies; 15+ messages in thread
From: James Smart @ 2006-07-26 3:33 UTC (permalink / raw)
To: Michael Reed
Cc: Mike Christie, linux-scsi, Andrew Vasquez, aherrman,
Christoph Hellwig, duane.grigsby, Moore, Eric Dean,
Shirron, Stephen, Jeremy Higdon
Michael Reed wrote:
> Haven't most drivers / board firmware generally cleaned up any outstanding
> i/o at the time (or shortly thereafter) of the fc_remote_port_delete()
> call? I would think it reasonable to just require that the driver clean
> up the i/o after calling fc_remote_port_delete(). Is there a significant
> reason to keep the i/o alive in the driver? The rport has just been
> deleted.... Would this eliminate the need for the callback? If the
> driver implements this, could it just have a NULL callback routine?
You don't want to kill i/o at the time that the rport loses
connectivity (e.g. when we call delete). The biggest reason is for devices
that support FCP Recovery (Tape being a prime example, but it's not limited
to them). There are also several topology glitches, which can be seconds
in length, which are aberrant and really don't reflect true device loss.
Why kill and reissue these i/o's as well? It really just adds more load to
the system to abort and retry the i/o's. Also, FC-DA compliance is
recommending that the login and the exchanges stay around as long as
possible (you could argue the diagram mandates it). So, for the best
"FC behavior" you really shouldn't kill the i/o until dev_loss_tmo fires.
-- james s
PS: I think you're highlighting my complaint with the patch. I should
be renaming the "delete" function as it really isn't delete. The rport
isn't truly "deleted" until dev_loss_tmo fires. We didn't consider
this when we dropped the block/unblock_rport routines in favor of the
more straight-forward add/delete calls. At this point, I'd rather
document the lifetime in the api/header than deal with name changes and
their effects on drivers/distros.
* Re: [RFC] fc transport: extensions for fast fail and dev loss
2006-06-20 18:45 [RFC] fc transport: extensions for fast fail and dev loss James Smart
2006-07-25 17:12 ` Mike Christie
@ 2006-07-26 9:20 ` Christoph Hellwig
2006-07-26 16:35 ` James Smart
2006-08-08 17:54 ` [RFC] [Last Rites] " James Smart
2 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2006-07-26 9:20 UTC (permalink / raw)
To: James Smart; +Cc: linux-scsi
On Tue, Jun 20, 2006 at 02:45:23PM -0400, James Smart wrote:
> Folks,
>
> The following addresses some long standing todo items I've had in the
> FC transport. They primarily arise when considering multipathing, or
> trying to marry driver internal state to transport state. It is intended
> that this same type of functionality would be usable in other transports
> as well.
>
> Here's what is contained:
>
> - dev_loss_tmo LLDD callback :
> Currently, there is no notification to the LLDD of when the transport
> gives up on the device returning and starts to return DID_NO_CONNECT
> in the queuecommand helper function. This callback notifies the LLDD
> that the transport has now given up on the rport, thereby acknowledging
> the prior fc_remote_port_delete() call. The callback also expects the
> LLDD to initiate the termination of any outstanding i/o on the rport.
I think this is fine.
> - fast_io_fail_tmo and LLD callback:
> There are some cases where it may take a long while to truly determine
> device loss, but the system is in a multipathing configuration where, if
> the i/o was failed quickly (faster than dev_loss_tmo), it could be
> redirected to a different path and completed sooner (assuming the
> multipath thing knew that the sdev was blocked).
shouldn't we just always fail REQ_FAILFAST requests ASAP and totally
ignore any kind of devloss timeout for them?
> This attribute is an exported "recommendation" by the LLDD and transport
> on what the lowest setting for dev_loss_tmo should be for a multipathing
> environment. Thus, the admin only needs to cat this attribute to obtain
> the value to echo into dev_loss_tmo.
This kind of policy really doesn't belong into the kernel. I'd rather
see a nice userspace command to get this right for the user as part of
sg_utils or Jeff's infamous blktool.
> I have one criticism of these changes. The callbacks are calling into
> the LLDD with an rport post the driver's rport_delete call. What it means
> is that we are essentially extending the lifetime of an rport until the
> dev_loss_tmo call occurs.
Which is okay as long as it's documented well enough.
* Re: [RFC] fc transport: extensions for fast fail and dev loss
2006-07-26 9:20 ` Christoph Hellwig
@ 2006-07-26 16:35 ` James Smart
0 siblings, 0 replies; 15+ messages in thread
From: James Smart @ 2006-07-26 16:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-scsi
Christoph Hellwig wrote:
>> - fast_io_fail_tmo and LLD callback:
>> There are some cases where it may take a long while to truly determine
>> device loss, but the system is in a multipathing configuration where, if
>> the i/o was failed quickly (faster than dev_loss_tmo), it could be
>> redirected to a different path and completed sooner (assuming the
>> multipath thing knew that the sdev was blocked).
>
> shouldn't we just always fail REQ_FAILFAST requests ASAP and totally
> ignore any kind of devloss timeout for them?
A couple of questions....
- This implies one-by-one implicit i/o aborts. Keep in mind that the
connectivity to the device/target has been lost, so you can't send
transport-level single-io abort requests, nor target-level TMFs.
So... how much of this behavior are you trying to guarantee to the upper
layers?
Please note that you may get differing behavior from different
adapters/drivers. Some may support cancelling the i/o within the adapter
(and properly protect against later link-side references), thus it works
as desired. Others may not, and would then have to resort to implicit
logouts - which will abort non-REQ_FAILFAST i/o's as well. This is ok
if those i/o's are retryable (like on disks), but bad if they aren't
(what if one of the luns were a tape?). Instead of implicit logouts,
the driver may just ignore the REQ_FAILFAST flags altogether and wait
for dev_loss_tmo to kill things.
- Do you want a SCSI LLD looking at more than the scsi_cmnd ? (e.g. is it
proper for it to be looking at the block request structure ?) Would this
mean we want to reflect the block flag via a scsi_cmnd flag ?
- There's an argument on whether we're FC-DA compliant. Yes, Linux doesn't
care and the above would be good for the system, but vendor selection
still grades based on OS-ignorant transport standard compliance.
- Are we sure all the meaningful i/o will have REQ_FAILFAST set ?
>> This attribute is an exported "recommendation" by the LLDD and transport
>> on what the lowest setting for dev_loss_tmo should be for a multipathing
>> environment. Thus, the admin only needs to cat this attribute to obtain
>> the value to echo into dev_loss_tmo.
>
> This kind of policy really doesn't belong into the kernel. I'd rather
> see a nice userspace command to get this right for the user as part of
> sg_utils or Jeffs infamous blktool.
Makes sense. However, the tool may still need to get input from the
transport/LLD - so something like this may still be needed. Actually, it
would probably be this - we'd just change it to "a recommendation to a
tool" instead of the admin.
-- james s
* Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
2006-06-20 18:45 [RFC] fc transport: extensions for fast fail and dev loss James Smart
2006-07-25 17:12 ` Mike Christie
2006-07-26 9:20 ` Christoph Hellwig
@ 2006-08-08 17:54 ` James Smart
2006-08-08 21:56 ` Michael Reed
2006-08-09 17:36 ` Christoph Hellwig
2 siblings, 2 replies; 15+ messages in thread
From: James Smart @ 2006-08-08 17:54 UTC (permalink / raw)
To: linux-scsi
Closing Statements:
I've attached the original RFC below. See
http://marc.theaimsgroup.com/?l=linux-scsi&m=115082917628466&w=2
I've updated it with what I perceive to be the position and resolution
based on comments. Keep in mind that we're trying to lay the groundwork
for common behavior and tunables between the transports.
Please let me know if I've misrepresented anything, or if there is
dissension on the resolution. I'd like to close on this.
James Smart wrote:
> Folks,
>
> The following addresses some long standing todo items I've had in the
> FC transport. They primarily arise when considering multipathing, or
> trying to marry driver internal state to transport state. It is intended
> that this same type of functionality would be usable in other transports
> as well.
>
> Here's what is contained:
>
> - dev_loss_tmo LLDD callback :
> Currently, there is no notification to the LLDD of when the transport
> gives up on the device returning and starts to return DID_NO_CONNECT
> in the queuecommand helper function. This callback notifies the LLDD
> that the transport has now given up on the rport, thereby acknowledging
> the prior fc_remote_port_delete() call. The callback also expects the
> LLDD to initiate the termination of any outstanding i/o on the rport.
I believe there is no dissension on this change.
Please note: this is essentially a confirmation from the transport to the
LLD that the rport is fully deleted. Thus, the LLD must expect to see
these callbacks as a normal part of the rport being terminated (even if
it is not blocked).
I'll move forward with this.
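A minimal sketch of the callback contract being agreed to here. The struct and field names below are invented stand-ins, not the actual fc_function_template API; the point is only the direction of the call and the obligation it carries:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the proposed callback flow; names are illustrative only. */

struct rport {
	int outstanding_io;	/* i/o still held by the LLD on this rport */
};

struct lld_template {
	/* transport -> LLD: dev_loss_tmo has fired and the rport is fully
	 * deleted; the LLD must initiate termination of any i/o still
	 * outstanding on this rport */
	void (*dev_loss_tmo_callbk)(struct rport *rport);
};

/* An example LLD implementation: kick off termination of everything.
 * Completions need not be synchronous to this return - they may arrive
 * afterward; the instant zeroing here just models the end result. */
static void example_dev_loss_cb(struct rport *rport)
{
	rport->outstanding_io = 0;
}

/* Transport side: invoked when dev_loss_tmo expires on a blocked rport,
 * acknowledging the earlier fc_remote_port_delete() call. */
static void transport_dev_loss_fired(struct lld_template *tmpl,
				     struct rport *rport)
{
	if (tmpl->dev_loss_tmo_callbk)
		tmpl->dev_loss_tmo_callbk(rport);
}
```

Note that, per the clarification above, an LLD must expect this callback for every rport teardown, not only for rports that were blocked.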
>
> - fast_io_fail_tmo and LLD callback:
> There are some cases where it may take a long while to truly determine
> device loss, but the system is in a multipathing configuration that if
> the i/o was failed quickly (faster than dev_loss_tmo), it could be
> redirected to a different path and completed sooner (assuming the
> multipath thing knew that the sdev was blocked).
>
> iSCSI is one of the transports that may vary dev_loss_tmo values
> per session, and you would like fast io failure.
The current transport implementation did not specify what happened to
active i/o (given to the driver, in the adapter, but not yet completed
back to the midlayer) when a device was blocked, nor during the
block-to-dev_loss transition period. It was up to the driver. Many
assumed active i/o was immediately terminated, which is semi-consistent
with the behavior of most drivers for most "connectivity loss" scenarios.
The conversations then started to jump around, considering what i/o's you
may want to have fail quickly, etc.
Here's my opinion:
We have the following points in time to look at:
(a) the device is blocked by the transport
(b) there is a time T, usually in a multipathing environment, where it
would be useful to error the i/o early rather than wait for dev_loss
It is assumed that any such i/o request would be marked REQ_FAILFAST
(c) the dev_loss_tmo fires - we're to assume the device is gone
and at any time post (a), the device may return, unblock and never
encounter points (b) and (c).
As for what happens to active i/o :
always: the driver can fail an i/o at any point in time if it deems
it appropriate.
at (a): There are scenarios where a short link perturbation may occur,
which may not disrupt the i/o. Therefore, we should not force
io to be terminated.
at (b): Minimally, we should terminate all active i/o requests marked
as type REQ_FAILFAST. From an API perspective, driver support
for this is optional. And we must also assume that there will
be implementations which have to abort all i/o in order to
terminate those marked REQ_FAILFAST. Is this acceptable ?
(it meets the "always" condition above)
Q: so far we've limited the io to those w/ REQ_FAILFAST.
Would we ever want to allow a user to fast fail all i/o
regardless of the request flags ? (in case the flags
weren't getting set on all the i/o the user wanted to
see fail ?)
There's a desire to address pending i/o (those on the block
request queue or new requests going there) so that if we've
crossed point (b) we also fail them. The proposal is
to add a new state (device ? or queue ?), which would occur
as of point (b). All REQ_FAILFAST io on the queue, as well
as any new io, will be failed with a new i/o status if in
this state. Non-REQ_FAILFAST i/o would continue to enter/sit
on the request queue until dev_loss_tmo fires.
at (c): per the dev_loss_tmo callback, all i/o should be terminated.
Their completions do not have to be synchronous to the return
from the callback - they can occur afterward.
Comments ?
Assuming that folks agree, I'd like to do this in 2 patches:
- one that puts in the transport fast_io_fail_tmo and LLD callback
- another that adds the new state, io completion status, and does the
handling of the request queue REQ_FAILFAST i/o.
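The (a)/(b)/(c) timeline above can be summarized as a single disposition function. Everything below is an illustrative model with invented names, not kernel code; it captures the ordering of the two timers and the role of the failfast flag:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the (a)/(b)/(c) timeline; all names are invented. */

enum io_fate {
	IO_HELD,	/* (a): blocked; a short link bounce may recover it */
	IO_FAST_FAILED,	/* (b): failed early so multipath can retry elsewhere */
	IO_DEV_LOST,	/* (c): dev_loss_tmo fired, everything terminated */
};

/* Fate of one outstanding i/o, given how long the rport has been blocked.
 * Both timeouts are measured from the block at point (a). */
static enum io_fate fate_at(unsigned int elapsed, bool failfast,
			    unsigned int fast_io_fail_tmo,
			    unsigned int dev_loss_tmo)
{
	if (elapsed >= dev_loss_tmo)			/* point (c) */
		return IO_DEV_LOST;
	if (failfast && elapsed >= fast_io_fail_tmo)	/* point (b) */
		return IO_FAST_FAILED;
	return IO_HELD;					/* point (a) */
}
```

If the device returns and unblocks before either timer fires, neither (b) nor (c) is ever reached, matching the "at any time post (a)" caveat above.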
>
> - fast_loss_time recommendation:
> In discussing how a admin should set dev_loss_tmo in a multipathing
> environment, it became apparent that we expected the admin to know
> a lot. They had to know the transport type, what the minimum setting
> can be that still survives normal link bouncing, and they may even
> have to know about device specifics. For iSCSI, the proper loss time
> may vary widely from session to session.
>
> This attribute is an exported "recommendation" by the LLDD and transport
> on what the lowest setting for dev_loss_tmo should be for a multipathing
> environment. Thus, the admin only needs to cat this attribute to obtain
> the value to echo into dev_loss_tmo.
The only objection was from Christoph - wanting a utility to get/set this
stuff. However, the counter was that this attribute is still meaningful, as
it is the conduit for obtaining a recommendation from the transport/LLD.
So - I assume this proceeds as is - with a change in its description.
>
> I have one criticism of these changes. The callbacks are calling into
> the LLDD with an rport post the driver's rport_delete call. What it means
> is that we are essentially extending the lifetime of an rport until the
> dev_loss_tmo call occurs.
It's ok - and adding the appropriate comments is fine.
Thanks.
-- james s
* Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
2006-08-08 17:54 ` [RFC] [Last Rites] " James Smart
@ 2006-08-08 21:56 ` Michael Reed
2006-08-08 22:15 ` Michael Reed
2006-08-09 17:36 ` Christoph Hellwig
1 sibling, 1 reply; 15+ messages in thread
From: Michael Reed @ 2006-08-08 21:56 UTC (permalink / raw)
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
> Closing Statements:
>
> I've attached the original RFC below. See
> http://marc.theaimsgroup.com/?l=linux-scsi&m=115082917628466&w=2
>
> I've updated it with what I perceive to be the position and resolution
> based on comments. Keep in mind that we're trying to lay the groundwork
> for common behavior and tunables between the transports.
>
> Please let me know if I've mis-represented anything, or if there is
> a dissention in the resolution. I'd like to close on this.
>
> James Smart wrote:
>> Folks,
>>
>> The following addresses some long standing todo items I've had in the
>> FC transport. They primarily arise when considering multipathing, or
>> trying to marry driver internal state to transport state. It is intended
>> that this same type of functionality would be usable in other transports
>> as well.
>>
>> Here's what is contained:
>>
>> - dev_loss_tmo LLDD callback :
>> Currently, there is no notification to the LLDD of when the transport
>> gives up on the device returning and starts to return DID_NO_CONNECT
>> in the queuecommand helper function. This callback notifies the LLDD
>> that the transport has now given up on the rport, thereby acknowledging
>> the prior fc_remote_port_delete() call. The callback also expects the
>> LLDD to initiate the termination of any outstanding i/o on the rport.
>
> I believe there is no dissention on this change.
> Please note: this is essentially a confirmation from the transport to the
> LLD that the rport is fully deleted. Thus, the LLD must expect to see
> these callbacks as a normal part of the rport being terminated (even if
> it is not blocked).
>
> I'll move forward with this.
Concur.
>
>>
>> - fast_io_fail_tmo and LLD callback:
>> There are some cases where it may take a long while to truly determine
>> device loss, but the system is in a multipathing configuration that if
>> the i/o was failed quickly (faster than dev_loss_tmo), it could be
>> redirected to a different path and completed sooner (assuming the
>> multipath thing knew that the sdev was blocked).
>> iSCSI is one of the transports that may vary dev_loss_tmo values
>> per session, and you would like fast io failure.
>
>
> The current transport implementation did not specify what happened to
> active i/o (given to the driver, in the adapter, but not yet completed
> back to the midlayer) when a device was blocked, nor during the
> block-to->dev_loss transition period. It was up to the driver. Many
> assumed active i/o was immediately terminated, which is semi-consistent
> with the behavior of most drivers for most "connectivity loss" scenarios.
>
> The conversations then started to jump around, considering what i/o's you
> may want to have fail quickly, etc.
>
> Here's my opinion:
> We have the following points in time to look at:
> (a) the device is blocked by the transport
> (b) there is a time T, usually in a multipathing environment, where it
> would be useful to error the i/o early rather than wait for dev_loss
> It is assumed that any such i/o request would be marked REQ_FASTFAIL
> (c) the dev_loss_tmo fires - we're to assume the device is gone
> and at any time post (a), the device may return, unblock and never
> encounter points (b) and (c).
REQ_FAILFAST is stored in the request structure. Are there "issues"
with using scsi_cmnd.request in the lldd?
>
> As for what happens to active i/o :
>
> always: the driver can fail an i/o at any point in time if it deems
> it appropriate.
>
> at (a): There are scenarios where a short link perturbation may occur,
> which may not disrupt the i/o. Therefore, we should not force
> io to be terminated.
>
> at (b): Minimally, we should terminate all active i/o requests marked
> as type REQ_FASTFAIL. From an api perspective, driver support
> for this is optional. And we must also assume that there will
> be implementations which have to abort all i/o in order to
> terminate those marked REQ_FASTFAIL. Is this acceptable ?
> (it meets the "always" condition above)
>
> Q: so far we've limited the io to those w/ REQ_FASTFAIL.
> Would we ever want to allow a user to fast fail all i/o
> regardless of the request flags ? (in case they flags
> weren't getting set on all the i/o the user wanted to
> see fail ?)
REQ_FAILFAST appears to influence retries during error recovery so
there may be unexpected side effects of doing this. But, that said,
I'd say yes. From my perspective, I'd make this the default behavior.
In talking with our volume manager people, the question raised was
"Why would you want some i/o to fail quickly and some not?"
They even considered non-i/o scsi commands.
I think that if the default behavior is to have fast_io_fail_tmo
enabled, then it should be controlled by REQ_FAILFAST in the
request. If the default is to have the timer disabled, i.e.,
an admin has to enable it (or define when it's enabled), then
when enabled it should apply to every i/o. In examining the
patch, it appears to be disabled by default, so our conclusion
is that all i/o should fast fail when enabled. We also concur
with having fast_io_fail_tmo disabled by default.
I guess this implies leave REQ_FAILFAST to error recovery. :)
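The opt-in/opt-out argument here reduces to a small predicate. This is a sketch with invented names, modeling both alternatives Mike describes (the patch's actual default, per his reading, is disabled):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the policy argued above; all names are invented.
 * If the timer were on by default, the per-request REQ_FAILFAST flag
 * should gate it; since the admin must opt in, enabling it should
 * apply to every i/o. */
static bool io_fast_fails(bool tmo_enabled_by_default,
			  bool tmo_enabled_now,
			  bool req_failfast)
{
	if (!tmo_enabled_now)
		return false;		/* timer off: wait for dev_loss_tmo */
	if (tmo_enabled_by_default)
		return req_failfast;	/* opt-out world: honor the flag */
	return true;			/* admin opted in: fail all i/o */
}
```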
Mike
>
> There's a desire to address pending i/o (those on the block
> request queue or new requests going there) so that if we've
> crossed point (b) that we also fail them. The proposal is
> to add a new state (device ? or queue ?), which would occur
> as of point (b). All REQ_FASTFAIL io on the queue, as well
> as on a new io, will be failed with a new i/o status if in
> this state. Non-REQ_FASTFAIL i/o would continue to enter/sit
> on the request queue until dev_loss_tmo fires.
>
> at (c): per the dev_loss_tmo callback, all i/o should be terminated.
> Their completions do not have to be synchronous to the return
> from the callback - they can occur afterward.
>
>
> Comments ?
>
> Assuming that folks agree, I'd like to do this in 2 patches:
> - one that puts in the transport fast_io_fail_tmo and LLD callback
> - another that adds the new state, io completion status, and does the
> handling of the request queue REQ_FASTFAIL i/o.
>
>>
>> - fast_loss_time recommendation:
>> In discussing how a admin should set dev_loss_tmo in a multipathing
>> environment, it became apparent that we expected the admin to know
>> a lot. They had to know the transport type, what the minimum setting
>> can be that still survives normal link bouncing, and they may even
>> have to know about device specifics. For iSCSI, the proper loss time
>> may vary widely from session to session.
>>
>> This attribute is an exported "recommendation" by the LLDD and
>> transport
>> on what the lowest setting for dev_loss_tmo should be for a
>> multipathing
>> environment. Thus, the admin only needs to cat this attribute to obtain
>> the value to echo into dev_loss_tmo.
>
> The only objection was from Christoph - wanting a utility to get/set this
> stuff. However, the counter was this attribute was still meaningful, as it
> was the conduit to obtain a recommendation from the transport/LLD.
>
> So - I assume this proceeds as is - with a change in it's description.
>
>>
>> I have one criticism of these changes. The callbacks are calling into
>> the LLDD with an rport post the driver's rport_delete call. What it means
>> is that we are essentially extending the lifetime of an rport until the
>> dev_loss_tmo call occurs.
>
> It's ok - and adding the appropriate comments are fine.
>
>
> Thanks.
>
> -- james s
>
* Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
2006-08-08 21:56 ` Michael Reed
@ 2006-08-08 22:15 ` Michael Reed
2006-08-09 15:31 ` Michael Reed
0 siblings, 1 reply; 15+ messages in thread
From: Michael Reed @ 2006-08-08 22:15 UTC (permalink / raw)
To: Michael Reed; +Cc: James.Smart, linux-scsi
Michael Reed wrote:
>
> James Smart wrote:
>> Closing Statements:
>>
>> I've attached the original RFC below. See
>> http://marc.theaimsgroup.com/?l=linux-scsi&m=115082917628466&w=2
>>
>> I've updated it with what I perceive to be the position and resolution
>> based on comments. Keep in mind that we're trying to lay the groundwork
>> for common behavior and tunables between the transports.
>>
>> Please let me know if I've mis-represented anything, or if there is
>> a dissention in the resolution. I'd like to close on this.
>>
>> James Smart wrote:
>>> Folks,
>>>
>>> The following addresses some long standing todo items I've had in the
>>> FC transport. They primarily arise when considering multipathing, or
>>> trying to marry driver internal state to transport state. It is intended
>>> that this same type of functionality would be usable in other transports
>>> as well.
>>>
>>> Here's what is contained:
>>>
>>> - dev_loss_tmo LLDD callback :
>>> Currently, there is no notification to the LLDD of when the transport
>>> gives up on the device returning and starts to return DID_NO_CONNECT
>>> in the queuecommand helper function. This callback notifies the LLDD
>>> that the transport has now given up on the rport, thereby acknowledging
>>> the prior fc_remote_port_delete() call. The callback also expects the
>>> LLDD to initiate the termination of any outstanding i/o on the rport.
>> I believe there is no dissention on this change.
>> Please note: this is essentially a confirmation from the transport to the
>> LLD that the rport is fully deleted. Thus, the LLD must expect to see
>> these callbacks as a normal part of the rport being terminated (even if
>> it is not blocked).
>>
>> I'll move forward with this.
>
> Concur.
>
>>> - fast_io_fail_tmo and LLD callback:
>>> There are some cases where it may take a long while to truly determine
>>> device loss, but the system is in a multipathing configuration that if
>>> the i/o was failed quickly (faster than dev_loss_tmo), it could be
>>> redirected to a different path and completed sooner (assuming the
>>> multipath thing knew that the sdev was blocked).
>>> iSCSI is one of the transports that may vary dev_loss_tmo values
>>> per session, and you would like fast io failure.
>>
>> The current transport implementation did not specify what happened to
>> active i/o (given to the driver, in the adapter, but not yet completed
>> back to the midlayer) when a device was blocked, nor during the
>> block-to->dev_loss transition period. It was up to the driver. Many
>> assumed active i/o was immediately terminated, which is semi-consistent
>> with the behavior of most drivers for most "connectivity loss" scenarios.
>>
>> The conversations then started to jump around, considering what i/o's you
>> may want to have fail quickly, etc.
>>
>> Here's my opinion:
>> We have the following points in time to look at:
>> (a) the device is blocked by the transport
>> (b) there is a time T, usually in a multipathing environment, where it
>> would be useful to error the i/o early rather than wait for dev_loss
>> It is assumed that any such i/o request would be marked REQ_FASTFAIL
>> (c) the dev_loss_tmo fires - we're to assume the device is gone
>> and at any time post (a), the device may return, unblock and never
>> encounter points (b) and (c).
>
> REQ_FAILFAST is stored in the request structure. Are there "issues"
> with using scsi_cmnd.request in the lldd?
>
>> As for what happens to active i/o :
>>
>> always: the driver can fail an i/o at any point in time if it deems
>> it appropriate.
>>
>> at (a): There are scenarios where a short link perturbation may occur,
>> which may not disrupt the i/o. Therefore, we should not force
>> io to be terminated.
>>
>> at (b): Minimally, we should terminate all active i/o requests marked
>> as type REQ_FASTFAIL. From an api perspective, driver support
>> for this is optional. And we must also assume that there will
>> be implementations which have to abort all i/o in order to
>> terminate those marked REQ_FASTFAIL. Is this acceptable ?
>> (it meets the "always" condition above)
>>
>> Q: so far we've limited the io to those w/ REQ_FASTFAIL.
>> Would we ever want to allow a user to fast fail all i/o
>> regardless of the request flags ? (in case they flags
>> weren't getting set on all the i/o the user wanted to
>> see fail ?)
>
> REQ_FAILFAST appears to influence retries during error recovery so
> there may be unexpected side effects of doing this. But, that said,
> I'd say yes. From my perspective, I'd make this the default behavior.
>
> In talking with our volume manager people, the question raised was
> "Why would you want some i/o to fail quickly and some not?"
> They even considered non-i/o scsi commands.
>
> I think that if the default behavior is to have fast_io_fail_tmo
> enabled, then it should be controlled by REQ_FAILFAST in the
> request. If the default is to have the timer disabled, i.e.,
> an admin has to enable it (or define when it's enabled) then
> when enabled it should apply to every i/o. In examining the
> patch, it appears to be disabled by default, so our conclusion
> is that all i/o should fast fail when enabled. We also concur
> with having fast_io_fail_tmo disabled by default.
I think this also implies that our volume manager guys would be
just as happy setting dev_loss_tmo to a small value and not use
fast_io_fail_tmo.
Mike
>
> I guess this implies leave REQ_FAILFAST to error recovery. :)
>
> Mike
>
>
>
>> There's a desire to address pending i/o (those on the block
>> request queue or new requests going there) so that if we've
>> crossed point (b) that we also fail them. The proposal is
>> to add a new state (device ? or queue ?), which would occur
>> as of point (b). All REQ_FASTFAIL io on the queue, as well
>> as on a new io, will be failed with a new i/o status if in
>> this state. Non-REQ_FASTFAIL i/o would continue to enter/sit
>> on the request queue until dev_loss_tmo fires.
>>
>> at (c): per the dev_loss_tmo callback, all i/o should be terminated.
>> Their completions do not have to be synchronous to the return
>> from the callback - they can occur afterward.
>>
>>
>> Comments ?
>>
>> Assuming that folks agree, I'd like to do this in 2 patches:
>> - one that puts in the transport fast_io_fail_tmo and LLD callback
>> - another that adds the new state, io completion status, and does the
>> handling of the request queue REQ_FASTFAIL i/o.
>>
>>> - fast_loss_time recommendation:
>>> In discussing how a admin should set dev_loss_tmo in a multipathing
>>> environment, it became apparent that we expected the admin to know
>>> a lot. They had to know the transport type, what the minimum setting
>>> can be that still survives normal link bouncing, and they may even
>>> have to know about device specifics. For iSCSI, the proper loss time
>>> may vary widely from session to session.
>>>
>>> This attribute is an exported "recommendation" by the LLDD and
>>> transport
>>> on what the lowest setting for dev_loss_tmo should be for a
>>> multipathing
>>> environment. Thus, the admin only needs to cat this attribute to obtain
>>> the value to echo into dev_loss_tmo.
>> The only objection was from Christoph - wanting a utility to get/set this
>> stuff. However, the counter was this attribute was still meaningful, as it
>> was the conduit to obtain a recommendation from the transport/LLD.
>>
>> So - I assume this proceeds as is - with a change in it's description.
>>
>>>
>>> I have one criticism of these changes. The callbacks are calling into
>>> the LLDD with an rport post the driver's rport_delete call. What it means
>>> is that we are essentially extending the lifetime of an rport until the
>>> dev_loss_tmo call occurs.
>> It's ok - and adding the appropriate comments are fine.
>>
>>
>> Thanks.
>>
>> -- james s
>>
* Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
2006-08-08 22:15 ` Michael Reed
@ 2006-08-09 15:31 ` Michael Reed
2006-08-10 16:38 ` James Smart
0 siblings, 1 reply; 15+ messages in thread
From: Michael Reed @ 2006-08-09 15:31 UTC (permalink / raw)
To: linux-scsi; +Cc: Michael Reed, James.Smart
Michael Reed wrote:
...snip...
>>>> - fast_io_fail_tmo and LLD callback:
>>>
>>> at (b): Minimally, we should terminate all active i/o requests marked
>>> as type REQ_FASTFAIL. From an api perspective, driver support
>>> for this is optional. And we must also assume that there will
>>> be implementations which have to abort all i/o in order to
>>> terminate those marked REQ_FASTFAIL. Is this acceptable ?
>>> (it meets the "always" condition above)
>>>
>>> Q: so far we've limited the io to those w/ REQ_FASTFAIL.
>>> Would we ever want to allow a user to fast fail all i/o
>>> regardless of the request flags ? (in case they flags
>>> weren't getting set on all the i/o the user wanted to
>>> see fail ?)
>> REQ_FAILFAST appears to influence retries during error recovery so
>> there may be unexpected side effects of doing this. But, that said,
>> I'd say yes. From my perspective, I'd make this the default behavior.
>>
>> In talking with our volume manager people, the question raised was
>> "Why would you want some i/o to fail quickly and some not?"
>> They even considered non-i/o scsi commands.
>>
...snip...
>
In thinking about this overnight, I would like to withdraw my previous
comments. (Hence the snip!)
Let's take the case of real time capture from a device and
post processing of that data. The capture operation would
likely want a fast fail to avoid dropping data. The post
processing of that data would like to wait for the device
to return to avoid disruption and potential premature termination
of the job.
Under the above scenario, assuming it's a valid scenario, are there
mechanisms in place to allow an application to tag an i/o stream
for fast fail?
Mike
* Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
2006-08-09 15:31 ` Michael Reed
@ 2006-08-10 16:38 ` James Smart
0 siblings, 0 replies; 15+ messages in thread
From: James Smart @ 2006-08-10 16:38 UTC (permalink / raw)
To: Michael Reed; +Cc: linux-scsi
Michael Reed wrote:
> In thinking about this over night, I would like to withdraw my previous
> comments. (Hence the snip!)
>
> Let's take the case of real time capture from a device and
> post processing of that data. The capture operation would
> likely want a fast fail to avoid dropping data. The post
> processing of that data would like to wait for the device
> to return to avoid disruption and potential premature termination
> of the job.
>
> Under the above scenario, assuming it's a valid scenario, are there
> mechanisms in place to allow an application to tag an i/o stream
> for fast fail?
Two comments:
- This conflicts with Christoph's last comment about fast_fail failing all
i/o's. I prefer the fail-all approach, as it's the easier, more
straightforward one. It's also the easiest behavior to explain.
- As for tagging i/o from an application, the only places I see in the
kernel setting the flags that make REQ_FAILFAST get set are strictly
in-the-kernel items (multipath). I don't see any way for an application
to mark this. It implies that to do what you stated above, the app has
to serially perform each mode - and to do so at a target level.
E.g. set fast fail, perform capture, set no fast fail, perform post
processing.
-- james s
* Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
2006-08-08 17:54 ` [RFC] [Last Rites] " James Smart
2006-08-08 21:56 ` Michael Reed
@ 2006-08-09 17:36 ` Christoph Hellwig
2006-08-10 16:17 ` James Smart
1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2006-08-09 17:36 UTC (permalink / raw)
To: James Smart; +Cc: linux-scsi
On Tue, Aug 08, 2006 at 01:54:27PM -0400, James Smart wrote:
> >Here's what is contained:
> >
> >- dev_loss_tmo LLDD callback :
> > Currently, there is no notification to the LLDD of when the transport
> > gives up on the device returning and starts to return DID_NO_CONNECT
> > in the queuecommand helper function. This callback notifies the LLDD
> > that the transport has now given up on the rport, thereby acknowledging
> > the prior fc_remote_port_delete() call. The callback also expects the
> > LLDD to initiate the termination of any outstanding i/o on the rport.
>
> I believe there is no dissention on this change.
> Please note: this is essentially a confirmation from the transport to the
> LLD that the rport is fully deleted. Thus, the LLD must expect to see
> these callbacks as a normal part of the rport being terminated (even if
> it is not blocked).
>
> I'll move forward with this.
ACK.
> >- fast_io_fail_tmo and LLD callback:
> > There are some cases where it may take a long while to truly determine
> > device loss, but the system is in a multipathing configuration that if
> > the i/o was failed quickly (faster than dev_loss_tmo), it could be
> > redirected to a different path and completed sooner (assuming the
> > multipath thing knew that the sdev was blocked).
> >
> > iSCSI is one of the transports that may vary dev_loss_tmo values
> > per session, and you would like fast io failure.
>
>
> The current transport implementation did not specify what happened to
> active i/o (given to the driver, in the adapter, but not yet completed
> back to the midlayer) when a device was blocked, nor during the
> block-to->dev_loss transition period. It was up to the driver. Many
> assumed active i/o was immediately terminated, which is semi-consistent
> with the behavior of most drivers for most "connectivity loss" scenarios.
>
> The conversations then started to jump around, considering what i/o's you
> may want to have fail quickly, etc.
>
> Here's my opinion:
> We have the following points in time to look at:
> (a) the device is blocked by the transport
> (b) there is a time T, usually in a multipathing environment, where it
> would be useful to error the i/o early rather than wait for dev_loss
> It is assumed that any such i/o request would be marked REQ_FASTFAIL
> (c) the dev_loss_tmo fires - we're to assume the device is gone
> and at any time post (a), the device may return, unblock and never
> encounter points (b) and (c).
>
> As for what happens to active i/o :
>
> always: the driver can fail an i/o at any point in time if it deems
> it appropriate.
>
> at (a): There are scenarios where a short link perturbation may occur,
> which may not disrupt the i/o. Therefore, we should not force
> io to be terminated.
Ok..
>
> at (b): Minimally, we should terminate all active i/o requests marked
> as type REQ_FASTFAIL. From an api perspective, driver support
> for this is optional. And we must also assume that there will
> be implementations which have to abort all i/o in order to
> terminate those marked REQ_FASTFAIL. Is this acceptable ?
> (it meets the "always" condition above)
>
> Q: so far we've limited the io to those w/ REQ_FASTFAIL.
> Would we ever want to allow a user to fast fail all i/o
> regardless of the request flags ? (in case they flags
> weren't getting set on all the i/o the user wanted to
> see fail ?)
I think we should fail all. It's not like an unprivileged process could
request FASTFAIL. The administrator should know what she/he is doing.
> There's a desire to address pending i/o (those on the block
> request queue or new requests going there) so that if we've
> crossed point (b) that we also fail them. The proposal is
> to add a new state (device ? or queue ?), which would occur
> as of point (b). All REQ_FASTFAIL io on the queue, as well
> as on a new io, will be failed with a new i/o status if in
> this state. Non-REQ_FASTFAIL i/o would continue to enter/sit
> on the request queue until dev_loss_tmo fires.
We have a queue per device, so adding another scsi_device state sounds
like the right way to go ahead.
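A sketch of the queued-i/o handling under such a new device state, modeling the proposal as quoted above (failfast i/o on or entering the queue fails immediately; everything else waits for dev_loss_tmo). All enum values and names are invented, not actual scsi_device states:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only: invented stand-ins for scsi_device states and verdicts. */

enum sdev_state { SDEV_RUNNING, SDEV_BLOCKED, SDEV_FAST_FAIL };

enum queue_verdict {
	Q_DISPATCH,	/* hand the request to the driver as usual */
	Q_HOLD,		/* leave it on the request queue */
	Q_FAIL_FAST,	/* complete it now with the new i/o status */
};

/* What happens to a request on (or arriving at) the device queue,
 * given the device state and the request's failfast marking. */
static enum queue_verdict queued_io_verdict(enum sdev_state state,
					    bool failfast)
{
	switch (state) {
	case SDEV_RUNNING:
		return Q_DISPATCH;
	case SDEV_FAST_FAIL:	/* point (b) has passed */
		return failfast ? Q_FAIL_FAST : Q_HOLD;
	case SDEV_BLOCKED:	/* between (a) and (b) */
	default:
		return Q_HOLD;
	}
}
```

Under the fail-all resolution reached later in the thread, the SDEV_FAST_FAIL case would simply return Q_FAIL_FAST unconditionally.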
> at (c): per the dev_loss_tmo callback, all i/o should be terminated.
> Their completions do not have to be synchronous to the return
> from the callback - they can occur afterward.
ACK.
> >- fast_loss_time recommendation:
> > In discussing how a admin should set dev_loss_tmo in a multipathing
> > environment, it became apparent that we expected the admin to know
> > a lot. They had to know the transport type, what the minimum setting
> > can be that still survives normal link bouncing, and they may even
> > have to know about device specifics. For iSCSI, the proper loss time
> > may vary widely from session to session.
> >
> > This attribute is an exported "recommendation" by the LLDD and transport
> > on what the lowest setting for dev_loss_tmo should be for a multipathing
> > environment. Thus, the admin only needs to cat this attribute to obtain
> > the value to echo into dev_loss_tmo.
>
> The only objection was from Christoph - wanting a utility to get/set this
> stuff. However, the counter was that this attribute was still meaningful, as it
> was the conduit to obtain a recommendation from the transport/LLD.
>
> So - I assume this proceeds as is - with a change in its description.
I must say I'm still not happy with this. It's really policy that we
try to keep out of the kernel.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
2006-08-09 17:36 ` Christoph Hellwig
@ 2006-08-10 16:17 ` James Smart
2006-08-10 20:01 ` Mike Christie
0 siblings, 1 reply; 15+ messages in thread
From: James Smart @ 2006-08-10 16:17 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-scsi
Christoph Hellwig wrote:
>> at (b): Minimally, we should terminate all active i/o requests marked
>> as type REQ_FASTFAIL. From an api perspective, driver support
>> for this is optional. And we must also assume that there will
>> be implementations which have to abort all i/o in order to
>> terminate those marked REQ_FASTFAIL. Is this acceptable ?
>> (it meets the "always" condition above)
>>
>> Q: so far we've limited the io to those w/ REQ_FASTFAIL.
>> Would we ever want to allow a user to fast fail all i/o
>> regardless of the request flags ? (in case the flags
>> weren't getting set on all the i/o the user wanted to
>> see fail ?)
>
> I think we should fail all. It's not like an unprivileged process could
> request FASTFAIL. The administrator should know what she/he is doing.
Good. All it is.
>>> - fast_loss_time recommendation:
>>> In discussing how an admin should set dev_loss_tmo in a multipathing
>>> environment, it became apparent that we expected the admin to know
>>> a lot. They had to know the transport type, what the minimum setting
>>> can be that still survives normal link bouncing, and they may even
>>> have to know about device specifics. For iSCSI, the proper loss time
>>> may vary widely from session to session.
>>>
>>> This attribute is an exported "recommendation" by the LLDD and transport
>>> on what the lowest setting for dev_loss_tmo should be for a multipathing
>>> environment. Thus, the admin only needs to cat this attribute to obtain
>>> the value to echo into dev_loss_tmo.
>> The only objection was from Christoph - wanting a utility to get/set this
>> stuff. However, the counter was that this attribute was still meaningful, as it
>> was the conduit to obtain a recommendation from the transport/LLD.
>>
>> So - I assume this proceeds as is - with a change in its description.
>
> I must say I'm still not happy with this. It's really policy that we
> try to keep out of the kernel.
Ok. I'll drop this. I don't think it was that important for FC. Mike Christie
had some better arguments for iSCSI.
-- james
* Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
2006-08-10 16:17 ` James Smart
@ 2006-08-10 20:01 ` Mike Christie
0 siblings, 0 replies; 15+ messages in thread
From: Mike Christie @ 2006-08-10 20:01 UTC (permalink / raw)
To: James.Smart; +Cc: Christoph Hellwig, linux-scsi
James Smart wrote:
>
>
> Christoph Hellwig wrote:
>>> at (b): Minimally, we should terminate all active i/o requests marked
>>> as type REQ_FASTFAIL. From an api perspective, driver support
>>> for this is optional. And we must also assume that there will
>>> be implementations which have to abort all i/o in order to
>>> terminate those marked REQ_FASTFAIL. Is this acceptable ?
>>> (it meets the "always" condition above)
>>>
>>> Q: so far we've limited the io to those w/ REQ_FASTFAIL.
>>> Would we ever want to allow a user to fast fail all i/o
>>> regardless of the request flags ? (in case the flags
>>> weren't getting set on all the i/o the user wanted to
>>> see fail ?)
>>
>> I think we should fail all. It's not like an unprivileged process could
>> request FASTFAIL. The administrator should know what she/he is doing.
>
> Good. All it is.
>
>>>> - fast_loss_time recommendation:
>>>> In discussing how an admin should set dev_loss_tmo in a multipathing
>>>> environment, it became apparent that we expected the admin to know
>>>> a lot. They had to know the transport type, what the minimum setting
>>>> can be that still survives normal link bouncing, and they may even
>>>> have to know about device specifics. For iSCSI, the proper loss time
>>>> may vary widely from session to session.
>>>>
>>>> This attribute is an exported "recommendation" by the LLDD and
>>>> transport on what the lowest setting for dev_loss_tmo should be for
>>>> a multipathing environment. Thus, the admin only needs to cat this
>>>> attribute to obtain the value to echo into dev_loss_tmo.
>>> The only objection was from Christoph - wanting a utility to get/set
>>> this stuff. However, the counter was that this attribute was still
>>> meaningful, as it was the conduit to obtain a recommendation from the
>>> transport/LLD.
>>>
>>> So - I assume this proceeds as is - with a change in its description.
>>
>> I must say I'm still not happy with this. It's really policy that we
>> try to keep out of the kernel.
>
> Ok. I'll drop this. I don't think it was that important for FC. Mike
> Christie had some better arguments for iSCSI.
>
For software iSCSI we probably do not need kernel support: since we do
so much setup in userspace, we can easily have the admin pass in a
different argument for different defaults, and we also have a way to set
and store different values in userspace. So no big deal. For HW iSCSI,
we can work similarly to FC, so we should have no problem there either.
end of thread, other threads:[~2006-08-10 21:05 UTC | newest]
Thread overview: 15+ messages
2006-06-20 18:45 [RFC] fc transport: extensions for fast fail and dev loss James Smart
2006-07-25 17:12 ` Mike Christie
2006-07-25 18:49 ` James Smart
2006-07-25 21:15 ` Michael Reed
2006-07-26 3:33 ` James Smart
2006-07-26 9:20 ` Christoph Hellwig
2006-07-26 16:35 ` James Smart
2006-08-08 17:54 ` [RFC] [Last Rites] " James Smart
2006-08-08 21:56 ` Michael Reed
2006-08-08 22:15 ` Michael Reed
2006-08-09 15:31 ` Michael Reed
2006-08-10 16:38 ` James Smart
2006-08-09 17:36 ` Christoph Hellwig
2006-08-10 16:17 ` James Smart
2006-08-10 20:01 ` Mike Christie