[PATCH] make fc transport removal of target configurable

public inbox for linux-scsi@vger.kernel.org

* [PATCH] make fc transport removal of target configurable
@ 2006-06-12 23:16 Michael Reed
  2006-06-13  7:07 ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Reed @ 2006-06-12 23:16 UTC (permalink / raw)
  To: linux-scsi
  Cc: James.Smart, Jim Nead, Jeremy Higdon, Michael Reed, Gary Hagensen

[-- Attachment #1: Type: text/plain, Size: 377 bytes --]

If the fc transport removes the scsi infrastructure for a
disconnected target and that target subsequently returns,
those subsystems layered upon scsi which don't understand
the implications of this disconnection / reconnection may
be unable to access the reconnected scsi target.  This patch
makes the target removal configurable.

Signed-off-by: Michael Reed <mdr@sgi.com>



[-- Attachment #2: fc_transport_optional_remove.patch --]
[-- Type: text/x-patch, Size: 2209 bytes --]

--- rc6u/drivers/scsi/scsi_transport_fc.c	2006-06-07 12:21:31.000000000 -0500
+++ rc6/drivers/scsi/scsi_transport_fc.c	2006-06-12 17:23:34.135974222 -0500
@@ -374,9 +374,29 @@
 MODULE_PARM_DESC(dev_loss_tmo,
 		 "Maximum number of seconds that the FC transport should"
 		 " insulate the loss of a remote port. Once this value is"
-		 " exceeded, the scsi target is removed. Value should be"
+		 " exceeded, the scsi target may be removed. Reference the"
+		 " remove_on_dev_loss module parameter.  Value should be"
 		 " between 1 and SCSI_DEVICE_BLOCK_MAX_TIMEOUT.");
 
+/*
+ * remove_on_dev_loss: controls whether the transport will
+ *   remove a scsi target after the device loss timer expires.
+ *   Removal on disconnect is modeled after the USB subsystem
+ *   and expects subsystems layered on SCSI to be aware of
+ *   potential device loss and handle it appropriately. However,
+ *   many subsystems do not support device removal, leaving situations
+ *   where structure references may remain, causing new device
+ *   name assignments, etc., if the target returns.
+ */
+static unsigned int fc_remove_on_dev_loss = 0;
+module_param_named(remove_on_dev_loss, fc_remove_on_dev_loss,
+		   uint, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(remove_on_dev_loss,
+		"Boolean.  When the device loss timer fires, this variable"
+		" controls whether the scsi infrastructure for the target"
+		" device is removed.  Values: zero means do not remove,"
+		" non-zero means remove.  Default is zero.");
+
 
 static __init int fc_transport_init(void)
 {
@@ -1446,7 +1466,8 @@
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
-	scsi_remove_target(&rport->dev);
+	if (fc_remove_on_dev_loss)
+		scsi_remove_target(&rport->dev);
 }
 
 
@@ -1992,9 +2013,13 @@
 		return;
 	}
 
-	dev_printk(KERN_ERR, &rport->dev,
-		"blocked FC remote port time out: removing target and "
-		"saving binding\n");
+	if (fc_remove_on_dev_loss)
+		dev_printk(KERN_ERR, &rport->dev,
+			"blocked FC remote port time out: removing target and "
+			"saving binding\n");
+	else
+		dev_printk(KERN_ERR, &rport->dev,
+			"blocked FC remote port time out: saving binding\n");
 
 	list_move_tail(&rport->peers, &fc_host->rport_bindings);
 


* Re: [PATCH] make fc transport removal of target configurable
  2006-06-12 23:16 [PATCH] make fc transport removal of target configurable Michael Reed
@ 2006-06-13  7:07 ` Christoph Hellwig
  2006-06-13 11:06   ` James Smart
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2006-06-13  7:07 UTC (permalink / raw)
  To: Michael Reed
  Cc: linux-scsi, James.Smart, Jim Nead, Jeremy Higdon, Gary Hagensen

On Mon, Jun 12, 2006 at 06:16:42PM -0500, Michael Reed wrote:
> If the fc transport removes the scsi infrastructure for a
> disconnected target and that target subsequently returns,
> those subsystems layered upon scsi which don't understand
> the implications of this disconnection / reconnection may
> be unable to access the reconnected scsi target.  This patch
> makes the target removal configurable.

NACK, we don't want to keep dead targets around.



* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13  7:07 ` Christoph Hellwig
@ 2006-06-13 11:06   ` James Smart
  2006-06-13 15:42     ` Michael Reed
  0 siblings, 1 reply; 18+ messages in thread
From: James Smart @ 2006-06-13 11:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael Reed, linux-scsi, Jim Nead, Jeremy Higdon, Gary Hagensen

We are seriously in trouble if the subsystems above us don't know how
to deal with dead targets. We are encountering scenarios in which the
data structures are staying around due to references, but for all other
intents they're gone.  I know that DM has yet to fully account for this.
md - it's dead. Applications... they have no clue.

I think we should seriously reconsider this position. FC is the only major
storage transport that does this (USB doesn't count). Parallel SCSI
doesn't, iSCSI doesn't, SAS doesn't.  If the device was truly gone, ok.
But, if we expect the device to come alive again sometime in the future,
why not keep the tree in place ?

-- james s

Christoph Hellwig wrote:
> On Mon, Jun 12, 2006 at 06:16:42PM -0500, Michael Reed wrote:
>> If the fc transport removes the scsi infrastructure for a
>> disconnected target and that target subsequently returns,
>> those subsystems layered upon scsi which don't understand
>> the implications of this disconnection / reconnection may
>> be unable to access the reconnected scsi target.  This patch
>> makes the target removal configurable.
> 
> NACK, we don't want to keep dead targets around.
> 
> 


* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 11:06   ` James Smart
@ 2006-06-13 15:42     ` Michael Reed
  2006-06-13 17:24       ` Stefan Richter
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Michael Reed @ 2006-06-13 15:42 UTC (permalink / raw)
  To: James.Smart
  Cc: Christoph Hellwig, linux-scsi, Jim Nead, Jeremy Higdon,
	Gary Hagensen

James Smart wrote:
> We are seriously in trouble if the subsystems above us don't know how
> to deal with dead targets. We are encountering scenarios in which the
> data structures are staying around due to references, but for all other
> intents they're gone.  I know that DM has yet to fully account for this.
> md - it's dead. Applications... they have no clue.

Mounted file systems have no clue either.  Even with no activity on the
fs, if the target stays missing beyond the device loss timeout and then
returns, the file system cannot be accessed without intervention.

When the target does return, the file system has to be unmounted and
remounted on a new "sd" device.  This is even if there was no activity
on the file system while its target was absent, i.e., it wouldn't otherwise
require an unmount/remount.

> 
> I think we should seriously reconsider this position. FC is the only major
> storage transport that does this (USB doesn't count). Parallel SCSI
> doesn't, iSCSI doesn't, SAS doesn't.  If the device was truly gone, ok.
> But, if we expect the device to come alive again sometime in the future,
> why not keep the tree in place ?

Treating fibre channel like removable storage is wrong.  Fibre targets aren't
generally supposed to go away.  If they do, there's a significant chance
that they'll be repaired and returned to service.  It makes sense to keep
the infrastructure in place just like scsi, sas, iscsi, ata.

The kind of disruption the current code can cause to systems with multi-terabytes
or petabytes of storage will be considered unacceptable in a production environment.

So, I also wish to encourage a reconsideration of the position that dead targets
should be removed.  Removing removable storage targets like firewire and usb
makes sense.  I just don't believe that the same applies to fibre channel
or other generally non-removable targets.

Mike

> 
> -- james s
> 
> 
> 
> Christoph Hellwig wrote:
>> On Mon, Jun 12, 2006 at 06:16:42PM -0500, Michael Reed wrote:
>>> If the fc transport removes the scsi infrastructure for a
>>> disconnected target and that target subsequently returns,
>>> those subsystems layered upon scsi which don't understand
>>> the implications of this disconnection / reconnection may
>>> be unable to access the reconnected scsi target.  This patch
>>> makes the target removal configurable.
>>
>> NACK, we don't want to keep dead targets around.
>>
>>
> 


* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 15:42     ` Michael Reed
@ 2006-06-13 17:24       ` Stefan Richter
  2006-06-13 19:36         ` Michael Reed
  2006-06-13 17:33       ` Steve Byan
  2006-06-13 17:59       ` James Bottomley
  2 siblings, 1 reply; 18+ messages in thread
From: Stefan Richter @ 2006-06-13 17:24 UTC (permalink / raw)
  To: Michael Reed
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen

Michael Reed wrote:
> James Smart wrote:
>> We are seriously in trouble if the subsystems above us don't know how
>> to deal with dead targets. We are encountering scenarios in which the
>> data structures are staying around due to references, but for all other
>> intents they're gone.  I know that DM has yet to fully account for this.
>> md - it's dead. Applications... they have no clue.
> 
> Mounted file systems have no clue either.  Even with no activity on the
> fs, if the target stays missing beyond the device loss timeout and then
> returns, the file system cannot be accessed without intervention.
> 
> When the target does return, the file system has to be unmounted and
> remounted on a new "sd" device.  This is even if there was no activity
> on the file system while its target was absent, i.e., it wouldn't otherwise
> require an unmount/remount.

Michael, I don't understand how your patch fits into this picture.

There is presently the FC transport parameter 'dev_loss_tmo', which is
    "Maximum number of seconds that the FC transport should"
    " insulate the loss of a remote port. Once this value is"
    " exceeded, the scsi target {is|may be} removed. {%|Reference"
    " the remove_on_dev_loss module parameter.} Value should be"
    " between 1 and SCSI_DEVICE_BLOCK_MAX_TIMEOUT.");

Then you are adding the parameter 'remove_on_dev_loss', which is
    "Boolean.  When the device loss timer fires, this variable"
    " controls whether the scsi infrastructure for the target"
    " device is removed.  Values: zero means do not remove,"
    " non-zero means remove.  Default is zero.");

I think the 2nd parameter does not help anyone. What you rather seem to
need is
 a) the existing dev_loss_tmo parameter but without the kernel
    enforcing an upper limit for it [the admin sets the policy, not
    the kernel], and
 b) the transport layer or the SCSI core taking care that no SCSI
    command times out during the tolerated absence of a target.

So, for every layer above the transport layer or of SCSI core (SCSI
command set drivers and sg driver, block layer, filesystem...),
everything becomes fully transparent. These layers do not notice absence
of the target. If anything at all, they merely notice that commands take
unusually long to complete.

Of course there are practical limits to this:
 - We don't want to wait ages for commands to complete or to fail.
 - The device's state may have changed arbitrarily during its absence
   due to an external influence, leading to corruption when it comes
   back.
But again, the decision about the limit for such tolerated absence
should be a decision by the admin, not one by the kernel. The driver
software and the involved kernel infrastructure should merely provide
mechanisms but not enforce a policy, at least not to unnecessary extent.

Anyhow. My point is: It seems what you want is 1. to let the admin set
an arbitrary dev_loss_tmo and 2. the transport or the SCSI core taking
care that no commands time out during that period.

Where to implement this? The transport layer has the benefit to have a
better notion of target states because it is closer to the interconnect
layer than the SCSI core. On the other hand, the SCSI core is rather the
place where mechanisms to handle the lifecycle of targets and especially
of commands exist.

The SCSI core seems appropriate for another reason: The issue at hand is
not really specific to the FC transport. Maybe we want dev_loss_tmo to
be independently configurable for different transports or on a
per-host-adapter basis, or on a per target basis. But generally,
temporary absence of a target is a *natural and common state* for some
other transports besides FC. (Example: Bus reset phase and rescanning of
FireWire interconnect == connection loss and subsequent reconnect or
re-login of SBP-2 transport. This is a rather short period, but I
already thought about implementing a prolonged state of absence in sbp2
for two other specific purposes.)

If it was decided to implement this "tolerated temporary absence of a
target" in SCSI core, then the SCSI core's state machine would "simply"
have to handle another target state.

I put "simply" into quotes because the existing state model seems not to
be exactly at a point where you could immediately proceed to add such
additional state. In particular, the SCSI core does not yet support the
state "device temporarily not accessible". The state "device blocked" is
similar but ultimately not the same. Besides, the SCSI core does also
not distinguish the state transitions "device operational -> device
removal requested" versus "device operational -> device hot unplugged".
(The latter transition does not exist for SCSI core; transport layers or
low-level drivers have to initiate the transition to "device removal
requested" and work around the subsequent problems when it was actually
a hot unplug.)
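
The missing states can be sketched as a small transition table. This is only
a userspace illustration under stated assumptions: every identifier below is
hypothetical, none of them exists in SCSI core, and the set of permitted
transitions is one possible reading of the paragraphs above, not a proposal
for the actual state machine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; none of these identifiers exist in SCSI core. */
enum tgt_state {
	TGT_OPERATIONAL,
	TGT_TEMP_ABSENT,	/* "device temporarily not accessible" */
	TGT_REMOVAL_REQUESTED,	/* orderly removal asked for by driver or admin */
	TGT_HOT_UNPLUGGED,	/* device vanished without warning */
	TGT_STATE_COUNT
};

/* allowed[from][to] is true when this sketch permits the transition. */
static const bool allowed[TGT_STATE_COUNT][TGT_STATE_COUNT] = {
	[TGT_OPERATIONAL] = {
		[TGT_TEMP_ABSENT]       = true,
		[TGT_REMOVAL_REQUESTED] = true,
		[TGT_HOT_UNPLUGGED]     = true,
	},
	[TGT_TEMP_ABSENT] = {
		[TGT_OPERATIONAL]   = true,	/* target came back in time */
		[TGT_HOT_UNPLUGGED] = true,	/* tolerated absence expired */
	},
	/* The two removal states are terminal: all entries stay false. */
};

/* Return true if the sketch allows moving from one state to the other. */
static bool tgt_transition_ok(enum tgt_state from, enum tgt_state to)
{
	assert(from < TGT_STATE_COUNT && to < TGT_STATE_COUNT);
	return allowed[from][to];
}
```

The point of keeping "removal requested" and "hot unplugged" apart is that
only the former can assume the device is still reachable for an orderly
teardown.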

Side note to everything above: Yes, I may have missed something, so
correct me.
-- 
Stefan Richter
-=====-=-==- -==- -==-=
http://arcgraph.de/sr/


* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 17:24       ` Stefan Richter
@ 2006-06-13 19:36         ` Michael Reed
  2006-06-13 23:13           ` Stefan Richter
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Reed @ 2006-06-13 19:36 UTC (permalink / raw)
  To: Stefan Richter
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen



Stefan Richter wrote:
> Michael Reed wrote:
>> James Smart wrote:
>>> We are seriously in trouble if the subsystems above us don't know how
>>> to deal with dead targets. We are encountering scenarios in which the
>>> data structures are staying around due to references, but for all other
>>> intents they're gone.  I know that DM has yet to fully account for this.
>>> md - it's dead. Applications... they have no clue.
>> Mounted file systems have no clue either.  Even with no activity on the
>> fs, if the target stays missing beyond the device loss timeout and then
>> returns, the file system cannot be accessed without intervention.
>>
>> When the target does return, the file system has to be unmounted and
>> remounted on a new "sd" device.  This is even if there was no activity
>> on the file system while its target was absent, i.e., it wouldn't otherwise
>> require an unmount/remount.
> 
> Michael, I don't understand how your patch fits into this picture.

The patch allows the target to return to its existing infrastructure
following a prolonged absence due to, say, a kicked cable or raid controller
reboot.  Current file systems, volume managers, and multi-path drivers
do not seem to tolerate the return of a target to new infrastructure.

> 
> There is presently the FC transport parameter 'dev_loss_tmo', which is
>     "Maximum number of seconds that the FC transport should"
>     " insulate the loss of a remote port. Once this value is"
>     " exceeded, the scsi target {is|may be} removed. {%|Reference"
>     " the remove_on_dev_loss module parameter.} Value should be"
>     " between 1 and SCSI_DEVICE_BLOCK_MAX_TIMEOUT.");
> 
> Then you are adding the parameter 'remove_on_dev_loss', which is
>     "Boolean.  When the device loss timer fires, this variable"
>     " controls whether the scsi infrastructure for the target"
>     " device is removed.  Values: zero means do not remove,"
>     " non-zero means remove.  Default is zero.");
> 
> I think the 2nd parameter does not help anyone. What you rather seem to
> need is
>  a) the existing dev_loss_tmo parameter but without the kernel
>     enforcing an upper limit for it [the admin sets the policy, not
>     the kernel], and
>  b) the transport layer or the SCSI core taking care that no SCSI
>     command times out during the tolerated absence of a target.

Actually, I do not want this.  The limit on the dev_loss_tmo parameter
is there to allow error notification to eventually pass up the stack.
This is important in path failover situations.  An infinite value here
would imply that commands do not time out.

> 
> So, for every layer above the transport layer or of SCSI core (SCSI
> command set drivers and sg driver, block layer, filesystem...),
> everything becomes fully transparent. These layers do not notice absence
> of the target. If anything at all, they merely notice that commands take
> unusually long to complete.

The transport currently holds off commands with a combination of DID_IMM_RETRY,
blocking the target so that no new commands are issued, and holding off
error recovery until the dev loss timer expires.  This is the behavior that
is desired.

What I want is to have the device, when it returns, reconnect to its
existing infrastructure.  This allows previously connected "users"
to reconnect.
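
A minimal userspace model of the difference, not kernel code. All names are
hypothetical; "instance" merely stands in for the identity of the "sd"
device that upper layers hold on to, and the three states are a
simplification of what the transport and SCSI core actually track.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified states for illustration only. */
enum model_state { M_ONLINE, M_BLOCKED, M_DELETED };

struct model_target {
	enum model_state state;
	int instance;	/* stands in for the "sd" device identity */
};

/* Hypothetical counter used to hand out fresh identities. */
static int next_instance = 1;

static void model_init(struct model_target *t)
{
	t->state = M_ONLINE;
	t->instance = next_instance++;
}

/* Link loss: the transport blocks the target and starts the dev loss timer. */
static void model_link_down(struct model_target *t)
{
	assert(t->state == M_ONLINE);
	t->state = M_BLOCKED;
}

/* Dev loss timer fires: remove the target, or unblock it and keep it. */
static void model_dev_loss_timeout(struct model_target *t,
				   bool remove_on_dev_loss)
{
	assert(t->state == M_BLOCKED);
	t->state = remove_on_dev_loss ? M_DELETED : M_ONLINE;
}

/* Target returns: a deleted target must be rebuilt under a new identity. */
static void model_target_returns(struct model_target *t)
{
	if (t->state == M_DELETED)
		t->instance = next_instance++;
	t->state = M_ONLINE;
}
```

With remove_on_dev_loss set to zero the instance survives the outage, so an
existing user can resume; with a non-zero value the returning target gets a
new instance and the old references point at nothing usable.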

Mike


> 
> Of course there are practical limits to this:
>  - We don't want to wait ages for commands to complete or to fail.
>  - The device's state may have changed arbitrarily during its absence
>    due to an external influence, leading to corruption when it comes
>    back.
> But again, the decision about the limit for such tolerated absence
> should be a decision by the admin, not one by the kernel. The driver
> software and the involved kernel infrastructure should merely provide
> mechanisms but not enforce a policy, at least not to unnecessary extent.
> 
> Anyhow. My point is: It seems what you want is 1. to let the admin set
> an arbitrary dev_loss_tmo and 2. the transport or the SCSI core taking
> care that no commands time out during that period.
> 
> Where to implement this? The transport layer has the benefit to have a
> better notion of target states because it is closer to the interconnect
> layer than the SCSI core. On the other hand, the SCSI core is rather the
> place where mechanisms to handle the lifecycle of targets and especially
> of commands exist.
> 
> The SCSI core seems appropriate for another reason: The issue at hand is
> not really specific to the FC transport. Maybe we want dev_loss_tmo to
> be independently configurable for different transports or on a
> per-host-adapter basis, or on a per target basis. But generally,
> temporary absence of a target is a *natural and common state* for some
> other transports besides FC. (Example: Bus reset phase and rescanning of
> FireWire interconnect == connection loss and subsequent reconnect or
> re-login of SBP-2 transport. This is a rather short period, but I
> already thought about implementing a prolonged state of absence in sbp2
> for two other specific purposes.)
> 
> If it was decided to implement this "tolerated temporary absence of a
> target" in SCSI core, then the SCSI core's state machine would "simply"
> have to handle another target state.
> 
> I put "simply" into quotes because the existing state model seems not to
> be exactly at a point where you could immediately proceed to add such
> additional state. In particular, the SCSI core does not yet support the
> state "device temporarily not accessible". The state "device blocked" is
> similar but ultimately not the same. Besides, the SCSI core does also
> not distinguish the state transitions "device operational -> device
> removal requested" versus "device operational -> device hot unplugged".
> (The latter transition does not exist for SCSI core; transport layers or
> low-level drivers have to initiate the transition to "device removal
> requested" and work around the subsequent problems when it was actually
> a hot unplug.)
> 
> Side note to everything above: Yes, I may have missed something, so
> correct me.


* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 19:36         ` Michael Reed
@ 2006-06-13 23:13           ` Stefan Richter
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Richter @ 2006-06-13 23:13 UTC (permalink / raw)
  To: Michael Reed
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen

Michael Reed wrote:
> Stefan Richter wrote:
[...]
>>I think the 2nd parameter does not help anyone. What you rather seem to
>>need is
>> a) the existing dev_loss_tmo parameter but without the kernel
>>    enforcing an upper limit for it [the admin sets the policy, not
>>    the kernel], and
>> b) the transport layer or the SCSI core taking care that no SCSI
>>    command times out during the tolerated absence of a target.
> 
> Actually, I do not want this.  The limit on the dev_loss_tmo parameter
> is there to allow error notification to eventually pass up the stack.
> This is important in path failover situations.  An infinite value here
> would imply that commands do not time out.

Wouldn't path failover simply mean that the transport returns the 
target from "absent" to "available" when the alternative path kicks in? 
This should happen much earlier than after 'infinite' time.

[...]
> The transport currently holds off commands with a combination of DID_IMM_RETRY,
> blocking the target so that no new commands are issued, and holding off
> error recovery until the dev loss timer expires.

Ah, I didn't know that yet. Still, people should think about moving this 
or similar behaviour (IOW, the notion of "temporarily absent" state of 
targets or units) up into SCSI core. Then the remaining responsibility 
of the transport is to determine _when_ to report which connection state 
transitions (based on hardware events, user-configurable timers, sysfs 
events etc.), not _how_ to handle tasks for these targets or units in 
the various states.

> This is the behavior that is desired.
> 
> What I want is to have the device, when it returns, reconnect to its
> existing infrastructure.  This allows previously connected "users"
> to reconnect.

Yes, clearly.
-- 
Stefan Richter
-=====-=-==- -==- -===-
http://arcgraph.de/sr/


* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 15:42     ` Michael Reed
  2006-06-13 17:24       ` Stefan Richter
@ 2006-06-13 17:33       ` Steve Byan
  2006-06-13 19:35         ` Michael Reed
  2006-06-13 17:59       ` James Bottomley
  2 siblings, 1 reply; 18+ messages in thread
From: Steve Byan @ 2006-06-13 17:33 UTC (permalink / raw)
  To: Michael Reed
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen


On Jun 13, 2006, at 11:42 AM, Michael Reed wrote:

> Treating fibre channel like removable storage is wrong.  Fibre  
> targets aren't
> generally supposed to go away.  If they do, there's a significant  
> chance
> that they'll be repaired and returned to service.  It makes sense  
> to keep
> the infrastructure in place just like scsi, sas, iscsi, ata.

In both Fibre Channel SANs and iSCSI SANs, administrators in large  
datacenters will re-zone devices with some regularity as they  
redeploy applications among existing systems.

> The kind of disruption the current code can cause to systems with  
> multi-terabytes
> or petabytes of storage will be considered unacceptable in a  
> production environment.

Agreed; but Fibre Channel and SAN devices _will_ come and go  
dynamically.

> So, I also wish to encourage a reconsideration of the position that  
> dead targets
> should be removed.  Removing removable storage targets like  
> firewire and usb
> makes sense.  I just don't believe that the same applies to fibre  
> channel
> or other generally non-removable targets.

Think of removing a Fibre Channel or iSCSI device as "removing access  
authorization". You're correct that these devices are not often  
physically removed from the SAN, but access authorization may change  
frequently.

Regards,
-Steve
-- 
Steve Byan <smb@egenera.com>
Software Architect
Egenera, Inc.
165 Forest Street
Marlboro, MA 01752
(508) 858-3125




* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 17:33       ` Steve Byan
@ 2006-06-13 19:35         ` Michael Reed
  2006-06-13 19:49           ` Steve Byan
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Reed @ 2006-06-13 19:35 UTC (permalink / raw)
  To: Steve Byan
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen



Steve Byan wrote:
> On Jun 13, 2006, at 11:42 AM, Michael Reed wrote:
> 
>> Treating fibre channel like removable storage is wrong.  Fibre  
>> targets aren't
>> generally supposed to go away.  If they do, there's a significant  
>> chance
>> that they'll be repaired and returned to service.  It makes sense  
>> to keep
>> the infrastructure in place just like scsi, sas, iscsi, ata.
> 
> In both Fibre Channel SANs and iSCSI SANs, administrators in large  
> datacenters will re-zone devices with some regularity as they  
> redeploy applications among existing systems.

Yes, they will.  And don't they generally do this gracefully, i.e.,
shut down access to file systems or devices before rezoning?  And when
the target is rezoned to the host again don't they expect to be able
to resume using it?  This patch allows that to happen with no
user intervention.

> 
>> The kind of disruption the current code can cause to systems with  
>> multi-terabytes
>> or petabytes of storage will be considered unacceptable in a  
>> production environment.
> 
> Agreed; but Fibre Channel and SAN devices _will_ come and go  
> dynamically.

My concern is when targets go in an uncontrolled manner.

> 
>> So, I also wish to encourage a reconsideration of the position that  
>> dead targets
>> should be removed.  Removing removable storage targets like  
>> firewire and usb
>> makes sense.  I just don't believe that the same applies to fibre  
>> channel
>> or other generally non-removable targets.
> 
> Think of removing a Fibre Channel or iSCSI device as "removing access  
> authorization". You're correct that these devices are not often  
> physically removed from the SAN, but access authorization may change  
> frequently.

When the target "self-removes" and "self-attaches" I want the existing
user of the target to be able to resume use.  By not removing the
infrastructure in these situations less disruption to a production
system occurs.  The existing "user" has to be tolerant of errors during
this period of absence if it wishes to resume use.

This patch allows that to happen if the admin so desires.

Mike


> 
> Regards,
> -Steve


* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 19:35         ` Michael Reed
@ 2006-06-13 19:49           ` Steve Byan
  0 siblings, 0 replies; 18+ messages in thread
From: Steve Byan @ 2006-06-13 19:49 UTC (permalink / raw)
  To: Michael Reed
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen


On Jun 13, 2006, at 3:35 PM, Michael Reed wrote:

>
>
> Steve Byan wrote:
>> On Jun 13, 2006, at 11:42 AM, Michael Reed wrote:
>>
>>> Treating fibre channel like removable storage is wrong.  Fibre
>>> targets aren't
>>> generally supposed to go away.  If they do, there's a significant
>>> chance
>>> that they'll be repaired and returned to service.  It makes sense
>>> to keep
>>> the infrastructure in place just like scsi, sas, iscsi, ata.
>>
>> In both Fibre Channel SANs and iSCSI SANs, administrators in large
>> datacenters will re-zone devices with some regularity as they
>> redeploy applications among existing systems.
>
> Yes, they will.  And don't they generally do this gracefully, i.e.,
> shut down access to file systems or devices before rezoning?  And when
> the target is rezoned to the host again don't they expect to be able
> to resume using it?  This patch allows that to happen with no
> user intervention.

Ah, I see now. Yes, I think the behavior you are after is the correct  
one. Sorry, I didn't understand the details before chiming in.

Regards,
-Steve
-- 
Steve Byan <smb@egenera.com>
Software Architect
Egenera, Inc.
165 Forest Street
Marlboro, MA 01752
(508) 858-3125




* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 15:42     ` Michael Reed
  2006-06-13 17:24       ` Stefan Richter
  2006-06-13 17:33       ` Steve Byan
@ 2006-06-13 17:59       ` James Bottomley
  2006-06-13 19:37         ` Michael Reed
  2 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2006-06-13 17:59 UTC (permalink / raw)
  To: Michael Reed
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen

On Tue, 2006-06-13 at 10:42 -0500, Michael Reed wrote:
> Mounted file systems have no clue either.  Even with no activity on the
> fs, if the target stays missing beyond the device loss timeout and then
> returns, the file system cannot be accessed without intervention.
> 
> When the target does return, the file system has to be unmounted and
> remounted on a new "sd" device.  This is even if there was no activity
> on the file system while its target was absent, i.e., it wouldn't otherwise
> require an unmount/remount.

But let's examine the options:  If you leave an uncontactable target
hanging around, the SCSI error handler will activate anyway when the
command timeout passes (currently 30s) and the device will be offlined.
Bringing it back online will require user intervention and likely
necessitate an unmount and a remount to repair the filesystem anyway.
Even if you go further and hold off the error handler, what this will do
is slowly hang the system since anything that touches an inode on the
blocked target will be put into D wait.  I really think pro-actively
removing the target is better than either of the other two options.

The device loss timer represents an acceptable compromise between the
need to keep the target across short disconnect/reconnect events and the
need to keep the system functioning.

James


* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 17:59       ` James Bottomley
@ 2006-06-13 19:37         ` Michael Reed
  2006-06-13 20:02           ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Reed @ 2006-06-13 19:37 UTC (permalink / raw)
  To: James Bottomley
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen

James Bottomley wrote:
> On Tue, 2006-06-13 at 10:42 -0500, Michael Reed wrote:
>> Mounted file systems have no clue either.  Even with no activity on the
>> fs, if the target stays missing beyond the device loss timeout and then
>> returns, the file system cannot be accessed without intervention.
>>
>> When the target does return, the file system has to be unmounted and
>> remounted on a new "sd" device.  This is even if there was no activity
>> on the file system while its target was absent, i.e., it wouldn't otherwise
>> require an unmount/remount.
> 
> But let's examine the options:  If you leave an uncontactable target
> hanging around, the SCSI error handler will activate anyway when the
> command timeout passes (currently 30s) and the device will be offlined.

Not really true as the transport holds off the error handler until the
transport dev loss timer expires.

And afterwards, commands are returned immediately with DID_NO_CONNECT.
The device is never offlined (with my patch applied).

> Bringing it back online will require user intervention and likely
> necessitate an unmount and a remount to repair the filesystem anyway.

With the unpatched code, the device transitions from ONLINE - BLOCKED - CANCEL -
DEL.  Then the infrastructure is removed.  With the new code, it
transitions from ONLINE - BLOCKED - ONLINE.  Subsequent access to the
device results in i/o errors with a status of DID_NO_CONNECT.

	duck /root# dd if=/dev/md0 bs=128k count=1 of=/dev/null
	sd 5:0:13:0: SCSI error: return code = 0x10000
	end_request: I/O error, dev sdj, sector 0
	Buffer I/O error on device md0, logical block 0
	sd 5:0:11:0: SCSI error: return code = 0x10000
	end_request: I/O error, dev sdh, sector 0
	Buffer I/O error on device md0, logical block 4
	dd: reading `/dev/md0': Input/output error
	sd 5:0:13:0: SCSI error: return code = 0x10000

The layer issuing the i/o can decide what to do with the device.
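
For illustration, the "return code = 0x10000" in the log above can be
decoded as follows.  This is a minimal sketch, assuming the usual SCSI
result layout in which the host byte occupies bits 16..23 and
DID_NO_CONNECT has the value 0x01.  The helper names is_no_connect() and
self_check() are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Host-byte code the SCSI midlayer uses for "could not connect". */
#define DID_NO_CONNECT 0x01

/* Extract the host byte (bits 16..23) from a SCSI result word. */
static unsigned int host_byte(uint32_t result)
{
	return (result >> 16) & 0xff;
}

/* Hypothetical helper: did the command fail because the target is gone? */
static int is_no_connect(uint32_t result)
{
	return host_byte(result) == DID_NO_CONNECT;
}

/* Sanity check against the value printed in the log. */
static void self_check(void)
{
	assert(host_byte(0x10000) == DID_NO_CONNECT);
	assert(!is_no_connect(0));	/* a clean result is not a no-connect */
}
```

An upper layer that sees this host byte knows the failure came from the
transport rather than from the device itself, which is what lets it decide
whether to retry, switch paths, or give up.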

> Even if you go further and hold off the error handler, what this will do
> is slowly hang the system since anything that touches an inode on the
> blocked target will be put into D wait.  I really think pro-actively
> removing the target is better than either of the other two options.

The error handler is only held off during the dev loss period.  Once
the timer expires, the target is unblocked and pending commands issue
and terminate with DID_NO_CONNECT.  If there are no pending commands,
nothing bad happens.  Many multi-path drivers know to change paths when
"EIO" is returned, so, no EIO, no path switch, even if a prolonged
absence occurs.

The system does not slowly hang.  It remains responsive and behaves in
an expected manner.

> 
> The device loss timer represents an acceptable compromise between the
> need to keep the target across short disconnect/reconnect events and the
> need to keep the system functioning.

The new parameter doesn't really change the usage of the device loss timer.
It still will result in failed i/o when it expires.  It just leaves the
infrastructure around so that if/when the target returns, the reference
holders can resume using it.  This is the desired behavior.
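
The two transition sequences described earlier in this message (ONLINE -
BLOCKED - CANCEL - DEL without the patch, ONLINE - BLOCKED - ONLINE with
it) can be sketched as a toy state model.  This is not the transport code:
struct target_model, the function names, and the io_result values are all
hypothetical, and the only thing taken from the thread is the order of the
states and the remove_on_dev_loss parameter name.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified device states, named after those in the message. */
enum dev_state { DEV_ONLINE, DEV_BLOCKED, DEV_CANCEL, DEV_DEL };

/* Hypothetical outcomes of submitting I/O in this model. */
enum io_result { IO_OK, IO_QUEUED, IO_NO_CONNECT, IO_NO_DEVICE };

/* Hypothetical stand-in for one target as seen by an upper layer. */
struct target_model {
	enum dev_state state;
	bool reachable;		/* is the target currently on the fabric? */
};

/* The target drops off the fabric: block it, start the dev loss period. */
static void target_lost(struct target_model *t)
{
	assert(t);
	t->reachable = false;
	if (t->state == DEV_ONLINE)
		t->state = DEV_BLOCKED;
}

/* The device loss timer fires while the target is still missing. */
static void dev_loss_timer_expired(struct target_model *t,
				   bool remove_on_dev_loss)
{
	assert(t);
	if (t->state != DEV_BLOCKED)
		return;
	if (remove_on_dev_loss) {
		t->state = DEV_CANCEL;	/* tear down the infrastructure ... */
		t->state = DEV_DEL;	/* ... and the old device is gone */
	} else {
		t->state = DEV_ONLINE;	/* unblock; I/O now fails fast */
	}
}

/* The target comes back.  A deleted device stays deleted in this model. */
static void target_returned(struct target_model *t)
{
	assert(t);
	t->reachable = true;
	if (t->state == DEV_BLOCKED)
		t->state = DEV_ONLINE;
}

/* What a holder of the old device reference sees when it issues I/O. */
static enum io_result submit_io(const struct target_model *t)
{
	assert(t);
	switch (t->state) {
	case DEV_BLOCKED:
		return IO_QUEUED;
	case DEV_ONLINE:
		return t->reachable ? IO_OK : IO_NO_CONNECT;
	default:
		return IO_NO_DEVICE;
	}
}
```

In this model a holder of the old reference gets IO_NO_CONNECT while the
target is away and IO_OK once it returns when remove_on_dev_loss is false,
and IO_NO_DEVICE forever when it is true, which is the difference the
paragraph above argues about.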

The system remains fully functional with no unexpected delays.

Mike

> 
> James
> 
> 

* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 19:37         ` Michael Reed
@ 2006-06-13 20:02           ` James Bottomley
  2006-06-13 21:44             ` Michael Reed
  2006-06-14 16:31             ` Mike Christie
  0 siblings, 2 replies; 18+ messages in thread
From: James Bottomley @ 2006-06-13 20:02 UTC (permalink / raw)
  To: Michael Reed
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen

On Tue, 2006-06-13 at 14:37 -0500, Michael Reed wrote:
> Not really true as the transport holds off the error handler until the
> transport dev loss timer expires.
> 
> And afterwards, commands are returned immediately with DID_NO_CONNECT.
> The device is never offlined (with my patch applied).

That was just a general examination of the options for retaining contact
with the target.

It seems we both agree that returning an error is about the only viable
option, in which case the user or application has to take a recovery
action anyway, so there's no logical difference between what you propose
and what we currently do as far as the application or filesystem is
concerned.

The only difference is what happens if the device reappears.  However,
since the application has to be modified in either case:  your patch to
continually probe with I/O to see if the device has returned, or the
existing case to wait out the udev event that says the device is back it
doesn't really buy us anything for the application.  Since the rest of
our infrastructure is already event driven, or migrating that way, I
really don't see value in introducing an anomaly like this purely for
fibre channel.

James

* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 20:02           ` James Bottomley
@ 2006-06-13 21:44             ` Michael Reed
  2006-06-14  7:21               ` Hannes Reinecke
  2006-06-14 16:31             ` Mike Christie
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Reed @ 2006-06-13 21:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: James.Smart, Christoph Hellwig, linux-scsi, Jim Nead,
	Jeremy Higdon, Gary Hagensen

James Bottomley wrote:
> On Tue, 2006-06-13 at 14:37 -0500, Michael Reed wrote:
>> Not really true as the transport holds off the error handler until the
>> transport dev loss timer expires.
>>
>> And afterwards, commands are returned immediately with DID_NO_CONNECT.
>> The device is never offlined (with my patch applied).
> 
> That was just a general examination of the options for retaining contact
> with the target.
> 
> It seems we both agree that returning an error is about the only viable
> option, in which case the user or application has to take a recovery
> action anyway, so there's no logical difference between what you propose
> and what we currently do as far as the application or filesystem is
> concerned.
> 
> The only difference is what happens if the device reappears.  However,
> since the application has to be modified in either case:  your patch to
> continually probe with I/O to see if the device has returned, 

I'm not suggesting that any application would probe with i/o, though
it may or may not be doing that today.  If it is, the difference is
that the i/o will have the possibility of success when the target
ultimately returns.  With the current code, the i/o will never, ever,
succeed.  (Without app change, of course.)

> or the
> existing case to wait out the udev event that says the device is back it
> doesn't really buy us anything for the application.  

BTW, I consider "application" to include kernel code such as volume
managers and file systems.  The applications don't require any modifications
with the new patch.  They still get failure notification in either case.
They still fail to work while the target is disconnected.  They can choose
to terminate or not.

> Since the rest of
> our infrastructure is already event driven, or migrating that way, I
> really don't see value in introducing an anomaly like this purely for
> fibre channel.

It's tough on fibre channel, being first.  :)

Among the benefits of this patch is the purchase of time.  With the fc
infrastructure the way it is, you're assured of forcing developers to
"publish or perish".  That may be the intended desire.  It just doesn't
seem fair to the users who have to deal with this.  It makes sense to
me to implement the event driven infrastructure in such a way that
it's more complete when released.  If infrastructure is going to be
removed, then "applications" have to be adjusted to accommodate this.
It shouldn't be, oh by the way, your driver/app is now broken, hurry up
and fix it or your users will complain.  [End Of Rant].  My patch
buys time.  Change the default so that the remove on disconnect has
to be consciously overridden.  Remove the variable when the supporting
infrastructure is in place.  Put out a message indicating that the
option of not removing the infrastructure is "going away" in a future
release.  Provide an orderly transition.  Insure domestic tranquility.
Promote the general welfare.  :)  I'm happy to adjust the patch
to accommodate any of these suggestions if they are deemed acceptable.

Thanks for taking the time to consider and discuss this issue.  I see
your point and I've made mine.  I trust your judgment.

Thanks,
 Mike

> 
> James
> 
> 

* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 21:44             ` Michael Reed
@ 2006-06-14  7:21               ` Hannes Reinecke
  2006-06-14 16:18                 ` Mike Christie
  0 siblings, 1 reply; 18+ messages in thread
From: Hannes Reinecke @ 2006-06-14  7:21 UTC (permalink / raw)
  To: Michael Reed
  Cc: James Bottomley, James.Smart, Christoph Hellwig, linux-scsi,
	Jim Nead, Jeremy Higdon, Gary Hagensen

Michael Reed wrote:
> 
> James Bottomley wrote:
>> Since the rest of
>> our infrastructure is already event driven, or migrating that way, I
>> really don't see value in introducing an anomaly like this purely for
>> fibre channel.
> 
> It's tough on fibre channel, being first.  :)
> 
> Among the benefits of this patch is the purchase of time.  With the fc
> infrastructure the way it is, you're assured of forcing developers to
> "publish or perish".  That may be the intended desire.  It just doesn't
> seem fair to the users who have to deal with this.  It makes sense to
> me to implement the event driven infrastructure in such a way that
> it's more complete when released.  If infrastructure is going to be
> removed, then "applications" have to be adjusted to accommodate this.
> It shouldn't be, oh by the way, your driver/app is now broken, hurry up
> and fix it or your users will complain.  [End Of Rant].  My patch
> buys time.  Change the default so that the remove on disconnect has
> to be consciously overridden.  Remove the variable when the supporting
> infrastructure is in place.  Put out a message indicating that the
> option of not removing the infrastructure is "going away" in a future
> release.  Provide an orderly transition.  Insure domestic tranquility.
> Promote the general welfare.  :)  I'm happy to adjust the patch
> to accommodate any of these suggestions if they are deemed acceptable.
> 
And I can only _strongly_ agree with Mike here.
Yes, the infrastructure is moving towards dynamic device configuration.
But no, we're not there yet. Not by a long way.
None of the current volume managers (i.e. LVM2, EVMS, and even md to
some extent) is capable of dynamic reconfiguration.
Of course they sort of work nowadays, but the general idea of LVM2 and
EVMS is still that _all_ devices are available during setup.

And I don't even want to _think_ of the implications of running 'vgscan'
from a udev rule. So Mike's patch would allow those applications to run
properly for the time being.

In general I agree with the dev_loss_tmo mechanism. But by enforcing it
we'll only make it easier for the developers of LVM2 et al to put the
blame on us (on the grounds of 'you broke it, you fix it') instead of
coaxing them to change their applications to accept dynamic device
configuration.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke			hare@suse.de
SuSE Linux Products GmbH		S390 & zSeries
Maxfeldstraße 5				+49 911 74053 688
90409 Nürnberg				http://www.suse.de

* Re: [PATCH] make fc transport removal of target configurable
  2006-06-14  7:21               ` Hannes Reinecke
@ 2006-06-14 16:18                 ` Mike Christie
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Christie @ 2006-06-14 16:18 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Michael Reed, James Bottomley, James.Smart, Christoph Hellwig,
	linux-scsi, Jim Nead, Jeremy Higdon, Gary Hagensen

Hannes Reinecke wrote:
> Michael Reed wrote:
>> James Bottomley wrote:
>>> Since the rest of
>>> our infrastructure is already event driven, or migrating that way, I
>>> really don't see value in introducing an anomaly like this purely for
>>> fibre channel.
>> It's tough on fibre channel, being first.  :)
>>
>> Among the benefits of this patch is the purchase of time.  With the fc
>> infrastructure the way it is, you're assured of forcing developers to
>> "publish or perish".  That may be the intended desire.  It just doesn't
>> seem fair to the users who have to deal with this.  It makes sense to
>> me to implement the event driven infrastructure in such a way that
>> it's more complete when released.  If infrastructure is going to be
>> removed, then "applications" have to be adjusted to accommodate this.
>> It shouldn't be, oh by the way, your driver/app is now broken, hurry up
>> and fix it or your users will complain.  [End Of Rant].  My patch
>> buys time.  Change the default so that the remove on disconnect has
>> to be consciously overridden.  Remove the variable when the supporting
>> infrastructure is in place.  Put out a message indicating that the
>> option of not removing the infrastructure is "going away" in a future
>> release.  Provide an orderly transition.  Insure domestic tranquility.
>> Promote the general welfare.  :)  I'm happy to adjust the patch
>> to accommodate any of these suggestions if they are deemed acceptable.
>>
> And I can only _strongly_ agree with Mike here.
> Yes, the infrastructure is moving towards dynamic device configuration.
> But no, we're not there yet. Not by a long way.
> None of the current volume managers (i.e. LVM2, EVMS, and even md to
> some extent) is capable of dynamic reconfiguration.


I do not think a DM-based volume manager will work at all if all devices
in a DM table are removed at the same time temporarily. For dm-multipath
we would end up with no dm device and then the user will have to remount
the FS.

Does the kobject_uevent use also have problems? I guess GFP_KERNEL
allocations will not fail, but if all the devices are dm-multipath
devices is there the possibility that the GFP_KERNEL allocation will
wait forever or is there a way for it to work eventually? Is this one of
those things we are waiting for the VM guys to fix, or do we modify the
uevent code or just not get into it by never removing and readding the
devices?

* Re: [PATCH] make fc transport removal of target configurable
  2006-06-13 20:02           ` James Bottomley
  2006-06-13 21:44             ` Michael Reed
@ 2006-06-14 16:31             ` Mike Christie
  2006-06-15  9:04               ` Stefan Richter
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Christie @ 2006-06-14 16:31 UTC (permalink / raw)
  To: James Bottomley
  Cc: Michael Reed, James.Smart, Christoph Hellwig, linux-scsi,
	Jim Nead, Jeremy Higdon, Gary Hagensen

James Bottomley wrote:
> On Tue, 2006-06-13 at 14:37 -0500, Michael Reed wrote:
>> Not really true as the transport holds off the error handler until the
>> transport dev loss timer expires.
>>
>> And afterwards, commands are returned immediately with DID_NO_CONNECT.
>> The device is never offlined (with my patch applied).
> 
> That was just a general examination of the options for retaining contact
> with the target.
> 
> It seems we both agree that returning an error is about the only viable
> option, in which case the user or application has to take a recovery
> action anyway, so there's no logical difference between what you propose
> and what we currently do as far as the application or filesystem is
> concerned.
> 
> The only difference is what happens if the device reappears.  However,
> since the application has to be modified in either case:  your patch to
> continually probe with I/O to see if the device has returned, or the
> existing case to wait out the udev event that says the device is back it
> doesn't really buy us anything for the application.  Since the rest of
> our infrastructure is already event driven, or migrating that way, I
> really don't see value in introducing an anomaly like this purely for
> fibre channel.
> 

For iscsi we do sort of the probe option. The problem with software
iscsi is that we do not normally get an event that some target is back
online so from userspace we basically have to probe it. We try to open a
connection and poll it until we can connect, then we try to log back
in. When we log back in, we set the devices online if we have to and we
set the driver and iscsi state to start accepting IO again.
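
The probe-style recovery described above can be sketched roughly as
follows.  This is only an illustration of the control flow (connect, poll
until it works, log in, then mark the devices online and accept I/O
again).  The types, callbacks, and the fake target used to exercise the
loop are all hypothetical and are not the open-iscsi API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the session state the paragraph mentions. */
struct session_model {
	bool devices_online;
	bool accepting_io;
};

/* Hypothetical transport hooks; each returns true on success. */
typedef bool (*try_connect_fn)(void *ctx);
typedef bool (*login_fn)(void *ctx);

/*
 * Poll until the target answers, then log back in and resume I/O.
 * Returns the number of connect attempts used, or -1 if max_attempts
 * was exhausted.  Real code would sleep between polls.
 */
static int recover_session(struct session_model *s,
			   try_connect_fn try_connect, login_fn login,
			   void *ctx, int max_attempts)
{
	assert(s && try_connect && login);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (!try_connect(ctx))
			continue;	/* target still unreachable */
		if (!login(ctx))
			continue;	/* connected but login failed; retry */
		s->devices_online = true;
		s->accepting_io = true;
		return attempt;
	}
	return -1;
}

/* Fake target for demonstration: reachable from the Nth poll onward. */
struct fake_target {
	int reachable_after;
	int polls;
};

static bool fake_try_connect(void *ctx)
{
	struct fake_target *t = ctx;

	t->polls++;
	return t->polls >= t->reachable_after;
}

static bool fake_login(void *ctx)
{
	(void)ctx;
	return true;
}
```

With a fake target that becomes reachable on the third poll,
recover_session() returns 3 and leaves the session online.  The point of
the sketch is that nothing here waits for an event: the initiator has to
keep asking, which is the contrast with the FC model drawn in this message.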

For HW iscsi the card can signal an event that it has logged back into a
target, so we could do like FC. So for qla4xxx, should we follow the FC
model or software iscsi one that is already there?

Note that for software iscsi we could do FC's model too. When
dev_loss_tmo expires we could remove the session/target. Then we could
just create the connection, poll, and if we successfully login we could
then create the session/target structs again.

* Re: [PATCH] make fc transport removal of target configurable
  2006-06-14 16:31             ` Mike Christie
@ 2006-06-15  9:04               ` Stefan Richter
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Richter @ 2006-06-15  9:04 UTC (permalink / raw)
  To: Mike Christie
  Cc: James Bottomley, Michael Reed, James.Smart, Christoph Hellwig,
	linux-scsi, Jim Nead, Jeremy Higdon, Gary Hagensen

Mike Christie wrote:
> James Bottomley wrote:
[...]
>> doesn't really buy us anything for the application.  Since the rest of
>> our infrastructure is already event driven, or migrating that way, I
>> really don't see value in introducing an anomaly like this purely for
>> fibre channel.
> 
> For iscsi we do sort of the probe option. The problem with software
> iscsi is that we do not normally get an event that some target is back
> online so from userspace we basically have to probe it. We try to open a
> connection and poll it until we can connect, then we try to log back
> in. When we log back in, we set the devices online if we have to and we
> set the driver and iscsi state to start accepting IO again.
> 
> For HW iscsi the card can signal an event that it has logged back into a
> target, so we could do like FC. So for qla4xxx, should we follow the FC
> model or software iscsi one that is already there?
> 
> Note, that for software iscsi we could do FC's model too.
[...]

I don't think there is any basic difference between the transports like
FC, iSCSI, or e.g. USB storage, or even parallel SCSI.
 - The SCSI core implements a state model for its abstract* notion of
   "SCSI device"/ "SCSI target" and handles new/ pending/ completed/
   failed tasks according to these devices' states.
   (* = transport independent)
 - The transport layer receives events concerning the state of
   connection with a target or LU. These events come from the
   interconnect (i.e. from the thin or thick layer of software driving
   the interconnect) or from userspace (management interface) or from
   timeouts. According to these events, the transport layer handles the
   transport protocol (login etc.) and triggers state transitions of the
   SCSI core's device model.

There may be differences between transports, such as where certain
events come from or how long certain timeouts are. Some timeouts are defined
by the protocol spec, others may be configurable by the user. But the
types of connection state transitions ("gone online", "became
unavailable", "came back", "requested to be shut down", "hot
terminated") as well as their consequences for SCSI core and software
layers above it are ultimately the same. Thus, the behaviour of the SCSI
core, such as blocking or failing tasks, should be the same.

(Connection details on the other hand, like "tied to this unique
identifier" or "routed via this or that path" or "backed by heartbeat
protocol" or whatnot, concern only the transport layer and transport
management, as far as such details are not hidden by the interconnect.)

One problem at the moment is that the mentioned state transitions are
not fully supported by the SCSI core's lowlevel API. We only have "gone
online", "to be blocked", "to be unblocked", "requested to be shut
down". Transport layers currently have to work around this limitation,
and they do it differently and sometimes with bugs.
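
One way to picture the gap described above is a table from connection
events to the operations the lowlevel API offers.  This is a sketch under
assumptions: the pairing of "became unavailable" with "to be blocked" and
of "came back" with "to be unblocked" is one plausible reading of the
message, and treating "hot terminated" as the event with no direct
counterpart is likewise an interpretation.  All identifiers are
hypothetical.

```c
#include <assert.h>

/* Connection state transitions listed in the message. */
enum conn_event {
	EV_GONE_ONLINE,
	EV_BECAME_UNAVAILABLE,
	EV_CAME_BACK,
	EV_SHUTDOWN_REQUESTED,
	EV_HOT_TERMINATED,
	EV_COUNT
};

/* Operations the message says the lowlevel API currently offers. */
enum core_op { OP_GONE_ONLINE, OP_BLOCK, OP_UNBLOCK, OP_SHUTDOWN, OP_NONE };

/* Assumed mapping; OP_NONE marks an event with no direct counterpart. */
static const enum core_op event_to_op[EV_COUNT] = {
	[EV_GONE_ONLINE]	= OP_GONE_ONLINE,
	[EV_BECAME_UNAVAILABLE]	= OP_BLOCK,
	[EV_CAME_BACK]		= OP_UNBLOCK,
	[EV_SHUTDOWN_REQUESTED]	= OP_SHUTDOWN,
	[EV_HOT_TERMINATED]	= OP_NONE,
};

/* Look up the core operation for a connection event. */
static enum core_op core_op_for(enum conn_event ev)
{
	assert((unsigned int)ev < EV_COUNT);
	return event_to_op[ev];
}

/* True if a transport would have to improvise for this event. */
static int needs_workaround(enum conn_event ev)
{
	return core_op_for(ev) == OP_NONE;
}
```

Under this reading, needs_workaround() is true only for
EV_HOT_TERMINATED, which is where each transport ends up inventing its own
handling.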
-- 
Stefan Richter
-=====-=-==- -==- -===-
http://arcgraph.de/sr/

end of thread, other threads:[~2006-06-15  9:07 UTC | newest]

Thread overview: 18+ messages
2006-06-12 23:16 [PATCH] make fc transport removal of target configurable Michael Reed
2006-06-13  7:07 ` Christoph Hellwig
2006-06-13 11:06   ` James Smart
2006-06-13 15:42     ` Michael Reed
2006-06-13 17:24       ` Stefan Richter
2006-06-13 19:36         ` Michael Reed
2006-06-13 23:13           ` Stefan Richter
2006-06-13 17:33       ` Steve Byan
2006-06-13 19:35         ` Michael Reed
2006-06-13 19:49           ` Steve Byan
2006-06-13 17:59       ` James Bottomley
2006-06-13 19:37         ` Michael Reed
2006-06-13 20:02           ` James Bottomley
2006-06-13 21:44             ` Michael Reed
2006-06-14  7:21               ` Hannes Reinecke
2006-06-14 16:18                 ` Mike Christie
2006-06-14 16:31             ` Mike Christie
2006-06-15  9:04               ` Stefan Richter