public inbox for linux-scsi@vger.kernel.org
From: Michael Reed <mdr@sgi.com>
To: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: James.Smart@Emulex.Com, Christoph Hellwig <hch@infradead.org>,
	linux-scsi <linux-scsi@vger.kernel.org>, Jim Nead <jnead@sgi.com>,
	Jeremy Higdon <jeremy@sgi.com>, Gary Hagensen <gwh@sgi.com>
Subject: Re: [PATCH] make fc transport removal of target configurable
Date: Tue, 13 Jun 2006 14:36:16 -0500
Message-ID: <448F13B0.20803@sgi.com>
In-Reply-To: <448EF4D4.4090402@s5r6.in-berlin.de>



Stefan Richter wrote:
> Michael Reed wrote:
>> James Smart wrote:
>>> We are seriously in trouble if the subsystems above us don't know how
>>> to deal with dead targets. We are encountering scenarios in which the
>>> data structures are staying around due to references, but for all other
>>> intents they're gone.  I know that DM has yet to fully account for this.
>>> md - it's dead. Applications... they have no clue.
>> Mounted file systems have no clue either.  Even with no activity on the
>> fs, if the target stays missing beyond the device loss timeout and then
>> returns, the file system cannot be accessed without intervention.
>>
>> When the target does return, the file system has to be unmounted and
>> remounted on a new "sd" device.  This is even if there was no activity
>> on the file system while its target was absent, i.e., it wouldn't otherwise
>> require an unmount/remount.
> 
> Michael, I don't understand how your patch fits into this picture.

The patch allows the target to return to its existing infrastructure
following a prolonged absence due to, say, a kicked cable or RAID controller
reboot.  Current file systems, volume managers, and multi-path drivers
do not seem to tolerate the return of a target to new infrastructure.
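
To illustrate the difference, here is a rough user-space sketch. It is a toy
model, not the patch and not the real FC transport code; ToyTransport, Target,
sd_index and the method names are all made up for illustration. With
remove_on_dev_loss set to zero the returning port finds its old target again;
with a non-zero value it gets a fresh one.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Target:
    """Stand-in for the scsi infrastructure of one target (e.g. an "sd" device)."""
    wwpn: str
    sd_index: int
    present: bool = True


class ToyTransport:
    """Hypothetical model of the behavior under discussion; not a kernel API."""

    def __init__(self, remove_on_dev_loss: bool):
        self.remove_on_dev_loss = remove_on_dev_loss
        self.targets = {}        # wwpn -> Target
        self._next_sd = count()  # next free "sd" index

    def port_added(self, wwpn):
        """A remote port appears: reuse its infrastructure if it still exists."""
        tgt = self.targets.get(wwpn)
        if tgt is None:
            tgt = Target(wwpn, next(self._next_sd))
            self.targets[wwpn] = tgt
        tgt.present = True
        return tgt

    def port_lost(self, wwpn):
        """The remote port disappears; the infrastructure is kept for now."""
        self.targets[wwpn].present = False

    def dev_loss_timer_fired(self, wwpn):
        """On timer expiry, tear down the target only if configured to do so."""
        tgt = self.targets.get(wwpn)
        if tgt is not None and not tgt.present and self.remove_on_dev_loss:
            del self.targets[wwpn]
```

In this model a mounted file system holding a reference to the old Target
object can only resume when port_added() hands back that same object, which
is the remove_on_dev_loss=0 case.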

> 
> There is presently the FC transport parameter 'dev_loss_tmo', which is
>     "Maximum number of seconds that the FC transport should"
>     " insulate the loss of a remote port. Once this value is"
>     " exceeded, the scsi target {is|may be} removed. {%|Reference"
>     " the remove_on_dev_loss module parameter.} Value should be"
>     " between 1 and SCSI_DEVICE_BLOCK_MAX_TIMEOUT.");
> 
> Then you are adding the parameter 'remove_on_dev_loss', which is
>     "Boolean.  When the device loss timer fires, this variable"
>     " controls whether the scsi infrastructure for the target"
>     " device is removed.  Values: zero means do not remove,"
>     " non-zero means remove.  Default is zero.");
> 
> I think the 2nd parameter does not help anyone. What you rather seem to
> need is
>  a) the existing dev_loss_tmo parameter but without the kernel
>     enforcing an upper limit for it [the admin sets the policy, not
>     the kernel], and
>  b) the transport layer or the SCSI core taking care that no SCSI
>     command times out during the tolerated absence of a target.

Actually, I do not want this.  The limit on the dev_loss_tmo parameter
is there to allow error notification to eventually pass up the stack.
This is important in path failover situations.  An infinite value here
would imply that commands do not time out.
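
A minimal sketch of the bounded range quoted above ("between 1 and
SCSI_DEVICE_BLOCK_MAX_TIMEOUT"). The numeric value of the constant and the
choice to reject rather than clamp are assumptions for this example;
validate_dev_loss_tmo is a made-up name, and the real sysfs handler may
behave differently.

```python
# Assumed value, in seconds; check the kernel headers for the actual definition.
SCSI_DEVICE_BLOCK_MAX_TIMEOUT = 600


def validate_dev_loss_tmo(seconds: int) -> int:
    """Accept a dev_loss_tmo only if it lies in the documented range.

    The upper bound guarantees that a lost target eventually produces an
    error that can pass up the stack, e.g. to a path failover layer.
    """
    if not 1 <= seconds <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT:
        raise ValueError(
            f"dev_loss_tmo must be between 1 and {SCSI_DEVICE_BLOCK_MAX_TIMEOUT}"
        )
    return seconds
```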

> 
> So, for every layer above the transport layer or of SCSI core (SCSI
> command set drivers and sg driver, block layer, filesystem...),
> everything becomes fully transparent. These layers do not notice absence
> of the target. If anything at all, they merely notice that commands take
> unusually long to complete.

The transport currently holds off commands with a combination of DID_IMM_RETRY,
blocking the target so that no new commands are issued, and holding off
error recovery until the dev loss timer expires.  This is the behavior that
is desired.
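
Roughly, and only as a toy user-space model (HeldOffPort and its methods are
hypothetical names, and timestamps are passed in by hand instead of using a
real timer), the hold-off behaves like this: commands arriving while the port
is down are held, a return before the timer fires releases them, and expiry
of the timer finally fails them up the stack.

```python
from collections import deque


class HeldOffPort:
    """Hypothetical model of a blocked remote port; not the transport code."""

    def __init__(self, dev_loss_tmo):
        self.dev_loss_tmo = dev_loss_tmo  # seconds the loss is insulated
        self.down_at = None               # None means the port is online
        self.expired = False              # True once the dev loss timer fired
        self.held = deque()               # commands held while blocked

    def link_down(self, now):
        """Port lost: block the target and start the dev loss timer."""
        self.down_at = now

    def submit(self, cmd):
        """Issue, hold (the DID_IMM_RETRY-like case), or fail a command."""
        if self.expired:
            return "error"
        if self.down_at is not None:
            self.held.append(cmd)
            return "held"
        return "issued"

    def tick(self, now):
        """Fire the timer if due; return the commands failed up the stack."""
        if self.down_at is None or self.expired:
            return []
        if now - self.down_at < self.dev_loss_tmo:
            return []
        self.expired = True
        failed = list(self.held)
        self.held.clear()
        return failed

    def link_up(self):
        """Port returned: unblock and return the commands released for issue."""
        self.down_at = None
        self.expired = False
        released = list(self.held)
        self.held.clear()
        return released
```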

What I want is to have the device, when it returns, reconnect to its
existing infrastructure.  This allows previously connected "users"
to reconnect.

Mike


> 
> Of course there are practical limits to this:
>  - We don't want to wait ages for commands to complete or to fail.
>  - The device's state may have changed arbitrarily during its absence
>    due to an external influence, leading to corruption when it comes
>    back.
> But again, the decision about the limit for such tolerated absence
> should be a decision by the admin, not one by the kernel. The driver
> software and the involved kernel infrastructure should merely provide
> mechanisms but not enforce a policy, at least not to unnecessary extent.
> 
> Anyhow. My point is: It seems what you want is 1. to let the admin set
> an arbitrary dev_loss_tmo and 2. the transport or the SCSI core taking
> care that no commands time out during that period.
> 
> Where to implement this? The transport layer has the benefit to have a
> better notion of target states because it is closer to the interconnect
> layer than the SCSI core. On the other hand, the SCSI core is rather the
> place where mechanisms to handle the lifecycle of targets and especially
> of commands exist.
> 
> The SCSI core seems appropriate for another reason: The issue at hand is
> not really specific to the FC transport. Maybe we want dev_loss_tmo to
> be independently configurable for different transports or on a
> per-host-adapter basis, or on a per target basis. But generally,
> temporary absence of a target is a *natural and common state* for some
> other transports besides FC. (Example: Bus reset phase and rescanning of
> FireWire interconnect == connection loss and subsequent reconnect or
> re-login of SBP-2 transport. This is a rather short period, but I
> already thought about implementing a prolonged state of absence in sbp2
> for two other specific purposes.)
> 
> If it was decided to implement this "tolerated temporary absence of a
> target" in SCSI core, then the SCSI core's state machine would "simply"
> have to handle another target state.
> 
> I put "simply" into quotes because the existing state model seems not to
> be exactly at a point where you could immediately proceed to add such
> additional state. In particular, the SCSI core does not yet support the
> state "device temporarily not accessible". The state "device blocked" is
> similar but ultimately not the same. Besides, the SCSI core does also
> not distinguish the state transitions "device operational -> device
> removal requested" versus "device operational -> device hot unplugged".
> (The latter transition does not exist for SCSI core; transport layers or
> low-level drivers have to initiate the transition to "device removal
> requested" and work around the subsequent problems when it was actually
> a hot unplug.)
> 
> Side note to everything above: Yes, I may have missed something, so
> correct me.
