From: Stefan Richter <stefanr@s5r6.in-berlin.de>
To: Michael Reed <mdr@sgi.com>
Cc: James.Smart@Emulex.Com, Christoph Hellwig <hch@infradead.org>,
linux-scsi <linux-scsi@vger.kernel.org>, Jim Nead <jnead@sgi.com>,
Jeremy Higdon <jeremy@sgi.com>, Gary Hagensen <gwh@sgi.com>
Subject: Re: [PATCH] make fc transport removal of target configurable
Date: Tue, 13 Jun 2006 19:24:36 +0200
Message-ID: <448EF4D4.4090402@s5r6.in-berlin.de>
In-Reply-To: <448EDCEB.50702@sgi.com>

Michael Reed wrote:
> James Smart wrote:
>> We are seriously in trouble if the subsystems above us don't know how
>> to deal with dead targets. We are encountering scenarios in which the
>> data structures are staying around due to references, but for all other
>> intents they're gone. I know that DM has yet to fully account for this.
>> md - it's dead. Applications... they have no clue.
>
> Mounted file systems have no clue either. Even with no activity on the
> fs, if the target stays missing beyond the device loss timeout and then
> returns, the file system cannot be accessed without intervention.
>
> When the target does return, the file system has to be unmounted and
> remounted on a new "sd" device. This is even if there was no activity
> on the file system while its target was absent, i.e., it wouldn't otherwise
> require an unmount/remount.

Michael, I don't understand how your patch fits into this picture.

There is presently the FC transport parameter 'dev_loss_tmo', which is

    "Maximum number of seconds that the FC transport should"
    " insulate the loss of a remote port. Once this value is"
    " exceeded, the scsi target {is|may be} removed. {%|Reference"
    " the remove_on_dev_loss module parameter.} Value should be"
    " between 1 and SCSI_DEVICE_BLOCK_MAX_TIMEOUT.");

Then you are adding the parameter 'remove_on_dev_loss', which is

    "Boolean. When the device loss timer fires, this variable"
    " controls whether the scsi infrastructure for the target"
    " device is removed. Values: zero means do not remove,"
    " non-zero means remove. Default is zero.");

I think the 2nd parameter does not help anyone. What you rather seem to
need is

  a) the existing dev_loss_tmo parameter but without the kernel
     enforcing an upper limit for it [the admin sets the policy, not
     the kernel], and

  b) the transport layer or the SCSI core taking care that no SCSI
     command times out during the tolerated absence of a target.

So, for every layer above the transport layer or the SCSI core (SCSI
command set drivers and sg driver, block layer, filesystem...),
everything becomes fully transparent. These layers do not notice the
absence of the target. If anything at all, they merely notice that
commands take unusually long to complete.
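To make (a) and (b) a bit more concrete, here is a minimal user-space sketch of the decision that would have to be made for each queued command. All names in it (classify_cmd, struct target_presence, the CMD_* values) are hypothetical and do not exist in the SCSI core or the FC transport; the only assumption carried over from above is that dev_loss_tmo is whatever the admin chose, with no upper bound applied.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of this is existing kernel API. */

enum cmd_disposition {
	CMD_DISPATCH,	/* target present: send the command as usual */
	CMD_HOLD,	/* target absent but tolerated: keep the command
			 * queued and do not let it time out */
	CMD_FAIL	/* absence exceeded the admin's limit: fail upward */
};

struct target_presence {
	bool present;			/* as reported by the transport */
	unsigned long absent_secs;	/* seconds since the target vanished */
	unsigned long dev_loss_tmo;	/* admin-chosen limit in seconds;
					 * deliberately not clamped here */
};

/* Decide what to do with a command, based only on target presence. */
static enum cmd_disposition classify_cmd(const struct target_presence *t)
{
	if (t->present)
		return CMD_DISPATCH;
	if (t->absent_secs < t->dev_loss_tmo)
		return CMD_HOLD;
	return CMD_FAIL;
}
```

With dev_loss_tmo set to 600, a target that has been gone for 30 seconds yields CMD_HOLD, so the upper layers only see a slow command; once the absence reaches 600 seconds the result is CMD_FAIL.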
Of course there are practical limits to this:

  - We don't want to wait ages for commands to complete or to fail.
  - The device's state may have changed arbitrarily during its absence
    due to an external influence, leading to corruption when it comes
    back.

But again, the decision about the limit for such tolerated absence
should be a decision by the admin, not one by the kernel. The driver
software and the involved kernel infrastructure should merely provide
mechanisms but not enforce a policy, at least not to an unnecessary
extent.

Anyhow. My point is: It seems what you want is 1. to let the admin set
an arbitrary dev_loss_tmo and 2. the transport or the SCSI core taking
care that no commands time out during that period.

Where to implement this? The transport layer has the benefit of having a
better notion of target states because it is closer to the interconnect
layer than the SCSI core. On the other hand, the SCSI core is rather the
place where mechanisms to handle the lifecycle of targets and especially
of commands exist.

The SCSI core seems appropriate for another reason: The issue at hand is
not really specific to the FC transport. Maybe we want dev_loss_tmo to
be independently configurable for different transports or on a
per-host-adapter basis, or on a per-target basis. But generally,
temporary absence of a target is a *natural and common state* for some
other transports besides FC. (Example: Bus reset phase and rescanning of
FireWire interconnect == connection loss and subsequent reconnect or
re-login of SBP-2 transport. This is a rather short period, but I
already thought about implementing a prolonged state of absence in sbp2
for two other specific purposes.)
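As an illustration of the "per transport, per host adapter, or per target" idea, the following sketch resolves one effective timeout from three levels, the most specific setting winning. It is only one possible arrangement: struct tmo_config, TMO_UNSET and effective_dev_loss_tmo are hypothetical names, and the fallback order is an assumption, not something any existing transport class implements.

```c
/* Hypothetical: TMO_UNSET means "not configured at this level,
 * inherit from the next less specific level". */
#define TMO_UNSET (-1L)

struct tmo_config {
	long transport_default;	/* seconds; assumed to be always set */
	long host_override;	/* TMO_UNSET or seconds */
	long target_override;	/* TMO_UNSET or seconds */
};

/* Pick the most specific timeout the admin has configured. */
static long effective_dev_loss_tmo(const struct tmo_config *c)
{
	if (c->target_override != TMO_UNSET)
		return c->target_override;
	if (c->host_override != TMO_UNSET)
		return c->host_override;
	return c->transport_default;
}
```

A transport with a short natural absence (such as the FireWire bus reset case) could then ship a small transport_default, while the admin raises the value for one particular host adapter or target without touching the others.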
If it were decided to implement this "tolerated temporary absence of a
target" in the SCSI core, then the SCSI core's state machine would
"simply" have to handle another target state.

I put "simply" into quotes because the existing state model seems not to
be exactly at a point where you could immediately proceed to add such an
additional state. In particular, the SCSI core does not yet support the
state "device temporarily not accessible". The state "device blocked" is
similar but ultimately not the same. Besides, the SCSI core also does
not distinguish the state transitions "device operational -> device
removal requested" versus "device operational -> device hot unplugged".
(The latter transition does not exist for the SCSI core; transport
layers or low-level drivers have to initiate the transition to "device
removal requested" and work around the subsequent problems when it was
actually a hot unplug.)
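The state model described above can be sketched as a small transition function. This is a toy model under stated assumptions, not the SCSI core's actual state machine: the state and event names are hypothetical, and treating an expired absence as a hot unplug is my own choice for the sketch.

```c
/* Hypothetical target states; not the SCSI core's existing states. */
enum tgt_state {
	TGT_RUNNING,		/* device operational */
	TGT_ABSENT,		/* "device temporarily not accessible" */
	TGT_REMOVAL_REQUESTED,	/* orderly removal was asked for */
	TGT_HOT_UNPLUGGED	/* device went away without notice */
};

/* Hypothetical events delivered by the transport or the admin. */
enum tgt_event {
	EV_LINK_LOST,		/* transport lost contact with the target */
	EV_LINK_RESTORED,	/* target came back in time */
	EV_ABSENCE_EXPIRED,	/* tolerated absence ran out */
	EV_REMOVE_REQUEST	/* explicit removal request */
};

static enum tgt_state tgt_next_state(enum tgt_state s, enum tgt_event e)
{
	switch (s) {
	case TGT_RUNNING:
		if (e == EV_LINK_LOST)
			return TGT_ABSENT;
		if (e == EV_REMOVE_REQUEST)
			return TGT_REMOVAL_REQUESTED;
		break;
	case TGT_ABSENT:
		if (e == EV_LINK_RESTORED)
			return TGT_RUNNING;
		if (e == EV_ABSENCE_EXPIRED)
			return TGT_HOT_UNPLUGGED;
		if (e == EV_REMOVE_REQUEST)
			return TGT_REMOVAL_REQUESTED;
		break;
	case TGT_REMOVAL_REQUESTED:
	case TGT_HOT_UNPLUGGED:
		break;	/* terminal in this sketch */
	}
	return s;	/* events that do not apply leave the state unchanged */
}
```

The point of keeping TGT_REMOVAL_REQUESTED and TGT_HOT_UNPLUGGED apart is that upper layers could then tell an orderly removal from a device that vanished, instead of transports having to fake the former for the latter.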
Side note to everything above: Yes, I may have missed something, so
correct me.
--
Stefan Richter
-=====-=-==- -==- -==-=
http://arcgraph.de/sr/