From: Michael Reed <mdr@sgi.com>
To: Michael Reed <mdr@sgi.com>
Cc: James.Smart@Emulex.Com, linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
Date: Tue, 08 Aug 2006 17:15:04 -0500
Message-ID: <44D90CE8.70006@sgi.com>
In-Reply-To: <44D90878.7000903@sgi.com>
Michael Reed wrote:
>
> James Smart wrote:
>> Closing Statements:
>>
>> I've attached the original RFC below. See
>> http://marc.theaimsgroup.com/?l=linux-scsi&m=115082917628466&w=2
>>
>> I've updated it with what I perceive to be the position and resolution
>> based on comments. Keep in mind that we're trying to lay the groundwork
>> for common behavior and tunables between the transports.
>>
>> Please let me know if I've misrepresented anything, or if there is
>> any dissension on the resolution. I'd like to close on this.
>>
>> James Smart wrote:
>>> Folks,
>>>
>>> The following addresses some long standing todo items I've had in the
>>> FC transport. They primarily arise when considering multipathing, or
>>> trying to marry driver internal state to transport state. It is intended
>>> that this same type of functionality would be usable in other transports
>>> as well.
>>>
>>> Here's what is contained:
>>>
>>> - dev_loss_tmo LLDD callback :
>>> Currently, there is no notification to the LLDD of when the transport
>>> gives up on the device returning and starts to return DID_NO_CONNECT
>>> in the queuecommand helper function. This callback notifies the LLDD
>>> that the transport has now given up on the rport, thereby acknowledging
>>> the prior fc_remote_port_delete() call. The callback also expects the
>>> LLDD to initiate the termination of any outstanding i/o on the rport.
>> I believe there is no dissension on this change.
>> Please note: this is essentially a confirmation from the transport to the
>> LLD that the rport is fully deleted. Thus, the LLD must expect to see
>> these callbacks as a normal part of the rport being terminated (even if
>> it is not blocked).
>>
>> I'll move forward with this.
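To restate the contract in code: this is a user-space model, not kernel code, and the structure and names here are invented for illustration. The point is that the dev_loss notification both confirms the rport deletion to the LLD and obliges the LLD to terminate any i/o still outstanding on that rport:

```c
#include <assert.h>

/* Toy model of an rport at the end of dev_loss_tmo. */
struct rport_model {
	int outstanding_io;	/* i/o still held by the LLD/adapter */
	int callbacks_seen;	/* dev_loss notifications delivered  */
	int blocked;		/* 1 while fc_remote_port_delete()'d */
};

/* LLD-side handler: acknowledge the loss and kill outstanding i/o.
 * Completions may happen after this returns; zeroing the count here
 * just models "termination has been initiated". */
static void lld_dev_loss_tmo_callbk(struct rport_model *rp)
{
	rp->callbacks_seen++;
	rp->outstanding_io = 0;
}

/* Transport side: dev_loss_tmo fired, give up on the rport. The
 * callback is delivered as a normal part of rport teardown, even if
 * the rport was never blocked. */
static void transport_dev_loss_fires(struct rport_model *rp,
				     void (*cb)(struct rport_model *))
{
	cb(rp);			/* confirm deletion to the LLD */
	rp->blocked = 0;	/* queuecommand now sees DID_NO_CONNECT */
}
```

The key design point is the direction of responsibility: the transport only notifies; the LLD owns the actual abort of in-flight i/o.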
>
> Concur.
>
>>> - fast_io_fail_tmo and LLD callback:
>>> There are some cases where it may take a long while to truly determine
>>> device loss, but the system is in a multipathing configuration that if
>>> the i/o was failed quickly (faster than dev_loss_tmo), it could be
>>> redirected to a different path and completed sooner (assuming the
>>> multipath thing knew that the sdev was blocked).
>>> iSCSI is one of the transports that may vary dev_loss_tmo values
>>> per session, and would still like fast i/o failure.
>>
>> The current transport implementation did not specify what happened to
>> active i/o (given to the driver, in the adapter, but not yet completed
>> back to the midlayer) when a device was blocked, nor during the
>> block-to-dev_loss transition period. It was up to the driver. Many
>> assumed active i/o was immediately terminated, which is semi-consistent
>> with the behavior of most drivers for most "connectivity loss" scenarios.
>>
>> The conversations then started to jump around, considering what i/o's you
>> may want to have fail quickly, etc.
>>
>> Here's my opinion:
>> We have the following points in time to look at:
>> (a) the device is blocked by the transport
>> (b) there is a time T, usually in a multipathing environment, where it
>> would be useful to error the i/o early rather than wait for dev_loss
>> It is assumed that any such i/o request would be marked REQ_FAILFAST
>> (c) the dev_loss_tmo fires - we're to assume the device is gone.
>> At any time after (a), the device may return, unblock, and never
>> encounter points (b) and (c).
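To make the timeline concrete, here is a small user-space sketch (names and the integer-time model are invented for illustration) mapping points (a), (b), (c) to the fate of an active i/o, following the opinion above:

```c
#include <assert.h>

enum io_fate { IO_STAYS_ACTIVE, IO_FAST_FAILED, IO_NO_CONNECT };

/* Disposition of an active i/o at time t after the block at (a):
 *   t <  fast_fail_tmo : between (a) and (b), nothing is forced to
 *                        terminate - a short link bounce may recover
 *   t >= fast_fail_tmo : point (b), REQ_FAILFAST i/o is errored early
 *   t >= dev_loss_tmo  : point (c), everything is terminated
 */
static enum io_fate io_disposition(int t, int fast_fail_tmo,
				   int dev_loss_tmo, int req_failfast)
{
	if (t >= dev_loss_tmo)
		return IO_NO_CONNECT;
	if (t >= fast_fail_tmo && req_failfast)
		return IO_FAST_FAILED;
	return IO_STAYS_ACTIVE;
}
```

Note that a non-REQ_FAILFAST i/o rides out the whole window and only dies at (c), which is exactly the multipath-unfriendly case fast_io_fail_tmo is meant to shorten.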
>
> REQ_FAILFAST is stored in the request structure. Are there "issues"
> with using scsi_cmnd.request in the lldd?
>
>> As for what happens to active i/o :
>>
>> always: the driver can fail an i/o at any point in time if it deems
>> it appropriate.
>>
>> at (a): There are scenarios where a short link perturbation may occur,
>> which may not disrupt the i/o. Therefore, we should not force
>> io to be terminated.
>>
>> at (b): Minimally, we should terminate all active i/o requests marked
>> as type REQ_FAILFAST. From an api perspective, driver support
>> for this is optional. And we must also assume that there will
>> be implementations which have to abort all i/o in order to
>> terminate those marked REQ_FAILFAST. Is this acceptable ?
>> (it meets the "always" condition above)
>>
>> Q: so far we've limited the io to those w/ REQ_FAILFAST.
>> Would we ever want to allow a user to fast fail all i/o
>> regardless of the request flags ? (in case the flags
>> weren't getting set on all the i/o the user wanted to
>> see fail ?)
>
> REQ_FAILFAST appears to influence retries during error recovery so
> there may be unexpected side effects of doing this. But, that said,
> I'd say yes. From my perspective, I'd make this the default behavior.
>
> In talking with our volume manager people, the question raised was
> "Why would you want some i/o to fail quickly and some not?"
> They even considered non-i/o scsi commands.
>
> I think that if the default behavior is to have fast_io_fail_tmo
> enabled, then it should be controlled by REQ_FAILFAST in the
> request. If the default is to have the timer disabled, i.e.,
> an admin has to enable it (or define when it's enabled) then
> when enabled it should apply to every i/o. In examining the
> patch, it appears to be disabled by default, so our conclusion
> is that all i/o should fast fail when enabled. We also concur
> with having fast_io_fail_tmo disabled by default.
I think this also implies that our volume manager guys would be
just as happy setting dev_loss_tmo to a small value and not use
fast_io_fail_tmo.
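The enablement rule we landed on can be stated compactly. This is a user-space sketch with invented names, contrasting the two candidate policies from the discussion; the conclusion above favors the second because the timer is opt-in:

```c
#include <assert.h>

/* Policy 1 (flag-driven): fail early only i/o marked REQ_FAILFAST. */
static int flag_driven(int timer_fired, int req_failfast)
{
	return timer_fired && req_failfast;
}

/* Policy 2 (admin-driven): fast_io_fail_tmo is disabled by default;
 * once an admin enables it, firing applies to ALL i/o and the
 * per-request flag is ignored. */
static int admin_driven(int timer_enabled, int timer_fired,
			int req_failfast)
{
	(void)req_failfast;	/* deliberately ignored once opted in */
	return timer_enabled && timer_fired;
}
```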
Mike
>
> I guess this implies leave REQ_FAILFAST to error recovery. :)
>
> Mike
>
>
>
>> There's a desire to address pending i/o (those on the block
>> request queue or new requests going there) so that if we've
>> crossed point (b) we also fail them. The proposal is
>> to add a new state (device ? or queue ?), which would take effect
>> as of point (b). All REQ_FAILFAST i/o on the queue, as well
>> as any new i/o, would be failed with a new i/o status while in
>> this state. Non-REQ_FAILFAST i/o would continue to enter/sit
>> on the request queue until dev_loss_tmo fires.
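A sketch of the proposed behavior in that new state (a user-space model with invented names; the real work would live in the block/scsi midlayer): requests carrying the fast-fail flag are completed with the new error status, while everything else stays queued for dev_loss_tmo:

```c
#include <assert.h>
#include <stddef.h>

struct queued_io {
	int req_failfast;	/* was the request marked for fast fail */
	int failed;		/* completed with the new i/o status    */
};

/* Post-(b) sweep: error REQ_FAILFAST i/o on the request queue, leave
 * everything else sitting there until dev_loss_tmo. Returns how many
 * requests were failed by this pass. */
static int fail_fastfail_ios(struct queued_io *q, size_t n)
{
	size_t i;
	int failed = 0;

	for (i = 0; i < n; i++) {
		if (q[i].req_failfast && !q[i].failed) {
			q[i].failed = 1;
			failed++;
		}
	}
	return failed;
}
```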
>>
>> at (c): per the dev_loss_tmo callback, all i/o should be terminated.
>> Their completions do not have to be synchronous to the return
>> from the callback - they can occur afterward.
>>
>>
>> Comments ?
>>
>> Assuming that folks agree, I'd like to do this in 2 patches:
>> - one that puts in the transport fast_io_fail_tmo and LLD callback
>> - another that adds the new state, io completion status, and does the
>> handling of the request queue REQ_FAILFAST i/o.
>>
>>> - fast_loss_time recommendation:
>>> In discussing how an admin should set dev_loss_tmo in a multipathing
>>> environment, it became apparent that we expected the admin to know
>>> a lot. They had to know the transport type, what the minimum setting
>>> can be that still survives normal link bouncing, and they may even
>>> have to know about device specifics. For iSCSI, the proper loss time
>>> may vary widely from session to session.
>>>
>>> This attribute is an exported "recommendation" by the LLDD and
>>> transport
>>> on what the lowest setting for dev_loss_tmo should be for a
>>> multipathing
>>> environment. Thus, the admin only needs to cat this attribute to obtain
>>> the value to echo into dev_loss_tmo.
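The intended admin workflow is just a read of one attribute and a write of another. Sketched below with a stand-in temp directory, since the real sysfs nodes (presumably under something like /sys/class/fc_remote_ports/, with exact names per the eventual patch) need FC hardware; the attribute names here mirror the proposal but are assumptions:

```shell
# Stand-in for the rport sysfs directory (no FC hardware here).
rport=$(mktemp -d)
echo 5 > "$rport/fast_loss_time"    # LLD/transport recommendation
                                    # (read-only in the real sysfs)

# Admin workflow from the proposal: cat the recommendation and
# echo it into dev_loss_tmo.
cat "$rport/fast_loss_time" > "$rport/dev_loss_tmo"
cat "$rport/dev_loss_tmo"           # prints 5
```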
>> The only objection was from Christoph - wanting a utility to get/set this
>> stuff. However, the counter-argument was that this attribute is still
>> meaningful, as it is the conduit for obtaining a recommendation from the
>> transport/LLD.
>>
>> So - I assume this proceeds as is - with a change in its description.
>>
>>>
>>> I have one criticism of these changes. The callbacks call into
>>> the LLDD with an rport after the driver's rport_delete call. What this
>>> means is that we are essentially extending the lifetime of an rport
>>> until the dev_loss_tmo call occurs.
>> It's ok - and adding the appropriate comments is fine.
>>
>>
>> Thanks.
>>
>> -- james s
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
Thread overview: 15+ messages
2006-06-20 18:45 [RFC] fc transport: extensions for fast fail and dev loss James Smart
2006-07-25 17:12 ` Mike Christie
2006-07-25 18:49 ` James Smart
2006-07-25 21:15 ` Michael Reed
2006-07-26 3:33 ` James Smart
2006-07-26 9:20 ` Christoph Hellwig
2006-07-26 16:35 ` James Smart
2006-08-08 17:54 ` [RFC] [Last Rites] " James Smart
2006-08-08 21:56 ` Michael Reed
2006-08-08 22:15 ` Michael Reed [this message]
2006-08-09 15:31 ` Michael Reed
2006-08-10 16:38 ` James Smart
2006-08-09 17:36 ` Christoph Hellwig
2006-08-10 16:17 ` James Smart
2006-08-10 20:01 ` Mike Christie