From: Michael Reed <mdr@sgi.com>
To: Michael Reed <mdr@sgi.com>
Cc: James.Smart@Emulex.Com, linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
Date: Tue, 08 Aug 2006 17:15:04 -0500
Message-ID: <44D90CE8.70006@sgi.com>
In-Reply-To: <44D90878.7000903@sgi.com>
Michael Reed wrote:
>
> James Smart wrote:
>> Closing Statements:
>>
>> I've attached the original RFC below. See
>> http://marc.theaimsgroup.com/?l=linux-scsi&m=115082917628466&w=2
>>
>> I've updated it with what I perceive to be the position and resolution
>> based on comments. Keep in mind that we're trying to lay the groundwork
>> for common behavior and tunables between the transports.
>>
>> Please let me know if I've misrepresented anything, or if there is
>> any dissension on the resolution. I'd like to close on this.
>>
>> James Smart wrote:
>>> Folks,
>>>
>>> The following addresses some long standing todo items I've had in the
>>> FC transport. They primarily arise when considering multipathing, or
>>> trying to marry driver internal state to transport state. It is intended
>>> that this same type of functionality would be usable in other transports
>>> as well.
>>>
>>> Here's what is contained:
>>>
>>> - dev_loss_tmo LLDD callback :
>>> Currently, there is no notification to the LLDD of when the transport
>>> gives up on the device returning and starts to return DID_NO_CONNECT
>>> in the queuecommand helper function. This callback notifies the LLDD
>>> that the transport has now given up on the rport, thereby acknowledging
>>> the prior fc_remote_port_delete() call. The callback also expects the
>>> LLDD to initiate the termination of any outstanding i/o on the rport.
>> I believe there is no dissension on this change.
>> Please note: this is essentially a confirmation from the transport to the
>> LLD that the rport is fully deleted. Thus, the LLD must expect to see
>> these callbacks as a normal part of the rport being terminated (even if
>> it is not blocked).
>>
>> I'll move forward with this.
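To restate the contract in code: this is a user-space model, not kernel code, and the structure and names here are invented for illustration. The point is that the dev_loss notification both confirms the rport deletion to the LLD and obliges the LLD to terminate any i/o still outstanding on that rport:

```c
#include <assert.h>

/* Toy model of an rport at the end of dev_loss_tmo. */
struct rport_model {
	int outstanding_io;	/* i/o still held by the LLD/adapter */
	int callbacks_seen;	/* dev_loss notifications delivered  */
	int blocked;		/* 1 while fc_remote_port_delete()'d */
};

/* LLD-side handler: acknowledge the loss and kill outstanding i/o.
 * Completions may happen after this returns; zeroing the count here
 * just models "termination has been initiated". */
static void lld_dev_loss_tmo_callbk(struct rport_model *rp)
{
	rp->callbacks_seen++;
	rp->outstanding_io = 0;
}

/* Transport side: dev_loss_tmo fired, give up on the rport. The
 * callback is delivered as a normal part of rport teardown, even if
 * the rport was never blocked. */
static void transport_dev_loss_fires(struct rport_model *rp,
				     void (*cb)(struct rport_model *))
{
	cb(rp);			/* confirm deletion to the LLD */
	rp->blocked = 0;	/* queuecommand now sees DID_NO_CONNECT */
}
```

The key design point is the direction of responsibility: the transport only notifies; the LLD owns the actual abort of in-flight i/o.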
>
> Concur.
>
>>> - fast_io_fail_tmo and LLD callback:
>>> There are some cases where it may take a long while to truly determine
>>> device loss, but the system is in a multipathing configuration that if
>>> the i/o was failed quickly (faster than dev_loss_tmo), it could be
>>> redirected to a different path and completed sooner (assuming the
>>> multipath thing knew that the sdev was blocked).
>>> iSCSI is one of the transports that may vary dev_loss_tmo values
>>> per session, and would still like fast i/o failure.
>>
>> The current transport implementation did not specify what happened to
>> active i/o (given to the driver, in the adapter, but not yet completed
>> back to the midlayer) when a device was blocked, nor during the
>> block-to-dev_loss transition period. It was up to the driver. Many
>> assumed active i/o was immediately terminated, which is semi-consistent
>> with the behavior of most drivers for most "connectivity loss" scenarios.
>>
>> The conversations then started to jump around, considering what i/o's you
>> may want to have fail quickly, etc.
>>
>> Here's my opinion:
>> We have the following points in time to look at:
>> (a) the device is blocked by the transport
>> (b) there is a time T, usually in a multipathing environment, where it
>> would be useful to error the i/o early rather than wait for dev_loss
>> It is assumed that any such i/o request would be marked REQ_FAILFAST
>> (c) the dev_loss_tmo fires - we're to assume the device is gone.
>> At any time after (a), the device may return, unblock, and never
>> encounter points (b) and (c).
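To make the timeline concrete, here is a small user-space sketch (names and the integer-time model are invented for illustration) mapping points (a), (b), (c) to the fate of an active i/o, following the opinion above:

```c
#include <assert.h>

enum io_fate { IO_STAYS_ACTIVE, IO_FAST_FAILED, IO_NO_CONNECT };

/* Disposition of an active i/o at time t after the block at (a):
 *   t <  fast_fail_tmo : between (a) and (b), nothing is forced to
 *                        terminate - a short link bounce may recover
 *   t >= fast_fail_tmo : point (b), REQ_FAILFAST i/o is errored early
 *   t >= dev_loss_tmo  : point (c), everything is terminated
 */
static enum io_fate io_disposition(int t, int fast_fail_tmo,
				   int dev_loss_tmo, int req_failfast)
{
	if (t >= dev_loss_tmo)
		return IO_NO_CONNECT;
	if (t >= fast_fail_tmo && req_failfast)
		return IO_FAST_FAILED;
	return IO_STAYS_ACTIVE;
}
```

Note that a non-REQ_FAILFAST i/o rides out the whole window and only dies at (c), which is exactly the multipath-unfriendly case fast_io_fail_tmo is meant to shorten.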
>
> REQ_FAILFAST is stored in the request structure. Are there "issues"
> with using scsi_cmnd.request in the lldd?
>
>> As for what happens to active i/o :
>>
>> always: the driver can fail an i/o at any point in time if it deems
>> it appropriate.
>>
>> at (a): There are scenarios where a short link perturbation may occur,
>> which may not disrupt the i/o. Therefore, we should not force
>> io to be terminated.
>>
>> at (b): Minimally, we should terminate all active i/o requests marked
>> as type REQ_FAILFAST. From an api perspective, driver support
>> for this is optional. And we must also assume that there will
>> be implementations which have to abort all i/o in order to
>> terminate those marked REQ_FAILFAST. Is this acceptable ?
>> (it meets the "always" condition above)
>>
>> Q: so far we've limited the io to those w/ REQ_FAILFAST.
>> Would we ever want to allow a user to fast fail all i/o
>> regardless of the request flags ? (in case the flags
>> weren't getting set on all the i/o the user wanted to
>> see fail ?)
>
> REQ_FAILFAST appears to influence retries during error recovery so
> there may be unexpected side effects of doing this. But, that said,
> I'd say yes. From my perspective, I'd make this the default behavior.
>
> In talking with our volume manager people, the question raised was
> "Why would you want some i/o to fail quickly and some not?"
> They even considered non-i/o scsi commands.
>
> I think that if the default behavior is to have fast_io_fail_tmo
> enabled, then it should be controlled by REQ_FAILFAST in the
> request. If the default is to have the timer disabled, i.e.,
> an admin has to enable it (or define when it's enabled) then
> when enabled it should apply to every i/o. In examining the
> patch, it appears to be disabled by default, so our conclusion
> is that all i/o should fast fail when enabled. We also concur
> with having fast_io_fail_tmo disabled by default.
I think this also implies that our volume manager guys would be
just as happy setting dev_loss_tmo to a small value and not use
fast_io_fail_tmo.
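The enablement rule we landed on can be stated compactly. This is a user-space sketch with invented names, contrasting the two candidate policies from the discussion; the conclusion above favors the second because the timer is opt-in:

```c
#include <assert.h>

/* Policy 1 (flag-driven): fail early only i/o marked REQ_FAILFAST. */
static int flag_driven(int timer_fired, int req_failfast)
{
	return timer_fired && req_failfast;
}

/* Policy 2 (admin-driven): fast_io_fail_tmo is disabled by default;
 * once an admin enables it, firing applies to ALL i/o and the
 * per-request flag is ignored. */
static int admin_driven(int timer_enabled, int timer_fired,
			int req_failfast)
{
	(void)req_failfast;	/* deliberately ignored once opted in */
	return timer_enabled && timer_fired;
}
```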
Mike
>
> I guess this implies leave REQ_FAILFAST to error recovery. :)
>
> Mike
>
>
>
>> There's a desire to address pending i/o (those on the block
>> request queue or new requests going there) so that if we've
>> crossed point (b) we also fail them. The proposal is
>> to add a new state (device ? or queue ?), which would take effect
>> as of point (b). All REQ_FAILFAST i/o on the queue, as well
>> as any new i/o, would be failed with a new i/o status while in
>> this state. Non-REQ_FAILFAST i/o would continue to enter/sit
>> on the request queue until dev_loss_tmo fires.
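A sketch of the proposed behavior in that new state (a user-space model with invented names; the real work would live in the block/scsi midlayer): requests carrying the fast-fail flag are completed with the new error status, while everything else stays queued for dev_loss_tmo:

```c
#include <assert.h>
#include <stddef.h>

struct queued_io {
	int req_failfast;	/* was the request marked for fast fail */
	int failed;		/* completed with the new i/o status    */
};

/* Post-(b) sweep: error REQ_FAILFAST i/o on the request queue, leave
 * everything else sitting there until dev_loss_tmo. Returns how many
 * requests were failed by this pass. */
static int fail_fastfail_ios(struct queued_io *q, size_t n)
{
	size_t i;
	int failed = 0;

	for (i = 0; i < n; i++) {
		if (q[i].req_failfast && !q[i].failed) {
			q[i].failed = 1;
			failed++;
		}
	}
	return failed;
}
```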
>>
>> at (c): per the dev_loss_tmo callback, all i/o should be terminated.
>> Their completions do not have to be synchronous to the return
>> from the callback - they can occur afterward.
>>
>>
>> Comments ?
>>
>> Assuming that folks agree, I'd like to do this in 2 patches:
>> - one that puts in the transport fast_io_fail_tmo and LLD callback
>> - another that adds the new state, io completion status, and does the
>> handling of the request queue REQ_FAILFAST i/o.
>>
>>> - fast_loss_time recommendation:
>>> In discussing how an admin should set dev_loss_tmo in a multipathing
>>> environment, it became apparent that we expected the admin to know
>>> a lot. They had to know the transport type, what the minimum setting
>>> can be that still survives normal link bouncing, and they may even
>>> have to know about device specifics. For iSCSI, the proper loss time
>>> may vary widely from session to session.
>>>
>>> This attribute is an exported "recommendation" by the LLDD and
>>> transport
>>> on what the lowest setting for dev_loss_tmo should be for a
>>> multipathing
>>> environment. Thus, the admin only needs to cat this attribute to obtain
>>> the value to echo into dev_loss_tmo.
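The intended admin workflow is just a read of one attribute and a write of another. Sketched below with a stand-in temp directory, since the real sysfs nodes (presumably under something like /sys/class/fc_remote_ports/, with exact names per the eventual patch) need FC hardware; the attribute names here mirror the proposal but are assumptions:

```shell
# Stand-in for the rport sysfs directory (no FC hardware here).
rport=$(mktemp -d)
echo 5 > "$rport/fast_loss_time"    # LLD/transport recommendation
                                    # (read-only in the real sysfs)

# Admin workflow from the proposal: cat the recommendation and
# echo it into dev_loss_tmo.
cat "$rport/fast_loss_time" > "$rport/dev_loss_tmo"
cat "$rport/dev_loss_tmo"           # prints 5
```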
>> The only objection was from Christoph - wanting a utility to get/set this
>> stuff. However, the counter-argument was that this attribute is still
>> meaningful, as it is the conduit for obtaining a recommendation from the
>> transport/LLD.
>>
>> So - I assume this proceeds as is - with a change in its description.
>>
>>>
>>> I have one criticism of these changes. The callbacks call into
>>> the LLDD with an rport after the driver's rport_delete call. What this
>>> means is that we are essentially extending the lifetime of an rport
>>> until the dev_loss_tmo call occurs.
>> It's ok - and adding the appropriate comments is fine.
>>
>>
>> Thanks.
>>
>> -- james s
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
Thread overview: 15+ messages
2006-06-20 18:45 [RFC] fc transport: extensions for fast fail and dev loss James Smart
2006-07-25 17:12 ` Mike Christie
2006-07-25 18:49 ` James Smart
2006-07-25 21:15 ` Michael Reed
2006-07-26 3:33 ` James Smart
2006-07-26 9:20 ` Christoph Hellwig
2006-07-26 16:35 ` James Smart
2006-08-08 17:54 ` [RFC] [Last Rites] " James Smart
2006-08-08 21:56 ` Michael Reed
2006-08-08 22:15 ` Michael Reed [this message]
2006-08-09 15:31 ` Michael Reed
2006-08-10 16:38 ` James Smart
2006-08-09 17:36 ` Christoph Hellwig
2006-08-10 16:17 ` James Smart
2006-08-10 20:01 ` Mike Christie