Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss

From: Christoph Hellwig <hch@infradead.org>
To: James Smart <James.Smart@Emulex.Com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
Date: Wed, 9 Aug 2006 18:36:43 +0100
Message-ID: <20060809173643.GA22969@infradead.org>
In-Reply-To: <44D8CFD3.9050408@emulex.com>

On Tue, Aug 08, 2006 at 01:54:27PM -0400, James Smart wrote:
> >Here's what is contained:
> >
> >- dev_loss_tmo LLDD callback :
> >  Currently, there is no notification to the LLDD of when the transport
> >  gives up on the device returning and starts to return DID_NO_CONNECT
> >  in the queuecommand helper function. This callback notifies the LLDD
> >  that the transport has now given up on the rport, thereby acknowledging
> >  the prior fc_remote_port_delete() call. The callback also expects the
> >  LLDD to initiate the termination of any outstanding i/o on the rport.
> 
> I believe there is no dissension on this change.
> Please note: this is essentially a confirmation from the transport to the
>   LLD that the rport is fully deleted. Thus, the LLD must expect to see
>   these callbacks as a normal part of the rport being terminated (even if
>   it is not blocked).
> 
> I'll move forward with this.

ACK.
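
To make the contract concrete, here is a minimal user-space sketch of
what such a callback might do. Everything in it (model_rport, model_io,
model_dev_loss_tmo_callbk, model_complete_abort) is a made-up name for
illustration, not the transport's or any driver's actual interface. The
only point is that the callback marks the rport as gone and starts the
aborts, while the completions can arrive later.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the real fc transport structures. */
enum model_io_state { IO_ACTIVE, IO_ABORTING, IO_DONE };

struct model_io {
	enum model_io_state state;
};

struct model_rport {
	bool deleted;		/* transport has confirmed the delete */
	struct model_io *ios;	/* i/o currently owned by the "adapter" */
	size_t nr_ios;
};

/*
 * Models the transport telling the LLDD it has given up on the rport.
 * Only initiates termination of active i/o; nothing completes here.
 * Returns the number of i/o for which an abort was started.
 */
size_t model_dev_loss_tmo_callbk(struct model_rport *rport)
{
	size_t started = 0;
	size_t i;

	rport->deleted = true;
	for (i = 0; i < rport->nr_ios; i++) {
		if (rport->ios[i].state == IO_ACTIVE) {
			rport->ios[i].state = IO_ABORTING;
			started++;
		}
	}
	return started;
}

/* Models the later, asynchronous completion of one aborted i/o. */
void model_complete_abort(struct model_io *io)
{
	if (io->state == IO_ABORTING)
		io->state = IO_DONE;
}
```

In this model an i/o that had already completed is left untouched, and
an aborted i/o only reaches IO_DONE once model_complete_abort() runs,
i.e. after the callback has returned.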

> >- fast_io_fail_tmo and LLD callback:
> >  There are some cases where it may take a long while to truly determine
> >  device loss, but the system is in a multipathing configuration where,
> >  if the i/o were failed quickly (faster than dev_loss_tmo), it could be
> >  redirected to a different path and completed sooner (assuming the
> >  multipath layer knew that the sdev was blocked).
> >  
> >  iSCSI is one of the transports that may vary dev_loss_tmo values
> >  per session, and you would like fast io failure.
> 
> 
> The current transport implementation did not specify what happened to
>   active i/o (given to the driver, in the adapter, but not yet completed
>   back to the midlayer) when a device was blocked, nor during the
>   block-to-dev_loss transition period. It was up to the driver.  Many
>   assumed active i/o was immediately terminated, which is semi-consistent
>   with the behavior of most drivers for most "connectivity loss" scenarios.
> 
> The conversations then started to jump around, considering what i/o's you
>   may want to have fail quickly, etc.
> 
> Here's my opinion:
>   We have the following points in time to look at:
>    (a) the device is blocked by the transport
>    (b) there is a time T, usually in a multipathing environment, where it
>        would be useful to error the i/o early rather than wait for dev_loss.
>        It is assumed that any such i/o request would be marked REQ_FASTFAIL.
>    (c) the dev_loss_tmo fires - we're to assume the device is gone
>   and at any time post (a), the device may return, unblock and never
>   encounter points (b) and (c).
> 
>   As for what happens to active i/o :
> 
>   always: the driver can fail an i/o at any point in time if it deems
>           it appropriate.
> 
>   at (a): There are scenarios where a short link perturbation may occur,
>           which may not disrupt the i/o. Therefore, we should not force
>           io to be terminated.

Ok..

> 
>   at (b): Minimally, we should terminate all active i/o requests marked
>           as type REQ_FASTFAIL. From an api perspective, driver support
>           for this is optional. And we must also assume that there will
>           be implementations which have to abort all i/o in order to
>           terminate those marked REQ_FASTFAIL. Is this acceptable ?
>           (it meets the "always" condition above)
> 
>           Q: so far we've limited the io to those w/ REQ_FASTFAIL.
>             Would we ever want to allow a user to fast fail all i/o
>             regardless of the request flags ? (in case the flags
>             weren't getting set on all the i/o the user wanted to
>             see fail ?)

I think we should fail all.  It's not like an unprivileged process could
request FASTFAIL.  The administrator should know what she/he is doing.

>           There's a desire to address pending i/o (those on the block
>           request queue or new requests going there) so that if we've
>           crossed point (b) that we also fail them.  The proposal is
>           to add a new state (device ? or queue ?), which would occur
>           as of point (b). All REQ_FASTFAIL io on the queue, as well
>           as on a new io, will be failed with a new i/o status if in
>           this state. Non-REQ_FASTFAIL i/o would continue to enter/sit
>           on the request queue until dev_loss_tmo fires.

We have a queue per device, so adding another scsi_device state sounds
like the right way to go.
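
For illustration, the queue-side decision could be modelled roughly like
this. It is a user-space sketch only: the state names, the disposition
values and the fail_all switch (the "fail all" option discussed above)
are hypothetical, and a real implementation would sit in the
scsi_device state handling rather than in a standalone function.

```c
#include <stdbool.h>

/* Hypothetical device states, following points (a), (b) and (c). */
enum model_dev_state {
	MODEL_RUNNING,		/* normal operation */
	MODEL_BLOCKED,		/* (a) device blocked by the transport */
	MODEL_FASTFAIL,		/* (b) fast fail time reached, still blocked */
	MODEL_LOST		/* (c) dev_loss_tmo fired */
};

/* What to do with a request sitting on, or arriving at, the queue. */
enum model_disposition {
	MODEL_DISPATCH,		/* hand it to the driver */
	MODEL_HOLD,		/* leave it on the request queue */
	MODEL_FAIL		/* complete it with an error */
};

enum model_disposition
model_queue_decision(enum model_dev_state state, bool req_fastfail,
		     bool fail_all)
{
	switch (state) {
	case MODEL_RUNNING:
		return MODEL_DISPATCH;
	case MODEL_BLOCKED:
		/* A short link bounce may recover, so nothing is failed. */
		return MODEL_HOLD;
	case MODEL_FASTFAIL:
		/* Fail marked requests, or everything if so configured. */
		return (req_fastfail || fail_all) ? MODEL_FAIL : MODEL_HOLD;
	case MODEL_LOST:
		return MODEL_FAIL;
	}
	return MODEL_FAIL;	/* unknown state: be conservative */
}
```

With fail_all clear this matches the proposal as quoted: only
REQ_FASTFAIL requests are failed at (b) and the rest wait for
dev_loss_tmo. With fail_all set, every request is failed at (b).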

>   at (c): per the dev_loss_tmo callback, all i/o should be terminated.
>           Their completions do not have to be synchronous to the return
>           from the callback - they can occur afterward.

ACK.



> >- fast_loss_time recommendation:
> >  In discussing how an admin should set dev_loss_tmo in a multipathing
> >  environment, it became apparent that we expected the admin to know
> >  a lot. They had to know the transport type, what the minimum setting
> >  can be that still survives normal link bouncing, and they may even
> >  have to know about device specifics.  For iSCSI, the proper loss time
> >  may vary widely from session to session.
> >
> >  This attribute is an exported "recommendation" by the LLDD and transport
> >  on what the lowest setting for dev_loss_tmo should be for a multipathing
> >  environment. Thus, the admin only needs to cat this attribute to obtain
> >  the value to echo into dev_loss_tmo.
> 
> The only objection was from Christoph - wanting a utility to get/set this
> stuff. However, the counter was that this attribute was still meaningful, as it
> was the conduit to obtain a recommendation from the transport/LLD.
> 
> So - I assume this proceeds as is - with a change in its description.

I must say I'm still not happy with this.  It's really policy that we
try to keep out of the kernel.
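
A user-space helper could carry the "cat one attribute, echo it into the
other" step instead. As a rough sketch, assuming both values are exposed
as text files holding a number of seconds: the function names are made
up, and both paths are supplied by the caller because no actual
attribute location is implied here.

```c
#include <stdio.h>

/* Read an unsigned number of seconds from a text attribute file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_seconds(const char *path, unsigned int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%u", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write a number of seconds to a text attribute file.
 * Returns 0 on success, -1 on any open, write or close error. */
int write_seconds(const char *path, unsigned int seconds)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fprintf(f, "%u\n", seconds) > 0);
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Copy the recommended value (rec_path) into the timeout (tmo_path).
 * Any site-specific policy, such as clamping, would be added here. */
int apply_recommendation(const char *rec_path, const char *tmo_path)
{
	unsigned int rec;

	if (read_seconds(rec_path, &rec) != 0)
		return -1;
	return write_seconds(tmo_path, rec);
}
```

Keeping this in a tool means any adjustment to the recommended value is
a change to the helper, not to the kernel.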

Thread overview: 15+ messages
2006-06-20 18:45 [RFC] fc transport: extensions for fast fail and dev loss James Smart
2006-07-25 17:12 ` Mike Christie
2006-07-25 18:49   ` James Smart
2006-07-25 21:15     ` Michael Reed
2006-07-26  3:33       ` James Smart
2006-07-26  9:20 ` Christoph Hellwig
2006-07-26 16:35   ` James Smart
2006-08-08 17:54 ` [RFC] [Last Rites] " James Smart
2006-08-08 21:56   ` Michael Reed
2006-08-08 22:15     ` Michael Reed
2006-08-09 15:31       ` Michael Reed
2006-08-10 16:38         ` James Smart
2006-08-09 17:36   ` Christoph Hellwig [this message]
2006-08-10 16:17     ` James Smart
2006-08-10 20:01       ` Mike Christie
