Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Christoph Hellwig <hch@infradead.org>
To: James Smart <James.Smart@Emulex.Com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss
Date: Wed, 9 Aug 2006 18:36:43 +0100	[thread overview]
Message-ID: <20060809173643.GA22969@infradead.org> (raw)
In-Reply-To: <44D8CFD3.9050408@emulex.com>

On Tue, Aug 08, 2006 at 01:54:27PM -0400, James Smart wrote:
> >Here's what is contained:
> >
> >- dev_loss_tmo LLDD callback :
> >  Currently, there is no notification to the LLDD of when the transport
> >  gives up on the device returning and starts to return DID_NO_CONNECT
> >  in the queuecommand helper function. This callback notifies the LLDD
> >  that the transport has now given up on the rport, thereby acknowledging
> >  the prior fc_remote_port_delete() call. The callback also expects the
> >  LLDD to initiate the termination of any outstanding i/o on the rport.
> 
> I believe there is no dissention on this change.
> Please note: this is essentially a confirmation from the transport to the
>   LLD that the rport is fully deleted. Thus, the LLD must expect to see
>   these callbacks as a normal part of the rport being terminated (even if
>   it is not blocked).
> 
> I'll move forward with this.

ACK.

> >- fast_io_fail_tmo and LLD callback:
> >  There are some cases where it may take a long while to truly determine
> >  device loss, but the system is in a multipathing configuration that if
> >  the i/o was failed quickly (faster than dev_loss_tmo), it could be
> >  redirected to a different path and completed sooner (assuming the 
> >  multipath thing knew that the sdev was blocked).
> >  
> >  iSCSI is one of the transports that may vary dev_loss_tmo values
> >  per session, and you would like fast io failure.
> 
> 
> The current transport implementation did not specify what happened to
>   active i/o (given to the driver, in the adapter, but not yet completed
>   back to the midlayer) when a device was blocked, nor during the
>   block-to->dev_loss transition period. It was up to the driver.  Many
>   assumed active i/o was immediately terminated, which is semi-consistent
>   with the behavior of most drivers for most "connectivity loss" scenarios.
> 
> The conversations then started to jump around, considering what i/o's you
>   may want to have fail quickly, etc.
> 
> Here's my opinion:
>   We have the following points in time to look at:
>    (a) the device is blocked by the transport
>    (b) there is a time T, usually in a multipathing environment, where it
>        would be useful to error the i/o early rather than wait for dev_loss
>        It is assumed that any such i/o request would be marked REQ_FASTFAIL
>    (c) the dev_loss_tmo fires - we're to assume the device is gone
>   and at any time post (a), the device may return, unblock and never
>   encounter points (b) and (c).
> 
>   As for what happens to active i/o :
> 
>   always: the driver can fail an i/o at any point in time if it deems
>           it appropriate.
> 
>   at (a): There are scenarios where a short link perturbation may occur,
>           which may not disrupt the i/o. Therefore, we should not force
>           io to be terminated.

Ok..

> 
>   at (b): Minimally, we should terminate all active i/o requests marked
>           as type REQ_FASTFAIL. From an api perspective, driver support
>           for this is optional. And we must also assume that there will
>           be implementations which have to abort all i/o in order to
>           terminate those marked REQ_FASTFAIL. Is this acceptable ?
>           (it meets the "always" condition above)
> 
>           Q: so far we've limited the io to those w/ REQ_FASTFAIL.
>             Would we ever want to allow a user to fast fail all i/o
>             regardless of the request flags ? (in case they flags
>             weren't getting set on all the i/o the user wanted to
>             see fail ?)

I think we should fail all.  It's not like an unprivilegued process could
request FASTFAIL.  The administrator should know what she/he is doing.

>           There's a desire to address pending i/o (those on the block
>           request queue or new requests going there) so that if we've
>           crossed point (b) that we also fail them.  The proposal is
>           to add a new state (device ? or queue ?), which would occur
>           as of point (b). All REQ_FASTFAIL io on the queue, as well
>           as on a new io, will be failed with a new i/o status if in
>           this state. Non-REQ_FASTFAIL i/o would continue to enter/sit
>           on the request queue until dev_loss_tmo fires.

We have a queue per device, so adding another scsi_device state sound
like the right way to go aheade.

>   at (c): per the dev_loss_tmo callback, all i/o should be terminated.
>           Their completions do not have to be synchronous to the return
>           from the callback - they can occur afterward.

ACK.



> >- fast_loss_time recommendation:
> >  In discussing how a admin should set dev_loss_tmo in a multipathing
> >  environment, it became apparent that we expected the admin to know
> >  a lot. They had to know the transport type, what the minimum setting
> >  can be that still survives normal link bouncing, and they may even
> >  have to know about device specifics.  For iSCSI, the proper loss time
> >  may vary widely from session to session.
> >
> >  This attribute is an exported "recommendation" by the LLDD and transport
> >  on what the lowest setting for dev_loss_tmo should be for a multipathing
> >  environment. Thus, the admin only needs to cat this attribute to obtain
> >  the value to echo into dev_loss_tmo.
> 
> The only objection was from Christoph - wanting a utility to get/set this
> stuff. However, the counter was this attribute was still meaningful, as it
> was the conduit to obtain a recommendation from the transport/LLD.
> 
> So - I assume this proceeds as is - with a change in it's description.

I must say I'm still not happy with this.  It's really policy that we
try to keep out of the kernel.

next prev parent reply	other threads:[~2006-08-09 17:36 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-20 18:45 [RFC] fc transport: extensions for fast fail and dev loss James Smart
2006-07-25 17:12 ` Mike Christie
2006-07-25 18:49   ` James Smart
2006-07-25 21:15     ` Michael Reed
2006-07-26  3:33       ` James Smart
2006-07-26  9:20 ` Christoph Hellwig
2006-07-26 16:35   ` James Smart
2006-08-08 17:54 ` [RFC] [Last Rites] " James Smart
2006-08-08 21:56   ` Michael Reed
2006-08-08 22:15     ` Michael Reed
2006-08-09 15:31       ` Michael Reed
2006-08-10 16:38         ` James Smart
2006-08-09 17:36   ` Christoph Hellwig [this message]
2006-08-10 16:17     ` James Smart
2006-08-10 20:01       ` Mike Christie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060809173643.GA22969@infradead.org \
    --to=hch@infradead.org \
    --cc=James.Smart@Emulex.Com \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.