Re: [Lsf] Notes from the four separate IO track sessions at LSF/MM

From: Bart Van Assche <bart.vanassche@sandisk.com>
To: "Knight, Frederick" <Frederick.Knight@netapp.com>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Mike Snitzer <snitzer@redhat.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"lsf@lists.linux-foundation.org" <lsf@lists.linux-foundation.org>,
	device-mapper development <dm-devel@redhat.com>,
	linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: [Lsf] Notes from the four separate IO track sessions at LSF/MM
Date: Thu, 28 Apr 2016 09:37:28 -0700
Message-ID: <57223C48.2050509@sandisk.com>
In-Reply-To: <d8e572ba7c9a46aaa3af4c5703ec07e3@hioexcmbx07-prd.hq.netapp.com>

Hello Fred,

Your feedback is very useful, but please note that in my e-mail I used
the phrase "transport layer" to refer to the code in the Linux kernel in
which the fast_io_fail_tmo functionality is implemented. The following
commit message from 10 years ago explains why the fast_io_fail_tmo and
dev_loss_tmo mechanisms were introduced:

---------------------------------------------------------------------------
commit 0f29b966d60e9a4f5ecff9f3832257b38aea4f13
Author: James Smart <James.Smart@Emulex.Com>
Date:   Fri Aug 18 17:33:29 2006 -0400

    [SCSI] FC transport: Add dev_loss_tmo callbacks, and new fast_io_fail_tmo w/ callback
    
    This patch adds the following functionality to the FC transport:
    
    - dev_loss_tmo LLDD callback :
      Called to essentially confirm the deletion of an rport. Thus, it is
      called whenever the dev_loss_tmo fires, or when the rport is deleted
      due to other circumstances (module unload, etc).  It is expected that
      the callback will initiate the termination of any outstanding i/o on
      the rport.
    
    - fast_io_fail_tmo and LLD callback:
      There are some cases where it may take a long while to truly determine
      device loss, but the system is in a multipathing configuration that if
      the i/o was failed quickly (faster than dev_loss_tmo), it could be
      redirected to a different path and completed sooner.
    
    Many thanks to Mike Reed who cleaned up the initial RFC in support
    of this post.
---------------------------------------------------------------------------

Bart.

On 04/28/2016 09:19 AM, Knight, Frederick wrote:
> There are multiple possible situations being intermixed in this discussion.
> First, I assume you're talking only about random access devices (if you try
> transport-level error recovery on a sequential access device - tape or SMR
> disk - there are lots of additional complexities).
> 
> Failures can occur at multiple places:
> a) Transport layer failures that the transport layer is able to detect quickly;
> b) SCSI device layer failures that the transport layer never even knows about.
> 
> For (a) there are two competing goals.  If a port drops off the fabric and
> comes back again, should you be able to just recover and continue?  But how
> long do you wait during that drop?  Some devices use this technique to "move"
> a WWPN from one place to another.  The port drops from the fabric, and a
> short time later, shows up again (the WWPN moves from one physical port to a
> different physical port). There are FC driver layer timers that define the
> length of time allowed for this operation.  The goal is fast failover, but
> not too fast - because too fast will break this kind of "transparent failover".
> This timer also allows for the "OH crap, I pulled the wrong cable - put it
> back in; quick" kind of stupid user bug.
> 
> For (b) the transport never has a failure.  A LUN (or a group of LUNs)
> undergoes an ALUA transition from one set of ports to a different set of ports.
> Some of the LUNs on the port continue to work just fine, but others enter
> ALUA TRANSITION state so they can "move" to a different part of the hardware.
> After the move completes, you now have different sets of optimized and
> non-optimized paths (or possibly standby or unavailable paths).  The transport
> will never even know this happened.  This kind of "failure" is handled by
> the SCSI layer drivers.
> 
> There are other cases too, but these are the most common.
> 
> 	Fred
> 
> -----Original Message-----
> From: lsf-bounces@lists.linux-foundation.org [mailto:lsf-bounces@lists.linux-foundation.org] On Behalf Of Bart Van Assche
> Sent: Thursday, April 28, 2016 11:54 AM
> To: James Bottomley; Mike Snitzer
> Cc: linux-block@vger.kernel.org; lsf@lists.linux-foundation.org; device-mapper development; linux-scsi
> Subject: Re: [Lsf] Notes from the four separate IO track sessions at LSF/MM
> 
> On 04/28/2016 08:40 AM, James Bottomley wrote:
>> Well, the entire room (vendors, users and implementors)
>> complained that path failover takes far too long.  I think in their
>> minds this is enough substance to go on.
> 
> The only complaints I heard about path failover taking too long came
> from people working on FC drivers. Aren't SCSI transport layer
> implementations expected to fail I/O after fast_io_fail_tmo has expired
> instead of waiting until the SCSI error handler has finished? If so, why
> is it considered an issue that error handling for the FC protocol can
> take a very long time (hours)?
> 
> Thanks,
> 
> Bart.
