From: Hannes Reinecke <hare@suse.de>
To: emilne@redhat.com
Cc: Baruch Even <baruch@ev-en.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	michaelc <michaelc@cs.wisc.edu>
Subject: Re: [PATCH] scsi: Allow error handling timeout to be specified
Date: Fri, 10 May 2013 16:24:17 +0200
Message-ID: <518D0311.9010208@suse.de>
In-Reply-To: <1368194460.3319.40.camel@localhost.localdomain>

On 05/10/2013 04:01 PM, Ewan Milne wrote:
> On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
>> On Fri, May 10, 2013 at 3:43 PM, Ewan Milne <emilne@redhat.com> wrote:
>>>
>>> On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
>>>> Introduce eh_timeout which can be used for error handling purposes. This
>>>> was previously hardcoded to 10 seconds in the SCSI error handling
>>>> code. However, for some fast-fail scenarios it is necessary to be able
>>>> to tune this as it can take several iterations (bus device, target, bus,
>>>> controller) before we give up.
>>>>
>>>> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
>>>>
>>>
>>> Thanks for posting this.  It will be very helpful to have this
>>> capability, particularly when alternate paths to the device exist.
>>>
>>> Acked-by: Ewan D. Milne <emilne@redhat.com>
>>
>>
>> I would argue that waiting for the eh to timeout before you switch to
>> another path is most likely to be wrong. If you did the first pass of
>> error recovery (task abort) and that failed the
>> path/hba/logical-device is doomed. If you will switch to another path
>> it will either work (meaning the path/hba were bad) or not (logical
>> device was the culprit).
> 
> It is necessary to either know the disposition of a command or
> else wait for a defined amount of time before retrying the command on
> another path.  Otherwise you run the risk that the command will
> eventually complete on the first path.  So yes, we need to do the abort
> (and its timeout).
> 
Strictly speaking that's not true.
Yes, we do need to wait for a certain amount of time for the command
completion to come in.

However, this time is only defined _on the initiator_.
The specification does _NOT_ have any fixed timeout values for _any_
command. As such it could in theory (and does, if you happen to run
against certain arrays under certain conditions) take several
minutes to return a completion.

So we have to accept that a command completion might arrive in the
window between deciding that a command abort has to be sent and the
actual submission of that abort by the HBA. This is totally
independent of any command timeout we set. A short command timeout
merely increases the likelihood of the race; the race itself is
always present.
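
To put rough numbers on that, here is a minimal sketch, not taken from
the SCSI code. It assumes command completion times are exponentially
distributed (a modelling assumption only) and that the gap between
deciding to abort and the HBA submitting the abort is a fixed delay.
The names race_probability, abort_delay_s and mean_completion_s are
hypothetical and chosen for illustration.

```python
import math


def race_probability(timeout_s, abort_delay_s, mean_completion_s):
    """Probability that a completion arrives after the command timeout
    fires but before the abort is actually submitted.

    Assumes exponentially distributed completion times (illustrative
    model only; real arrays behave differently).
    """
    if timeout_s < 0 or abort_delay_s < 0 or mean_completion_s <= 0:
        raise ValueError("timeout and delay must be >= 0, mean must be > 0")
    # P(T > timeout) - P(T > timeout + delay) for T ~ Exp(mean)
    still_pending_at_timeout = math.exp(-timeout_s / mean_completion_s)
    still_pending_at_abort = math.exp(
        -(timeout_s + abort_delay_s) / mean_completion_s
    )
    return still_pending_at_timeout - still_pending_at_abort


if __name__ == "__main__":
    # Hypothetical values: 50 ms abort submission delay, 20 s mean completion.
    for timeout in (1, 10, 30, 60):
        p = race_probability(timeout, abort_delay_s=0.05, mean_completion_s=20.0)
        print(f"timeout={timeout:>3}s  P(race)={p:.6f}")
```

Under this model the probability shrinks as the timeout grows but
never reaches zero while the abort delay is non-zero, which is the
point: tuning the timeout changes how often the race is hit, not
whether it exists.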

>>
>> Actually reducing the timeouts is probably not a good approach since
>> it will cause the host to take a more radical approach without waiting
>> sufficiently for a potential recovery. In addition the more radical
>> error handlings such as host reset will destroy other paths for
>> completely unrelated devices/links, from my experience a host reset is
>> usually not required and the Linux kernel currently reaches to this
>> big hammer too fast.
> 
> I believe that Hannes is working on a better error handling algorithm
> that e.g. does not cause an emulated bus reset in an FC environment
> by resetting all the targets (and affecting I/O to unrelated targets in
> the process).
> 
Yes, that was the idea.
Which I'll get down to eventually; if only customers wouldn't have
all these obnoxious issues no-one has ever seen...

And there is nothing wrong with reducing the timeout per se. It's
just that the current error recovery strategy isn't well equipped to
handle it :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
