From: Douglas Gilbert <dgilbert@interlog.com>
To: James.Smart@emulex.com
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: error handler scheduling
Date: Wed, 27 Mar 2013 10:39:07 -0400
Message-ID: <5153048B.20905@interlog.com>
In-Reply-To: <51525560.3000008@emulex.com>

On 13-03-26 10:11 PM, James Smart wrote:
> In looking through the error handler, if a command times out and is added to the
> eh_cmd_q for the shost, the error handler is only awakened once shost->host_busy
> (total number of i/os posted to the shost) is equal to shost->host_failed
> (number of i/o that have been failed and put on the eh_cmd_q).  Which means, any
> other i/o that was outstanding must either complete or have their timeout fire.
> Additionally, as all further i/o is held off at the block layer as the shost is
> in recovery, new i/o cannot be submitted until the error handler runs and
> resolves the errored i/os.
>
> Is this true ?
>
> I take it that it is also true that the midlayer thus expects every i/o to
> have an i/o timeout.  True ?
>
> The crux of this point is that when the recovery thread runs to abort the timed
> out i/os, it is at the mercy of the last command to complete or timeout.
> Additionally, as all further i/o is held off at the block layer as the shost is
> in recovery, new i/o cannot be submitted until the error handler runs and
> resolves the errored i/os. So all I/O on the host is stopped until that last i/o
> completes/times out.   The timeouts may be eons later.  Consider SCSI format
> commands or verify commands that can take hours to complete.
>
> Specifically, I'm in a situation currently, where an application is using sg to
> send a command to a target. The app selected no-timeout - by setting timeout to
> MAX_INT. Effectively it's so large it's infinite. This I/O was one of those
> "lost" on the storage fabric. There was another command that long ago timed out
> and is sitting on the error handlers queue. But nothing is happening - new i/o,
> or error handler to resolve the failed i/o, until that infinite i/o completes.
>
> I'm hoping I hear that I just misunderstand things.  If not,  is there a
> suggestion for how to resolve this predicament ?    IMHO, I'm surprised we stop
> all i/o for error handling, and that it can be so long later... I would assume
> there's a minimum bound we would wait in the error handler (30s?) before we
> unconditionally run it and abort anything that was outstanding.

James,
After many encounters with the Linux SCSI mid-level error
handler I have concluded it is uncontrollable and
seemingly random, as seen from user space. Interestingly,
several attempts to add finer-grained controls over
lu/target/host resets have been rebuffed.
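
The wake-up condition James describes can be sketched as a toy model.
This is only an illustration of the counting rule stated in his mail,
not the kernel's code; the class and method names are hypothetical,
and only host_busy and host_failed come from the thread.

```python
from dataclasses import dataclass


@dataclass
class HostModel:
    """Toy stand-in for the two shost counters discussed in the thread."""
    host_busy: int = 0    # commands posted to the host and not yet finished
    host_failed: int = 0  # commands that timed out and sit on eh_cmd_q

    def submit(self) -> None:
        self.host_busy += 1

    def complete(self) -> None:
        # A normal completion removes the command from the busy count.
        self.host_busy -= 1

    def time_out(self) -> None:
        # A timed-out command stays counted as busy and is also marked failed.
        self.host_failed += 1

    def eh_can_run(self) -> bool:
        # Wake condition as described in the mail: every outstanding
        # command has either completed or been failed.
        return self.host_failed > 0 and self.host_busy == self.host_failed


host = HostModel()
host.submit()                  # command A, ordinary timeout
host.submit()                  # command B, effectively infinite timeout
host.time_out()                # A times out and is queued for the handler
stalled = host.eh_can_run()    # B is still outstanding
host.time_out()                # only once B also times out (or completes) ...
runnable = host.eh_can_run()   # ... does the condition hold
```

In this model the handler stays blocked for as long as command B is
outstanding, which is the situation described in the quoted mail.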

So my policy is to avoid timeout-induced resets (like the
plague). Hence the default with sg_format is to set the IMMED
bit and use TEST UNIT READY or REQUEST SENSE polling to
monitor progress **. With commands like VERIFY, send many
reasonably sized commands, not one big one. And a special
mention for the SCSI WRITE SAME command which probably
has T10's silliest definition: if the NUMBER OF
LOGICAL BLOCKS field is set to zero it means keep writing
until the end of the disk *** and that might be 20 hours
later! The equivalent field set to zero in a SCSI VERIFY
or WRITE **** command means do nothing.
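
As a rough sketch of the "many reasonably sized commands" approach,
the helper below splits a block range into (lba, count) pairs that a
caller could turn into individual VERIFY commands. The function name
and the default cap of 0x8000 blocks are assumptions chosen for
illustration; actually issuing the commands (for example through sg)
is left out.

```python
from typing import Iterator, Tuple


def verify_chunks(start_lba: int, num_blocks: int,
                  max_blocks: int = 0x8000) -> Iterator[Tuple[int, int]]:
    """Yield (lba, count) pairs covering num_blocks starting at start_lba.

    max_blocks is a hypothetical per-command cap. count is never zero,
    so no command depends on zero-length semantics.
    """
    if max_blocks <= 0:
        raise ValueError("max_blocks must be positive")
    lba = start_lba
    remaining = num_blocks
    while remaining > 0:
        count = min(remaining, max_blocks)
        yield lba, count
        lba += count
        remaining -= count


# Split 100000 blocks into commands of at most 32768 blocks each.
chunks = list(verify_chunks(0, 100_000, max_blocks=32_768))
```

Each small command can then carry a short, realistic timeout, so a
lost command is noticed quickly instead of holding up recovery.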

Doug Gilbert


**   You can still run into problems when a SCSI FORMAT UNIT
      is sent with the IMMED bit set: some other kernel subsystem or
      user space program may decide to send a SCSI command to the
      disk during format. Then said code may not comprehend why
      the disk in question is not ready and ends up triggering
      mid-level error handling which blows the format out of
      the water. That leaves the disk in the "format corrupt"
      state.

***  recently the Block Limits VPD page has (knee-)capped this
      with the WSNZ bit

**** apart from the obsolete WRITE(6) command which found
      another non-obvious interpretation for a zero transfer
      length
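
The zero-length cases above can be summarised in a small lookup. This
is a sketch with a hypothetical function name and simplified command
labels, not code from any tool. The 256-block case for WRITE(6) is how
SBC defines a zero transfer length for that command, and a set WSNZ
bit is modelled here simply as rejecting a zero count.

```python
def effective_blocks(command: str, length_field: int,
                     blocks_to_end: int, wsnz: bool = False) -> int:
    """Return how many blocks a command acts on for a given length field.

    blocks_to_end is the number of blocks from the starting LBA to the
    end of the disk. wsnz mirrors the WSNZ bit in the Block Limits VPD
    page.
    """
    if length_field != 0:
        return length_field
    if command == "WRITE SAME":
        if wsnz:
            # Device does not accept zero in NUMBER OF LOGICAL BLOCKS.
            raise ValueError("WRITE SAME with zero blocks rejected (WSNZ set)")
        return blocks_to_end      # keep writing until the end of the disk
    if command == "WRITE(6)":
        return 256                # zero transfer length means 256 blocks
    if command in ("VERIFY", "WRITE(10)"):
        return 0                  # zero means do nothing
    raise ValueError(f"unknown command label: {command}")


to_end = effective_blocks("WRITE SAME", 0, blocks_to_end=1_000_000)
no_op = effective_blocks("VERIFY", 0, blocks_to_end=1_000_000)
legacy = effective_blocks("WRITE(6)", 0, blocks_to_end=1_000_000)
```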
