From: James Smart <James.Smart@emulex.com>
To: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: error handler scheduling
Date: Tue, 26 Mar 2013 22:11:44 -0400
Message-ID: <51525560.3000008@emulex.com>

Looking through the error handler, it appears that if a command times
out and is added to the eh_cmd_q for the shost, the error handler is
only awakened once shost->host_busy (total number of i/os posted to the
shost) is equal to shost->host_failed (number of i/os that have failed
and been put on the eh_cmd_q). This means any other outstanding i/o
must either complete or have its timeout fire. Additionally, as all
further i/o is held off at the block layer while the shost is in
recovery, new i/o cannot be submitted until the error handler runs and
resolves the errored i/os.
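A minimal user-space model of the wakeup rule described above may make the ordering concrete. It is a sketch under stated assumptions, not the midlayer code: the struct and function names are hypothetical, and only the two counter names mirror the Scsi_Host fields mentioned in the text. A timed-out command stays counted in host_busy and is added to host_failed; a normally completing command leaves host_busy; the handler wakes only when the two counters are equal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two Scsi_Host counters discussed above. */
struct model_host {
	unsigned int host_busy;   /* commands posted to the host, not yet finished */
	unsigned int host_failed; /* commands that failed and sit on eh_cmd_q */
};

/* Model of the rule: wake the error handler only when every busy command has failed. */
static bool model_eh_should_wake(const struct model_host *shost)
{
	return shost->host_busy == shost->host_failed;
}

/* A command times out: still counted as busy, now also counted as failed. */
static bool model_cmd_timed_out(struct model_host *shost)
{
	assert(shost->host_failed < shost->host_busy);
	shost->host_failed++;
	return model_eh_should_wake(shost);
}

/* A non-failed command completes normally: it leaves host_busy. */
static bool model_cmd_completed(struct model_host *shost)
{
	assert(shost->host_busy > shost->host_failed);
	shost->host_busy--;
	return model_eh_should_wake(shost);
}
```

With two commands outstanding, the first timeout does not wake the handler; only the completion (or timeout) of the second one does.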

Is this true?

I take it that the midlayer therefore also expects every i/o to have a
timeout. True?

The crux of this point is that the recovery thread, which aborts the
timed-out i/os, is at the mercy of the last command to complete or time
out. And because new i/o is held off at the block layer until the error
handler has run, all I/O on the host is stopped until that last i/o
completes or times out. The timeouts may be eons later. Consider SCSI
format or verify commands, which can take hours to complete.
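The stall can be expressed as a small calculation: under the wakeup rule above, the earliest time the error handler can start is the latest "resolution" time over all outstanding commands, where each command resolves at the earlier of its completion and its timeout. The sketch below is a hypothetical model (names, the NEVER sentinel, and times in seconds are all assumptions for illustration); a command with no timeout that is never completed pushes the start time to NEVER, which is why the midlayer effectively depends on every i/o having a finite timeout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEVER UINT64_MAX /* hypothetical sentinel: the event never happens */

/* One outstanding command; all times in seconds from a common origin. */
struct model_cmd {
	uint64_t issued;    /* when the command was sent */
	uint64_t timeout;   /* timeout length, NEVER for "no timeout" */
	uint64_t completes; /* when the target completes it, NEVER if lost */
};

/* Earlier of completion and timeout expiry for one command. */
static uint64_t model_cmd_resolved_at(const struct model_cmd *c)
{
	uint64_t expires;

	if (c->timeout == NEVER || c->timeout > NEVER - c->issued)
		expires = NEVER; /* no timeout, or the sum would overflow */
	else
		expires = c->issued + c->timeout;
	return c->completes < expires ? c->completes : expires;
}

/* The error handler can start only after the last outstanding command resolves. */
static uint64_t model_eh_start_time(const struct model_cmd *cmds, size_t n)
{
	uint64_t start = 0;

	assert(cmds != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t t = model_cmd_resolved_at(&cmds[i]);
		if (t > start)
			start = t;
	}
	return start;
}
```

For example, a command that times out after 30 s alongside a format command that completes after 3 hours gives a start time of 10800 s, not 30 s.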

Specifically, I'm currently in a situation where an application is
using sg to send a command to a target. The app selected no timeout by
setting the timeout to MAX_INT, which is so large it is effectively
infinite. This i/o was one of those "lost" on the storage fabric.
Another command timed out long ago and is sitting on the error
handler's queue. But nothing happens, neither new i/o nor the error
handler resolving the failed i/o, until that infinite i/o completes.
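For scale, a sketch of what such a request might look like, assuming the application uses the SG_IO ioctl (sg v3 interface), where sg_io_hdr.timeout is in milliseconds. The helper names and the choice of TEST UNIT READY are assumptions for illustration; the ioctl(fd, SG_IO, &hdr) call itself is left out so the sketch runs without a device. A timeout of INT_MAX milliseconds works out to roughly 24.86 days.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <scsi/sg.h>

/* Hypothetical helper: fill an SG_IO header for TEST UNIT READY. */
static void build_tur_hdr(sg_io_hdr_t *hdr, unsigned char *cdb,
			  unsigned char *sense, unsigned char sense_len,
			  unsigned int timeout_ms)
{
	assert(sense_len > 0);
	memset(cdb, 0, 6);              /* TEST UNIT READY: 6-byte CDB, all zero */
	memset(hdr, 0, sizeof(*hdr));
	hdr->interface_id = 'S';        /* identifies the sg v3 interface */
	hdr->dxfer_direction = SG_DXFER_NONE;
	hdr->cmd_len = 6;
	hdr->cmdp = cdb;
	hdr->mx_sb_len = sense_len;
	hdr->sbp = sense;
	hdr->timeout = timeout_ms;      /* milliseconds */
}

/* Hypothetical helper: express a millisecond timeout in days. */
static double timeout_days(unsigned int timeout_ms)
{
	return timeout_ms / (1000.0 * 60 * 60 * 24);
}
```

So a "lost" command with this timeout can hold off the error handler, and all other i/o on the host, for weeks.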

I'm hoping to hear that I just misunderstand things. If not, is there a
suggestion for how to resolve this predicament? IMHO, I'm surprised we
stop all i/o for error handling, and that error handling can start so
much later... I would assume there's some bound on how long we wait
(30s?) before we unconditionally run the error handler and abort
anything that was outstanding.
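The bounded wait suggested here could be modeled as taking the earlier of two times: when all outstanding commands have resolved (the rule described above), or a fixed bound after the first failure. This is a hypothetical policy sketch, not existing midlayer behavior; the function name, the seconds unit, and the 30 s bound are assumptions taken from the question.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical policy: start the error handler when everything outstanding
 * has resolved, or eh_bound seconds after the first failure, whichever
 * comes first. Times are in seconds from a common origin.
 */
static uint64_t model_bounded_eh_start(uint64_t first_failure,
				       uint64_t all_resolved,
				       uint64_t eh_bound)
{
	uint64_t deadline = first_failure + eh_bound;

	assert(deadline >= first_failure); /* no overflow in this model */
	return all_resolved < deadline ? all_resolved : deadline;
}
```

With a first failure at 30 s and a "lost" command that never resolves, a 30 s bound would start recovery at 60 s; the cost is that the handler must then abort commands that were still legitimately running.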

-- james s


Thread overview: 7+ messages
2013-03-27  2:11 James Smart [this message]
2013-03-27 14:35 ` error handler scheduling Hannes Reinecke
2013-04-02  7:43   ` Bhanu Prakash Gollapudi
2013-03-27 14:39 ` Douglas Gilbert
2013-03-28 16:02   ` Elliott, Robert (Server Storage)
2013-04-12  9:42     ` Ren Mingxin
2013-04-12 19:20       ` Baruch Even
