From: Hannes Reinecke <hare@suse.de>
To: James.Smart@emulex.com
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: error handler scheduling
Date: Wed, 27 Mar 2013 15:35:26 +0100
Message-ID: <515303AE.3060605@suse.de>
In-Reply-To: <51525560.3000008@emulex.com>
On 03/27/2013 03:11 AM, James Smart wrote:
> In looking through the error handler, if a command times out and is
> added to the eh_cmd_q for the shost, the error handler is only
> awakened once shost->host_busy (total number of i/os posted to the
> shost) is equal to shost->host_failed (number of i/o that have been
> failed and put on the eh_cmd_q). Which means, any other i/o that
> was outstanding must either complete or have their timeout fire.
> Additionally, as all further i/o is held off at the block layer as
> the shost is in recovery, new i/o cannot be submitted until the
> error handler runs and resolves the errored i/os.
>
> Is this true ?
>
Yes.
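To make the wake-up condition concrete, here is a minimal user-space
sketch in C. It is not the kernel code: struct host_model and the
helpers eh_may_run(), cmd_timed_out() and cmd_completed() are
hypothetical names, and only host_busy and host_failed are taken from
the description above. The extra host_failed > 0 check is an assumption
of the model, so that an idle host does not count as needing recovery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two per-host counters discussed above. */
struct host_model {
	unsigned int host_busy;   /* commands posted to the host */
	unsigned int host_failed; /* commands parked on eh_cmd_q */
};

/* The error handler may run only once every outstanding command has failed. */
static bool eh_may_run(const struct host_model *h)
{
	return h->host_failed > 0 && h->host_busy == h->host_failed;
}

/* A command times out and is queued for the error handler. */
static void cmd_timed_out(struct host_model *h)
{
	assert(h->host_failed < h->host_busy);
	h->host_failed++;
}

/* A command completes normally and leaves the host. */
static void cmd_completed(struct host_model *h)
{
	assert(h->host_busy > h->host_failed);
	h->host_busy--;
}
```

With three commands outstanding, the handler stays asleep after the
first timeout and only becomes runnable once the other two have either
completed or timed out as well.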
> I take it is also true that the midlayer thus expects every i/o to
> have an i/o timeout. True ?
>
Yes. But this is guaranteed by the block-layer:
void blk_add_timer(struct request *req)
{
	struct request_queue *q = req->q;
	unsigned long expiry;

	if (!q->rq_timed_out_fn)
		return;

	BUG_ON(!list_empty(&req->timeout_list));
	BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));

	/*
	 * Some LLDs, like scsi, peek at the timeout to prevent a
	 * command from being retried forever.
	 */
	if (!req->timeout)
		req->timeout = q->rq_timeout;
	[...]
So every request will have a timeout, either the default
request_queue timeout or an individual one.
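A small model of that defaulting rule, as a sketch only:
effective_timeout() is a hypothetical helper, and plain integers stand
in for whatever units the queue uses. It shows that only a timeout of
zero counts as unset; any non-zero value, however large, is kept as
given.

```c
#include <limits.h>

/*
 * Hypothetical helper mirroring the "if (!req->timeout)" fallback:
 * zero means "not set", so the queue default applies; anything else
 * is used unchanged.
 */
static unsigned long effective_timeout(unsigned long req_timeout,
				       unsigned long queue_default)
{
	return req_timeout ? req_timeout : queue_default;
}
```

So a caller passing INT_MAX gets exactly INT_MAX back, not the queue
default.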
> The crux of this point is that when the recovery thread runs to
> abort the timed out i/os is at the mercy of the last command to
> complete or timeout. Additionally, as all further i/o is held off at
> the block layer as the shost is in recovery, new i/o cannot be
> submitted until the error handler runs and resolves the errored
> i/os. So all I/O on the host is stopped until that last i/o
> completes/times out. The timeouts may be eons later. Consider
> SCSI format commands or verify commands that can take hours to
> complete.
>
Yes, that's true. Unfortunately.
> Specifically, I'm in a situation currently, where an application is
> using sg to send a command to a target. The app selected no-timeout
> - by setting timeout to MAX_INT. Effectively it's so large it's
> infinite. This I/O was one of those "lost" on the storage fabric.
> There was another command that long ago timed out and is sitting on
> the error handlers queue. But nothing is happening - new i/o, or
> error handler to resolve the failed i/o, until that infinite i/o
> completes.
>
Hehe. no timeout != MAX_INT.
It's easy to apply a timeout if none is set. But how do we determine
what constitutes a valid timeout?
As mentioned, some commands can literally take forever _and_ be
fully legit. So who are we to decide?
> I'm hoping I hear that I just misunderstand things. If not, is
> there a suggestion for how to resolve this predicament ? IMHO,
> I'm surprised we stop all i/o for error handling, and that it can be
> so long later... I would assume there's a minimum bound we would
> wait in the error handler (30s?) before we unconditionally run it
> and abort anything that was outstanding.
>
Ah, the joys of error recovery.
Incidentally, that'll be one of the topics I'll be discussing at
LSF; I've been bitten by this on various other occasions.
AFAIK the reasoning behind the current error recovery strategy is
that it's modelled after SCSI parallel behaviour, where you
basically have to stop the entire bus, figure out which state it's
in, and then take corrective action.
And you typically don't have any LUNs to deal with.
_And_ SPI is essentially single-threaded when it comes to target
access, so in effect you cannot send commands over the bus when
resetting a target.
So there it makes sense.
Less so for modern fabrics, where target access is governed by an
I_T nexus, each of which is largely independent of the others.
Actually there is another issue with the error handler:
The commands will only be released after eh is done.
If you look at the eh sequence
-> eh_abort
-> eh_lun_reset
-> eh_target_reset
-> eh_bus_reset
-> eh_host_reset
the command itself is only meaningful until lun_reset() has
completed; after lun_reset() the command is invalidated.
Every other stage still uses the scsi command as an argument,
but only as a place holder to figure out which device it should act
upon.
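The escalation can be sketched as a simple ladder in C. This is a
user-space model, not the midlayer implementation: enum eh_stage,
cmd_still_meaningful() and eh_escalate() are hypothetical names, and
the per-stage outcome is passed in as an array rather than obtained
from a driver.

```c
#include <stdbool.h>

/* The eh sequence listed above, in escalation order. */
enum eh_stage {
	EH_ABORT,
	EH_LUN_RESET,
	EH_TARGET_RESET,
	EH_BUS_RESET,
	EH_HOST_RESET,
	EH_STAGE_COUNT
};

/*
 * True for the stages that still act on the command itself. From
 * target reset onwards the command only identifies the device.
 */
static bool cmd_still_meaningful(enum eh_stage stage)
{
	return stage <= EH_LUN_RESET;
}

/*
 * Walk the ladder and return the first stage that succeeds, or
 * EH_STAGE_COUNT if every stage fails. succeeds[] is model input.
 */
static enum eh_stage eh_escalate(const bool succeeds[EH_STAGE_COUNT])
{
	for (int s = EH_ABORT; s < EH_STAGE_COUNT; s++) {
		if (succeeds[s])
			return (enum eh_stage)s;
	}
	return EH_STAGE_COUNT;
}
```

If abort and lun reset both fail, every later stage runs with a
command that is no longer meaningful in itself.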
So we _could_ speed things up quite a lot if we were able to
call ->done() on the command after lun reset; then the command would
be returned to the upper layers.
And things like multipath could kick in and move I/O to other
devices.
However, this is a daunting task.
I've tried, and it's far from easy.
_Especially_ due to some FC HBAs insisting on using scmds for sending
TARGET RESET TMFs.
If we just could do a LOGO for target reset things would become so
much easier ...
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)