From: Andrew Vasquez <andrew.vasquez@qlogic.com>
To: linux-scsi@vger.kernel.org
Cc: andrew.vasquez@qlogic.com
Subject: Mid-layer handling of NOT_READY conditions...
Date: Fri, 28 Jan 2005 15:24:10 -0800
Message-ID: <1106954650.9862.61.camel@plap>

[PREFACE: Please forgive the rather long absence on linux-scsi, I've
been occupied by several non-related projects]


All,

While stripping out the remnants of internal queuing from the qla2xxx
driver and adding-in support for various fc_host/fc_remote constructs,
I've run into a rather peculiar problem with respect to the way the SCSI
mid-layer handles NOT_READY conditions (notably ASC 0x04 and ASCQ 0x01).
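The sense key and the ASC/ASCQ pair sit at offsets defined by SPC, in
either fixed-format (response code 0x70/0x71) or descriptor-format
(0x72/0x73) sense data. The following is a rough userspace sketch of
pulling them out; decode_sense(), becoming_ready() and struct
sense_triple are hypothetical names, not the mid-layer's own helpers.
Reading the bracketed triples in the printk() output further down as
key/ASC/ASCQ is an assumption about that ad-hoc format, but under it
[2/4/1] is NOT READY with ASC 0x04/ASCQ 0x01, and [6/29/0] is a UNIT
ATTENTION with ASC 0x29.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the three fields the mid-layer tests. */
struct sense_triple {
	unsigned char key;	/* sense key, e.g. 0x2 = NOT READY */
	unsigned char asc;	/* additional sense code */
	unsigned char ascq;	/* additional sense code qualifier */
};

/*
 * Decode fixed-format (0x70/0x71) or descriptor-format (0x72/0x73)
 * sense data.  Returns 0 on success, -1 if the buffer is too short
 * or the response code is not recognised.
 */
static int decode_sense(const unsigned char *buf, size_t len,
			struct sense_triple *out)
{
	unsigned char code;

	assert(out != NULL);
	if (buf == NULL || len < 1)
		return -1;
	code = buf[0] & 0x7f;

	if (code == 0x70 || code == 0x71) {
		/* Fixed format: key in byte 2, ASC/ASCQ in bytes 12/13. */
		if (len < 14)
			return -1;
		out->key = buf[2] & 0x0f;
		out->asc = buf[12];
		out->ascq = buf[13];
		return 0;
	}
	if (code == 0x72 || code == 0x73) {
		/* Descriptor format: key, ASC, ASCQ in bytes 1..3. */
		if (len < 4)
			return -1;
		out->key = buf[1] & 0x0f;
		out->asc = buf[2];
		out->ascq = buf[3];
		return 0;
	}
	return -1;
}

/* ASC 0x04 / ASCQ 0x01: logical unit is in process of becoming ready. */
static int becoming_ready(const struct sense_triple *s)
{
	return s->asc == 0x04 && s->ascq == 0x01;
}
```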

I was doing simple short-duration cable-pulls when I noticed I/O errors
would occur at unexpected times as the storage returned to the topology.
The simplest case goes like this:

      * Issue I/O to device A
      * Device A falls off the topology 
      * Driver (qla2xxx) blocks additional requests to device A via
        fc_remote_port_block() 
      * Short time later (couple of seconds) device A returns to
        topology
      * Driver logs into the device and unblocks requests via
        fc_remote_port_unblock().
      * I/O resumes

The storage, still unable to process the commands, returns
check-conditions (please excuse the crude printk()s):

        *** check 1148/1/5 [1:0] sdev_st=2 status=2 [6/29/0].
        *** check 1149/1/5 [1:0] sdev_st=2 status=2 [2/4/1].
        scsi_decide_disposition: sc 0 RETRY incremented 2/5
        *** check 1150/2/5 [1:0] sdev_st=2 status=2 [2/4/1].
        scsi_decide_disposition: sc 0 RETRY incremented 3/5
        *** check 1151/3/5 [1:0] sdev_st=2 status=2 [2/4/1].
        scsi_decide_disposition: sc 0 RETRY incremented 4/5
        *** check 1152/4/5 [1:0] sdev_st=2 status=2 [2/4/1].
        
while scsi_decide_disposition() agrees to retry the commands since
cmd->retries < cmd->allowed.  But when the NOT_READYs persist beyond
cmd->allowed, scsi_decide_disposition() returns SUCCESS:

        scsi_decide_disposition: sc 0 2 SUCCESS 6/5 [2/4/1]

and the command then begins additional processing via:

        scsi_finish_command()
          sd_rw_intr()
            scsi_io_completion()
        
at which point, the following check is made:

        ...
        /*
         * If the device is in the process of becoming ready,
         * retry.
         */
        if (sshdr.asc == 0x04 && sshdr.ascq == 0x01) {
                scsi_requeue_command(q, cmd);
                return;
        }
        
and the command is requeued to the request-q via blk_insert_request()
and started again with:

        q->request_fn()
          scsi_request_fn()
            scsi_dispatch_cmd()
        
There seem to be two problems with this approach:

     1. As the storage continues to return NOT_READY,
        scsi_decide_disposition() blindly increments cmd->retries and
        checks it against cmd->allowed, returning SUCCESS (since at this
        point cmd->retries is always greater than cmd->allowed) -- I've
        seen this loop repeat several hundred times before the
        NOT_READY condition clears.
     2. As a result of the (cmd->retries > cmd->allowed) state of the
        command, if an LLDD returns any status other than DID_OK that
        could initiate a retry, the command is failed immediately.  For
        example, the qla2xxx driver returns DID_BUS_BUSY for any
        'transport' related problem during the exchange (dropped
        frames, FCP protocol failures, etc.).

When the qla2xxx driver managed command queuing internally, a NOT_READY
status would cause the lun-queue to be frozen for some period of time
while the storage settled down.

Would this be an approach to consider?  Or should we tackle the problem
by addressing the quirky (cmd->retries > cmd->allowed) state?

Thanks,
Andrew Vasquez
