public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* Mid-layer handling of NOT_READY conditions...
@ 2005-01-28 23:24 Andrew Vasquez
  2005-01-29  5:46 ` Andrew Vasquez
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Vasquez @ 2005-01-28 23:24 UTC (permalink / raw)
  To: linux-scsi; +Cc: andrew.vasquez

[PREFACE: Please forgive the rather long absence on linux-scsi, I've
been occupied by several non-related projects]


All,

While stripping out the remnants of internal queuing from the qla2xxx
driver and adding-in support for various fc_host/fc_remote constructs,
I've ran into a rather peculiar problem with respect to the way the SCSI
mid-layer handles NOT_READY conditions (notably ASC 0x04 and ASCQ 0x01).

I was doing simple short-duration cable-pulls when I noticed I/O errors
would occur at unexpected times as the storage returned to the topology.
The simplest case goes like this:

      * Issue I/O to device A
      * Device A falls off the topology 
      * Driver (qla2xxx) blocks additional requests to device A via
        fc_remote_port_block() 
      * Short time later (couple of seconds) device A returns to
        topology
      * Driver logs-into device and unblocks requests via
        fc_remote_port_unblock().
      * I/O resumes

The storage still unable to process the commands returns
check-conditions (please excuse the crude printk()s):

        *** check 1148/1/5 [1:0] sdev_st=2 status=2 [6/29/0].
        *** check 1149/1/5 [1:0] sdev_st=2 status=2 [2/4/1].
        scsi_decide_disposition: sc 0 RETRY incremented 2/5
        *** check 1150/2/5 [1:0] sdev_st=2 status=2 [2/4/1].
        scsi_decide_disposition: sc 0 RETRY incremented 3/5
        *** check 1151/3/5 [1:0] sdev_st=2 status=2 [2/4/1].
        scsi_decide_disposition: sc 0 RETRY incremented 4/5
        *** check 1152/4/5 [1:0] sdev_st=2 status=2 [2/4/1].
        
while scsi_decide_disposition() agrees to retry the commands since
cmd->retries < cmd->allowed.  But when NOT_READYs persists beyond
cmd->allowed, scsi_decide_disposition() returns SUCCESS:

        scsi_decide_disposition: sc 0 2 SUCCESS 6/5 [2/4/1]

and the command then begins additional processing via:

        scsi_finish_command()
          sd_rw_itr()
            scsi_io_completion()
        
at which point, the following check is made:

        ...
        /*
         * If the device is in the process of becoming ready,
         * retry.
         */
        if (sshdr.asc == 0x04 && sshdr.ascq == 0x01) {
                scsi_requeue_command(q, cmd);
                return;
        }
        
and the command is requeued to the request-q via blk_insert_request()
and started again with:

        q->request_fn()
          scsi_request_fn()
            scsi_dispatch_cmd()
        
There seems to be two problem with this approach:

     1. As the storage continues to return NOT_READY,
        scsi_decide_disposition() blindly increments cmd->retries and
        checks against cmd->allowed, returning SUCCESS (since at this
        point cmd->retries is always greater than cmd->allowed) -- I've
        seen this condition loop several hundred times while the
        NOT_READY condition clears.
     2. as a result of the (cmd->retries > cmd->allowed) state of the
        command, if a LLDD returns any status (other than DID_OK) which
        could initiate a retry, the command is immediately failed.  As
        an example, the qla2xxx driver returns DID_BUS_BUSY in case of
        any 'transport' related problems during the exchange (dropped
        frames, FCP protocal failures, etc.).

When the qla2xxx driver managed command queuing internally, a NOT_READY
status would cause the lun-queue to be frozen for some period time while
the storage settled-down.

Would this be an approach to consider?  Or should we tackle the problem
by addressing the quirky (cmd->retries > cmd->allowed) state?

Thanks,
Andrew Vasquez

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Mid-Layer handling of NOT READY conditions...
@ 2005-01-31  9:46 EXT / DEVOTEAM VAROQUI Christophe
  0 siblings, 0 replies; 15+ messages in thread
From: EXT / DEVOTEAM VAROQUI Christophe @ 2005-01-31  9:46 UTC (permalink / raw)
  To: 'linux-scsi@vger.kernel.org'

Hello,

a related discussion is ongoing in the multipath area.

Edward Goggin, EMC, noted a LU can be remapped anytime, by way of an HBA
recabling, or a LUN masking reconfiguration or a LU renaming. If IO are
queued to sda (LU id == 1234), sda goes unreachable (say because of a cable
unplug), when sda come back online it may not be 1234 anymore but 4321.
Requeuing the IO to sda will then silently corrupt 4321.

There needs to be a way to ask the HBA driver to error out immediatly IO
submitted to sda.
Can you confirm a "timeout" driver module parameter set to 0 can bring us
this behaviour ?

regards,
cvaroqui

^ permalink raw reply	[flat|nested] 15+ messages in thread
* RE: Mid-Layer handling of NOT READY conditions...
@ 2005-01-31 14:07 goggin, edward
  0 siblings, 0 replies; 15+ messages in thread
From: goggin, edward @ 2005-01-31 14:07 UTC (permalink / raw)
  To: 'linux-scsi@vger.kernel.org'
  Cc: 'EXT / DEVOTEAM VAROQUI Christophe'

 

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org 
> [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of EXT / 
> DEVOTEAM VAROQUI Christophe
> Sent: Monday, January 31, 2005 4:46 AM
> To: 'linux-scsi@vger.kernel.org'
> Subject: Re: Mid-Layer handling of NOT READY conditions...
> 
> Hello,
> 
> a related discussion is ongoing in the multipath area.
> 
> Edward Goggin, EMC, noted a LU can be remapped anytime, by 
> way of an HBA
> recabling, or a LUN masking reconfiguration or a LU renaming. 
> If IO are
> queued to sda (LU id == 1234), sda goes unreachable (say 
> because of a cable
> unplug), when sda come back online it may not be 1234 anymore 
> but 4321.
> Requeuing the IO to sda will then silently corrupt 4321.
> 
> There needs to be a way to ask the HBA driver to error out 
> immediatly IO
> submitted to sda.
> Can you confirm a "timeout" driver module parameter set to 0 
> can bring us
> this behaviour ?
> 

Looks like the SCSI mid-layer may be already doing __most__ of the
"right thing" - but (1) the right thing isn't happening for EMC CLARiion
or Symmetrix logical units (maybe other storage also) because they are
not being treated as "removable" units by the mid-layer and (2) there
Does not seem to be any refresh of the cached inquiry data
(vendor/model/rev) in the mid-layer's scsi_device data structure.
Not clear to me if these storage systems should be setting the RMB
bit of the standard inquiry reply (which CLARiion and Symmetrix are
not doing) or if the linux SCSI mid-layer should just be treating all
SAN storage units as removable units, independent of the state of this bit.

The SCSI mid-layer (scsi_io_completion()) detects a SCSI sense key of
UNIT_ATTENTION after most any attempt to access the SCSI logical unit
With the "new" identity for the first time.  As long as the logical unit
is viewed as "removable" media, the most of right thing happens, namely,
the I/O in question is failed and the device is marked to prevent any
further I/O.  Apparently calling check_disk_change() from the next
sd_open() will at least clear the changed field of the scsi_device,
thereby allowing I/O to the device.  But, the cached inquiry fields
are not updated (possibly via scsi_probe_lun()) to reflect the
possibly new device identity.

> regards,
> cvaroqui
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread
* RE: Mid-layer handling of NOT_READY conditions...
@ 2005-01-31 16:56 James.Smart
  2005-01-31 17:36 ` Patrick Mansfield
  2005-01-31 18:22 ` Andrew Vasquez
  0 siblings, 2 replies; 15+ messages in thread
From: James.Smart @ 2005-01-31 16:56 UTC (permalink / raw)
  To: andrew.vasquez, patmans; +Cc: James.Bottomley, linux-scsi

> On Sat, 2005-01-29 at 11:34 -0800, Patrick Mansfield wrote:
> > On Sat, Jan 29, 2005 at 10:44:41AM -0600, James Bottomley wrote:
> > > On Fri, 2005-01-28 at 21:46 -0800, Andrew Vasquez wrote:
> > > > Returning back DID_IMM_RETRY for these 'transport' 
> related conditions
> > > > would of course help in this issue -- but at the same 
> time bring with it
> > > > several side-effects which may not be desirable.
> > > > 
> > > > So, beyond this particular circumstance, what would be 
> considered a
> > > > 'proper' return status for this type of event? 
> > > 
> > > Well, the correct return, since this is a condition from 
> the storage, is
> > > simply the check condition and the sense code (rather 
> than having the
> > > driver interpret it).
> > 
> > But the transport hit a failure, not the storage device.
> > 
> > I thought Andrew hit this sequence:
> > 
> > 	- pull / replace cable
> > 
> > 	- IO resumes but gets NOT_READY (the device could be 
> logging back
> > 	  into the fibre or such)
> > 
> > 	- a FC transport problem is hit, DID_BUSY_BUSY is returned, but
> > 	  scmd->retries has already been exhausted by the NOT_READY
> > 
> > Did I misread something?
> > 
> 
> No, that's correct -- sorry about the confusion my second 
> email caused.
> I had only inquired about the 'correct' return status in the 
> context of
> avoiding the (cmd-retries > cmd->allowed) failure.

So this maps into the fc_target_block/unblock functionality that was
added to the fc class...  Adapter notifies driver of cable loss and
starts the block, driver does not "resume" the traffic until the
firmware says the login, etc has the device ready to accept scsi
traffic (Note: it does not guarantee the device can't respond with
a NOT_READY sense code).  If the transport hits a problem, there's
no harm done as long as the problem is resolved within the block
timeout. If the timeout is hit - it's because the user dicated that
it wanted to know of errors within this time and if the device fails,
it fails...

In the multipath solution - the "block" time used by the transport gets
set to 0 (or 1 second), so the i/o fails quickly and the multipath
function can kick in.

I am not a fan of a driver manufacturing a NOT_READY condition...

> > 
> > Why not just set scmd->retries to zero in scsi_requeue_command()?
> > 
> 
> This is exactly what I was thinking would be a fairly straight-forward
> approach at solving the problem...

This is ultimately a hack, and raises the potential for the retries value
to perpetually be rezero'd.  The better solution is the use the block
primitives available to avoid the i/o being issued at all if the transport
can't handle it.

James S
 

^ permalink raw reply	[flat|nested] 15+ messages in thread
* RE: Mid-layer handling of NOT_READY conditions...
@ 2005-01-31 19:07 James.Smart
  0 siblings, 0 replies; 15+ messages in thread
From: James.Smart @ 2005-01-31 19:07 UTC (permalink / raw)
  To: andrew.vasquez; +Cc: patmans, James.Bottomley, linux-scsi

> >   If the transport hits a problem, there's
> > no harm done as long as the problem is resolved within the block
> > timeout. If the timeout is hit - it's because the user dicated that
> > it wanted to know of errors within this time and if the 
> device fails,
> > it fails...
> > 
> > In the multipath solution - the "block" time used by the 
> transport gets
> > set to 0 (or 1 second), so the i/o fails quickly and the multipath
> > function can kick in.
> > 
> 
> A bit confused now, are you proposing that 
> cmd->timeout_per_command time
> be inclusive of potential transport failures resulting in a requested
> retry?  And thus not be refreshed (as it currently is) upon retry
> request.

Nope. I knew I probably said the wrong thing here...

Let the io timeout stays as is. If the block condition occurs - it's up
to the driver take the right action on whether the i/o was killed as a
result or not. If it kills it, the driver should error it right away,
then the block code will hold off the retry. If it isn't killed, it lets
the i/o timer fire normally, which may invoke the eh_abort handler
(which is not held off by the block). After the abort, the retry would
be held off by the blocked state.

You want the dev_loss timer low so that other i/o will get through, and
can fail, thus invoking the device state change (and subsequent aborts
of the pending i/o's).


> Perhaps there could be
> a combination of timing conditionals -- the 
> fc_starget_dev_loss_tmo() to
> time the overall pause in 'not-ready' plugging and a
> period-to-wakeup-and-ping-the-storage time within the window?

Agree. I assume the ping-the-storage function is something done as
part of the multipathing so that it has an accurate state of the path.

-- james

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2005-02-01  7:21 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-01-28 23:24 Mid-layer handling of NOT_READY conditions Andrew Vasquez
2005-01-29  5:46 ` Andrew Vasquez
2005-01-29 16:16   ` Matthew Wilcox
2005-01-29 16:44   ` James Bottomley
2005-01-29 19:34     ` Patrick Mansfield
2005-01-30  1:40       ` James Bottomley
2005-01-30  2:33       ` Douglas Gilbert
2005-01-31  7:47       ` Andrew Vasquez
  -- strict thread matches above, loose matches on Subject: below --
2005-01-31  9:46 Mid-Layer handling of NOT READY conditions EXT / DEVOTEAM VAROQUI Christophe
2005-01-31 14:07 goggin, edward
2005-01-31 16:56 Mid-layer handling of NOT_READY conditions James.Smart
2005-01-31 17:36 ` Patrick Mansfield
2005-02-01  7:21   ` Andrew Vasquez
2005-01-31 18:22 ` Andrew Vasquez
2005-01-31 19:07 James.Smart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox