From mboxrd@z Thu Jan 1 00:00:00 1970
From: Douglas Gilbert
Subject: Re: Incorrect response to SK/ASC/ASCQ = x 02/04/01 (becoming ready)
Date: Fri, 27 Aug 2004 10:03:22 +1000
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <412E7A4A.1090706@torque.net>
References: <1093217548.1727.367.camel@mulgrave> <1093218968.3553.17.camel@swtf.comptex.com.au> <1093275122.1776.52.camel@mulgrave> <20040826025453.GA125799@sgi.com> <1093534695.1684.28.camel@mulgrave> <20040826223634.GB131056@sgi.com>
Reply-To: dougg@torque.net
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from borg.st.net.au ([65.23.158.22]:17541 "EHLO borg.st.net.au") by vger.kernel.org with ESMTP id S269818AbUH0AEU (ORCPT ); Thu, 26 Aug 2004 20:04:20 -0400
In-Reply-To: <20040826223634.GB131056@sgi.com>
List-Id: linux-scsi@vger.kernel.org
To: Jeremy Higdon
Cc: James Bottomley , Burn Alting , SCSI development list

Jeremy Higdon wrote:
> On Thu, Aug 26, 2004 at 11:38:08AM -0400, James Bottomley wrote:
>
>>On Wed, 2004-08-25 at 22:54, Jeremy Higdon wrote:
>>
>>>I believe that 02/04/01 should be treated like a Busy status. The
>>>semantics are slightly different, but the action taken by the
>>>initiator is the same: wait a bit and then retry.
>>
>>Well, that's dangerous. BUSY is assumed always to be a transient
>>condition, so it doesn't count against the retries, so if this thing
>>never comes back, we'd loop forever there.
>
>
> Hmm. I think an exponential backoff would be better.
>
> In any case, the 02/04/01 is also a transient condition. It means
> that the device is in the process of becoming ready, but is not ready yet.
>
>
>>The other thing about BUSY handling is that, for a device with no
>>outstanding commands like this, we retry based on I/O pressure, so even
>>if I increment the retry count time waited would depend on how much I/O
>>pressure the system had.
>
>
> The amount of time you wait to retry is based on I/O pressure?
> The amount of time that it will take before the device is ready is
> not based on I/O pressure.
>
>
>>>I have seen this key/asc/ascq before and had to handle it that way.
>>>I believe the device in question was a disk drive that had been
>>>powered on recently. Disks set to have a variable spinup delay
>>>(based on ID, so that not all disks in a 16-drive box power up at
>>>the same time and overstress a power supply) can return this code
>>>up to a couple of minutes after power on.
>>
>>But that's a start of day thing, which would be handled by sd's TUR
>>code.
>
>
> True. However, in the future, I might expect to see more of this
> behavior, as drives are spun down for power conservation. Probably
> more on small systems than on big ones.

Jeremy,
I have read that SAS disks will need spinning up. Also,
SATA disks in a SAS domain will be held somewhere in their
initialization state machine so they don't spin up either.
In both cases active intervention from an application
server (e.g. the Linux SCSI subsystem) will be required.

New infrastructure supporting a "power condition" state machine
has been added recently to SPC-3 and SBC-2. For disks I think
this is an attempt to merge the way SATA and SAS disks (and perhaps
recent U320 disks) will work in this area. There is a new
Power Condition mode page. See SPC-3 section 5.9 and SBC-2
section 4.14.

Doug Gilbert
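
To illustrate the retry policy argued over in the quoted thread, here is a
minimal Python sketch of handling 02/04/01 with an exponential backoff that
is bounded by a total time budget, so the wait neither loops forever nor
depends on I/O pressure. This is not kernel code and was not posted to the
list. The names Sense, is_becoming_ready, backoff_delays and
retry_while_becoming_ready, the issue() callback convention (None on
success, a Sense otherwise), and every timing value are assumptions made
for the example; the 180 second budget only loosely reflects the "couple of
minutes" mentioned for staggered spinup.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """Sense key / ASC / ASCQ triple (hypothetical helper type)."""
    key: int
    asc: int
    ascq: int


def is_becoming_ready(sense):
    """True for NOT READY, logical unit is in process of becoming ready."""
    return (sense.key, sense.asc, sense.ascq) == (0x02, 0x04, 0x01)


def backoff_delays(initial=1.0, factor=2.0, cap=30.0, budget=180.0):
    """Yield growing delays whose sum never exceeds `budget` seconds.

    All four defaults are illustrative assumptions, not values from the
    thread or from any standard.
    """
    delay = initial
    waited = 0.0
    while waited < budget:
        # Clamp each step to the per-step cap and to what is left of the budget.
        step = min(delay, cap, budget - waited)
        yield step
        waited += step
        delay *= factor


def retry_while_becoming_ready(issue, sleep=time.sleep, **backoff):
    """Call issue() until it stops reporting 02/04/01 or the budget runs out.

    Returns the last result: None on success, otherwise the final Sense
    (still 02/04/01 if the device never became ready within the budget).
    """
    sense = issue()
    for delay in backoff_delays(**backoff):
        if sense is None or not is_becoming_ready(sense):
            return sense
        sleep(delay)
        sense = issue()
    return sense
```

The budget is expressed in elapsed time, not in a retry count, because the
time a drive needs to spin up is a property of the device. A caller that
gets 02/04/01 back after the budget is spent can then fail the command, or
consider the active intervention (explicit spin up) described above.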