From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brian King
Subject: Re: host_self_blocked question/bug?
Date: Tue, 25 Nov 2003 16:31:09 -0600
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <3FC3D82D.2030604@us.ibm.com>
References: <3FC3CDCF.4030105@us.ibm.com> <1069797320.1787.220.camel@mulgrave>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from e4.ny.us.ibm.com ([32.97.182.104]:14776 "EHLO e4.ny.us.ibm.com")
	by vger.kernel.org with ESMTP id S263203AbTKYWbL (ORCPT );
	Tue, 25 Nov 2003 17:31:11 -0500
Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150])
	by e4.ny.us.ibm.com (8.12.10/8.12.2) with ESMTP id hAPMVAh3842976
	for ; Tue, 25 Nov 2003 17:31:10 -0500
Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
	by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.6) with ESMTP id hAPMV9je156530
	for ; Tue, 25 Nov 2003 17:31:09 -0500
List-Id: linux-scsi@vger.kernel.org
To: linux-scsi@vger.kernel.org

James Bottomley wrote:
> The original design was to allow short hiatuses when the HBA couldn't
> accept I/O. It doesn't work if there's I/O pending (unless the stop is
> very short), because the SCSI timers are still ticking and error
> recovery doesn't see this flag.
>
> There has been talk of making this interface robust to pending commands
> (halt the timers and freeze the error handler) for FC HBA's that take
> ages to process loop events, but no work has been done on this---it's
> quite a bit more work than simply not allowing the eh to emit TURs.

I'd like a way to be able to stop the mid-layer from sending me any
commands. The scenarios I have today are:

1. Fatal error on the adapter.
2. Microcode download to the adapter.
3. Adapter cache recovery commands.

All of these cases require me to run BIST on the adapter and bring it
back up. To do this may take 20-30 seconds.
I call scsi_block_requests, fail all pending ops back with DID_ERROR,
reset the adapter, then call scsi_unblock_requests. Because nothing is
left outstanding while the adapter resets, my usage of it gets around
the ticking timer problem.

I agree that the error recovery thread doesn't see this flag either, and
that this is a potential problem. I had planned to work around that by
failing abort and device reset, which forces host_reset to be called;
host_reset would then wait for the adapter reset to complete. It would
be nice if I didn't have to do that.

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
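
The block, fail, reset, unblock sequence described above can be sketched
as a small userspace model. This is not kernel code and not the driver's
actual implementation: every model_* name is hypothetical,
model_block_requests and model_unblock_requests only stand in for
scsi_block_requests and scsi_unblock_requests, and the adapter reset is
an empty placeholder. The sketch assumes that failing a command back is
what stops its timer in the real mid-layer; here that is reduced to a
"completed" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_DID_ERROR 0x07  /* same value as the kernel's DID_ERROR host byte */
#define MAX_PENDING 8         /* arbitrary queue depth for the model */

struct model_cmd {
    int result;      /* host byte kept in bits 16..23, as the mid-layer does */
    bool completed;  /* models "command returned, its timer no longer runs" */
};

struct model_host {
    bool self_blocked;                       /* models host_self_blocked */
    struct model_cmd *pending[MAX_PENDING];  /* commands owned by the adapter */
    size_t npending;
};

/* Hypothetical stand-ins for scsi_block_requests()/scsi_unblock_requests(). */
void model_block_requests(struct model_host *h)   { h->self_blocked = true; }
void model_unblock_requests(struct model_host *h) { h->self_blocked = false; }

/* Mid-layer side: refuse to queue new commands while the host is blocked. */
bool model_queue(struct model_host *h, struct model_cmd *c)
{
    if (h->self_blocked || h->npending == MAX_PENDING)
        return false;
    c->result = 0;
    c->completed = false;
    h->pending[h->npending++] = c;
    return true;
}

/* Driver side: return every outstanding command with DID_ERROR. */
void model_fail_pending(struct model_host *h)
{
    for (size_t i = 0; i < h->npending; i++) {
        h->pending[i]->result = MODEL_DID_ERROR << 16;
        h->pending[i]->completed = true;
    }
    h->npending = 0;
}

/* Placeholder for BIST and bring-up (20-30 seconds on real hardware). */
void model_reset_adapter(struct model_host *h)
{
    assert(h->self_blocked);   /* no new commands may arrive during reset */
    assert(h->npending == 0);  /* nothing may still be outstanding */
}

/* The recovery sequence from the text, in order. */
void model_recover_adapter(struct model_host *h)
{
    model_block_requests(h);
    model_fail_pending(h);
    model_reset_adapter(h);
    model_unblock_requests(h);
}
```

The model deliberately leaves out the error handler thread, which is the
gap discussed above: nothing here stops an abort or device reset from
being attempted while the host is blocked.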