From: Patrick Mansfield
Subject: Re: SCSI woes (followup)
Date: Tue, 24 Sep 2002 11:18:47 -0700
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <20020924111847.A4151@eng2.beaverton.ibm.com>
References: <200209241346.g8ODkER09516@localhost.localdomain> <20020924145852.A28042@flint.arm.linux.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20020924145852.A28042@flint.arm.linux.org.uk>; from rmk@arm.linux.org.uk on Tue, Sep 24, 2002 at 02:58:52PM +0100
List-Id: linux-scsi@vger.kernel.org
To: Russell King
Cc: James Bottomley, linux-scsi@vger.kernel.org

On Tue, Sep 24, 2002 at 02:58:52PM +0100, Russell King wrote:
> On Tue, Sep 24, 2002 at 09:46:14AM -0400, James Bottomley wrote:
> > I think it's method of operation is misplaced.
>
> I think it is misplaced. It locks the doors of devices that aren't even
> in use, which is just plain stupid.
>
> > However, for your case does simply moving the queue empty check to the top
> > cause the problems to go away? (That would be hiding the problem not fixing
> > it, but still...)
>
> I #if 0'd it out, and it makes the problem go away.

The scan only sends INQUIRY commands. After all scanning is done, the
upper level drivers might send a TUR.

After a new Scsi_Device is added, scsi_scan.c calls
scsi_release_commandblocks() and sets queue_depth = 0. At this point any
call to scsi_request_fn() for the device just returns (via the break
statements) after scsi_allocate_device() returns NULL. If scsi_ioctl()
was called from scsi_request_fn(), it will hang forever.

The problem is that we try to send a command via scsi_request_fn() to a
device that has no command blocks allocated, because its initialization
is incomplete.

Moving the empty check up sounds like a good and simple fix for 2.4, as
does checking whether queue_depth == 0. Anything else would be difficult
to get right.
Moving the SCSI_IOCTL_DOORLOCK call doesn't fix the problem if it is
still issued to an incompletely initialized device.

Perhaps we should also not allow the error handler to run during
scanning, and instead let later IO (to any discovered device) kick off
the error handler. It's hard to say whether this is good or not. For
example, if this is your root device, you want it online. But if it is
some other device and we try hard to scan and use it, that can cause
more problems: if it keeps getting errors, we keep running the error
handler/reset cycle, blocking other IO.

The hang happens as follows:

1) Device A, which has removable media, is found during the scan.
2) An INQUIRY to another device B kicks off error handling before the
   scan has completed, so device A has no command blocks.
3) Error handler completion calls scsi_request_fn() for A.
4) scsi_request_fn() for A sees that the reset happened and calls
   scsi_ioctl().
5) scsi_ioctl() calls scsi_request_fn(), which cannot get a Scsi_Cmnd,
   so it just returns, incorrectly assuming that another request must
   be outstanding.
6) The scsi_ioctl() never completes, so the error handling thread
   should be hung.

-- 
Patrick Mansfield
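The six steps can be replayed in a tiny user-space simulation. It is a sketch under stated assumptions, not kernel code:

- struct sim_device, sim_request_fn() and sim_sync_ioctl() are hypothetical stand-ins for Scsi_Device, scsi_request_fn() and scsi_ioctl().
- Commands complete instantly.
- A synchronous ioctl whose request never completes is reported as false instead of sleeping forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for Scsi_Device; NOT the kernel structure. */
struct sim_device {
    int  queue_depth;  /* command blocks allocated; 0 = init incomplete */
    int  in_use;       /* command blocks currently handed out */
    bool removable;    /* step 1: removable media found during the scan */
    bool was_reset;    /* step 2: set when error handling resets the bus */
    int  queued;       /* requests waiting on the queue */
    int  completed;    /* requests that finished */
};

static bool sim_request_fn(struct sim_device *d);

/* Hypothetical model of scsi_ioctl(): queue one request, kick the queue,
 * and report whether it completed. false stands for "sleeps forever". */
static bool sim_sync_ioctl(struct sim_device *d)
{
    int before = d->completed;
    d->queued++;
    sim_request_fn(d);              /* step 5: the ioctl calls the request fn */
    return d->completed > before;   /* step 6: false means a hang */
}

/* Hypothetical model of scsi_request_fn(); returns false if a door-lock
 * ioctl it issued would never complete. */
static bool sim_request_fn(struct sim_device *d)
{
    /* Step 4: the device saw the reset, so re-lock its door. */
    if (d->was_reset && d->removable) {
        d->was_reset = false;
        if (!sim_sync_ioctl(d))
            return false;
    }
    while (d->queued > 0) {
        /* No free command block: return, assuming another request is
         * outstanding and will restart the queue (the bad assumption). */
        if (d->in_use >= d->queue_depth)
            return true;
        d->in_use++;
        d->queued--;
        /* Model only: the command completes at once and frees its block. */
        d->in_use--;
        d->completed++;
        assert(d->in_use <= d->queue_depth);
    }
    return true;
}
```

Calling sim_request_fn() on a device set up like A (queue_depth == 0, removable, was_reset) plays the role of step 3 and returns false, leaving the door-lock request queued forever. A fully initialized device with the same flags completes both the door lock and its pending request.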