From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mike Anderson
Subject: Re: [PATCH 1/2] Block I/O while SG reset operation in progress - midlayer portion
Date: Mon, 27 Feb 2006 23:09:50 -0800
Message-ID: <20060228070950.GA26005@us.ibm.com>
References: <43FF39C0.7080509@emulex.com> <20060224201151.GA30144@us.ibm.com> <43FF8A40.8080702@emulex.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from e2.ny.us.ibm.com ([32.97.182.142]:9607 "EHLO e2.ny.us.ibm.com")
	by vger.kernel.org with ESMTP id S1751908AbWB1HKC (ORCPT );
	Tue, 28 Feb 2006 02:10:02 -0500
Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234])
	by e2.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id k1S7A1PE020418
	for ; Tue, 28 Feb 2006 02:10:01 -0500
Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
	by d01relay02.pok.ibm.com (8.12.10/NCO/VER6.8) with ESMTP id k1S7A1rB061012
	for ; Tue, 28 Feb 2006 02:10:01 -0500
Received: from d01av02.pok.ibm.com (loopback [127.0.0.1])
	by d01av02.pok.ibm.com (8.12.11/8.13.3) with ESMTP id k1S7A1xu007375
	for ; Tue, 28 Feb 2006 02:10:01 -0500
Content-Disposition: inline
In-Reply-To: <43FF8A40.8080702@emulex.com>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: James Smart
Cc: linux-scsi

James Smart wrote:
> Well for a couple of reasons. I didn't view it as an actual recovery
> function as its execution doesn't attempt to change the state of the
> shost, starget or sdev. It's not part of an escalation policy, etc.
> It's essentially performing the same effect as a reset by some other
> initiator in a multi-initiator environment, just with i/o failing faster
> than it may normally take us to detect it. As for side effects on the
> teardown, I don't see anything different than what exists today.
> At worst, it should only add a short delay until the ioflow resumes.

I agree it is unsafe today as we do not restrict the
scsi_reset_provider during scsi_remove_host.
In scsi_remove_host there was code added to detect recovery running
during removal and to try to reduce the chance of eh_* routines being
called after scsi_remove_host returns.

>
> I also looked at what it would take to implement a full state change
> or piggy back on SHOST_RECOVERY. It's a significant amount of code,
> and the addition of a new state brings its own set of additional
> issues to get everything to play right. In the end, I saw very little
> change in result, but with lots of code changes.
>

Yes, there is some code overhead, but some of it is needed. In looking at
your patch, you add to the scsi_host_in_recovery case, but do not have a
wake_up for event waiters on host_wait.

> My main concern was - what is the interface definition we're telling
> the LLDD's for the eh routines ? Does the LLDD expect to always be
> entered in them while in the error thread ? Does the LLDD expect that
> an eh routine will never be invoked while another call to that routine
> is outstanding ? Obviously, none of these are true with the existing
> implementation.

Well, the scsi_mid_low_api.txt indicates that the eh_*_reset_handler
routines will be called with "No other commands will be queued on current
host during eh". As you have indicated, this is not true today.

I did review a few reset functions: one checked to see if a reset action
was pending and returned an error; the others appeared not to care about
the current state of a reset action.

-andmike
--
Michael Anderson
andmike@us.ibm.com