From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mike Anderson Subject: Re: [PATCH] scsi_error fix Date: Fri, 7 Mar 2003 13:17:32 -0800 Sender: linux-kernel-owner@vger.kernel.org Message-ID: <20030307211732.GA1148@beaverton.ibm.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: To: Andries.Brouwer@cwi.nl Cc: patmans@us.ibm.com, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, torvalds@transmeta.com List-Id: linux-scsi@vger.kernel.org Andries.Brouwer@cwi.nl [Andries.Brouwer@cwi.nl] wrote: > From: Patrick Mansfield > > > [Further discussion and things I did not yet investigate: > > What was changed to make this fail first in 2.5.63? > > Experience shows that we get into a loop when something else > > than SUCCESS is returned here. Probably that is a bug elsewhere. > > Probably the commands that cause problems should never have been > > sent in the first place.] > > The scsi error handler is also used to retrieve sense data for > adapters/drivers that do not auto retrieve it. In such cases, it should > not issue any aborts, resets etc. > > Indeed. > > Your change effectively disables that support - we never hit the code in > scsi_eh_get_sense() to request sense. It would be very nice if we could > fix (or audit) all the scsi drivers, apply your change and remove > scsi_eh_get_sense, but AFAIK that has not and is not happening. > > No. What happened before was that we got into an infinite loop. > The right action is to read the code, understand why it gets > into a loop, and fix it. Once that has happened we may decide > to undo my change. Or we may decide to ask for sense at that very spot. > > Today both James and Mike say that they can reproduce the loop, > so probably they'll fix that part. If not, I'll have a look again. Sorry about that Patrick. I had sent some mail last might and thought it went to the list. Both James and I can reproduce this problem. I was able to reproduce using a hack to scsi_debug. The loop problem is related to scsi error handling using the common code of scsi_queue_insert / scsi_requesT_fn . When a command gets started the scsi_init_cmd_errh function is called which sets retries to 0. There maybe another issue with the scsi_eh_get_sense function, but I am still looking at it. I believe James is still pursuing a solution also. -andmike -- Michael Anderson andmike@us.ibm.com