From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jens Axboe Subject: Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1 Date: Wed, 16 Jul 2003 14:45:49 +0200 Sender: linux-scsi-owner@vger.kernel.org Message-ID: <20030716124549.GX833@suse.de> References: <1058310172.981.7.camel@markh1.pdx.osdl.net> <1058359278.1856.8.camel@mulgrave> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Received: from ns.virtualhost.dk ([195.184.98.160]:8427 "EHLO virtualhost.dk") by vger.kernel.org with ESMTP id S270812AbTGPMbC (ORCPT ); Wed, 16 Jul 2003 08:31:02 -0400 Content-Disposition: inline In-Reply-To: <1058359278.1856.8.camel@mulgrave> List-Id: linux-scsi@vger.kernel.org To: James Bottomley Cc: Mark Haverkamp , Nick Piggin , Andrew Morton , Cliff White , linux-scsi On Wed, Jul 16 2003, James Bottomley wrote: > On Tue, 2003-07-15 at 19:02, Mark Haverkamp wrote: > > Daniel McNeil and I have been debugging a hang with the aacraid driver > > using the as I/O scheduler. We found that scsi_request_fn would > > de-queue a request and later re-queued it. This left the > > as_data->nr_dispatched variable in an inconsistent state (it was never > > being decremented back to zero). We added a call to > > elv_completed_request to clean up the state before re-adding the > > request. This has fixed our hang problem. The linux-scsi list is being > > copied for review of the scsi_lib.c change. > > > > ===== drivers/scsi/scsi_lib.c 1.99 vs edited ===== > > --- 1.99/drivers/scsi/scsi_lib.c Sun Jun 29 18:14:44 2003 > > +++ edited/drivers/scsi/scsi_lib.c Tue Jul 15 15:47:45 2003 > > @@ -1215,6 +1215,7 @@ > > spin_lock_irq(q->queue_lock); > > if (blk_rq_tagged(req)) > > blk_queue_end_tag(q, req); > > + elv_completed_request(q, req); > > __elv_add_request(q, req, 0, 0); > > This doen't look right to me. > > SCSI expects to be able to push back uncompleted requests onto the > request queue. The fact that you seem to be calling a completion > function for an uncompleted request is what's causing me heartburn. Completely agree, see my earlier main on the subject. > This code used to work with the old scheduler (we extensively tested it > around the 2.5.6x timeframe because of other changes), so what I'd > really like to know is what changed in the scheduler assumptions to > necessitate this? AS scheduler needs paired started/completed calls, the old scheduler didn't. In fact elv_completed_request() was introduced for that purpose. > If this change is suddenly required, there are several other places in > our queueing functions that will need similar modifications. Indeed > Could I have a definitive statement from the I/O scheduler people about > the procedure for pushing back uncompleted I/O on the block queue just > so we all get back on the same page? SCSI does it right, don't worry. It's a block layer problem that we need to find out how to correctly make sure that we deal it. Basically elv_next_request() does an implicit elv_started_request(), at least that is what AS assumes. So request re-adds should notify the scheduler of that fact. The hack posted here works for them, but isn't correct. -- Jens Axboe