From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Bottomley
Subject: Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
Date: 16 Jul 2003 08:41:17 -0400
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <1058359278.1856.8.camel@mulgrave>
References: <1058310172.981.7.camel@markh1.pdx.osdl.net>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path:
Received: from nat9.steeleye.com ([65.114.3.137]:34053 "EHLO hancock.sc.steeleye.com") by vger.kernel.org with ESMTP id S270545AbTGPM0v (ORCPT ); Wed, 16 Jul 2003 08:26:51 -0400
In-Reply-To: <1058310172.981.7.camel@markh1.pdx.osdl.net>
List-Id: linux-scsi@vger.kernel.org
To: Mark Haverkamp
Cc: Nick Piggin, Andrew Morton, Cliff White, linux-scsi, Jens Axboe

On Tue, 2003-07-15 at 19:02, Mark Haverkamp wrote:
> Daniel McNeil and I have been debugging a hang with the aacraid driver
> using the as I/O scheduler. We found that scsi_request_fn would
> de-queue a request and later re-queued it. This left the
> as_data->nr_dispatched variable in an inconsistent state (it was never
> being decremented back to zero). We added a call to
> elv_completed_request to clean up the state before re-adding the
> request. This has fixed our hang problem. The linux-scsi list is being
> copied for review of the scsi_lib.c change.
>
> ===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
> --- 1.99/drivers/scsi/scsi_lib.c Sun Jun 29 18:14:44 2003
> +++ edited/drivers/scsi/scsi_lib.c Tue Jul 15 15:47:45 2003
> @@ -1215,6 +1215,7 @@
>      spin_lock_irq(q->queue_lock);
>      if (blk_rq_tagged(req))
>          blk_queue_end_tag(q, req);
> +    elv_completed_request(q, req);
>      __elv_add_request(q, req, 0, 0);

This doesn't look right to me. SCSI expects to be able to push back
uncompleted requests onto the request queue. The fact that you seem to
be calling a completion function for an uncompleted request is what's
causing me heartburn.

This code used to work with the old scheduler (we extensively tested it
around the 2.5.6x timeframe because of other changes), so what I'd
really like to know is what changed in the scheduler assumptions to
necessitate this?

If this change is suddenly required, there are several other places in
our queueing functions that will need similar modifications.

Could I have a definitive statement from the I/O scheduler people about
the procedure for pushing back uncompleted I/O on the block queue just
so we all get back on the same page?

Thanks,

James
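
For anyone following the accounting argument, here is a toy model of the
problem as Mark describes it. It is only a sketch: every name in it
(toy_sched, toy_dispatch, toy_requeue_plain and so on) is hypothetical
and none of it is the real as_data or elevator code. It assumes a
scheduler that bumps a counter on dispatch and only lowers it on
completion, and shows that a plain push-back leaves the counter stuck
above zero unless the push-back path also undoes the dispatch.

```c
#include <assert.h>

/* Toy stand-in for a scheduler's bookkeeping; NOT the kernel's as_data. */
struct toy_sched {
    int queued;        /* requests waiting on the queue */
    int nr_dispatched; /* requests handed to the driver, not yet accounted back */
};

/* Driver takes a request off the queue. */
static void toy_dispatch(struct toy_sched *s)
{
    assert(s->queued > 0);
    s->queued--;
    s->nr_dispatched++;
}

/* Request finished: lowers the dispatch counter. */
static void toy_complete(struct toy_sched *s)
{
    assert(s->nr_dispatched > 0);
    s->nr_dispatched--;
}

/* Push-back that re-adds the request without telling the scheduler. */
static void toy_requeue_plain(struct toy_sched *s)
{
    s->queued++;
}

/* Push-back that first undoes the dispatch accounting (hypothetical). */
static void toy_requeue_balanced(struct toy_sched *s)
{
    assert(s->nr_dispatched > 0);
    s->nr_dispatched--;
    s->queued++;
}

/*
 * One request: dispatch, push back, dispatch again, complete.
 * Returns whatever is left in nr_dispatched afterwards.
 */
static int toy_leftover(int balanced)
{
    struct toy_sched s = { .queued = 1, .nr_dispatched = 0 };

    toy_dispatch(&s);
    if (balanced)
        toy_requeue_balanced(&s);
    else
        toy_requeue_plain(&s);
    toy_dispatch(&s);
    toy_complete(&s);
    return s.nr_dispatched;
}
```

In this model toy_leftover(0) leaves one phantom dispatch behind, while
toy_leftover(1) returns to zero. Whether the balancing step should be a
completion call or a separate requeue hook is exactly the open question
above; the sketch takes no position on that.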