From: Bart Van Assche
Subject: Re: [PATCH v11 1/9] Fix race between starved list and device removal
Date: Mon, 24 Jun 2013 18:16:14 +0200
Message-ID: <51C870CE.4020900@acm.org>
References: <51B86E26.6030108@acm.org> <51B86E74.60909@acm.org>
 <1372088320.2013.33.camel@dabdike.int.hansenpartnership.com>
In-Reply-To: <1372088320.2013.33.camel@dabdike.int.hansenpartnership.com>
List-Id: linux-scsi@vger.kernel.org
To: James Bottomley
Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
 Hannes Reinecke, Mike Christie

On 06/24/13 17:38, James Bottomley wrote:
> I really don't like this because it's shuffling potentially fragile
> lifetime rules, since you now have to have the sdev deleted from the
> starved list before the final put. That becomes an unstated assumption
> within the code.
>
> The theory is that the starved list processing may be racing with a
> scsi_remove_device, so when we unlock the host lock, the device (and the
> queue) may be destroyed. OK, so I agree with this, remote a possibility
> though it may be. The easy way of fixing it without making assumptions
> is this, isn't it? All it requires is that the queue be destroyed after
> the starved list entry is deleted in the sdev release code.
>
> James
>
> ---
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 86d5220..f294cc6 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -434,6 +434,7 @@ static void scsi_run_queue(struct request_queue *q)
>  	list_splice_init(&shost->starved_list, &starved_list);
>
>  	while (!list_empty(&starved_list)) {
> +		struct request_queue *slq;
>  		/*
>  		 * As long as shost is accepting commands and we have
>  		 * starved queues, call blk_run_queue. scsi_request_fn
> @@ -456,10 +457,21 @@ static void scsi_run_queue(struct request_queue *q)
>  			continue;
>  		}
>
> +		/*
> +		 * once we drop the host lock, a racing scsi_remove_device may
> +		 * remove the sdev from the starved list and destroy it and
> +		 * the queue. Mitigate by taking a reference to the queue and
> +		 * never touching the sdev again after we drop the host lock.
> +		 */
> +		slq = sdev->request_queue;
> +		if (!blk_get_queue(slq))
> +			continue;
> +
>  		spin_unlock(shost->host_lock);
> -		spin_lock(sdev->request_queue->queue_lock);
> -		__blk_run_queue(sdev->request_queue);
> -		spin_unlock(sdev->request_queue->queue_lock);
> +
> +		blk_run_queue(slq);
> +		blk_put_queue(slq);
> +
>  		spin_lock(shost->host_lock);
>  	}
>  	/* put any unprocessed entries back */

Since the above patch invokes blk_put_queue() with interrupts disabled, it
may cause blk_release_queue() to be invoked with interrupts disabled. Sorry,
but I'm not sure whether that will work fine.

Bart.