From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bart Van Assche Subject: Re: [PATCH] Fix a use-after-free triggered by device removal Date: Thu, 06 Sep 2012 20:52:09 +0200 Message-ID: <5048F0D9.6080403@acm.org> References: <5044BAD2.7060901@acm.org> <91D94272-CA62-4E68-87D7-CE77DE776CC9@cs.wisc.edu> <5048E45E.1070302@acm.org> <5048E80B.5010101@cs.wisc.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Return-path: Received: from juliette.telenet-ops.be ([195.130.137.74]:40455 "EHLO juliette.telenet-ops.be" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753788Ab2IFSwM (ORCPT ); Thu, 6 Sep 2012 14:52:12 -0400 In-Reply-To: <5048E80B.5010101@cs.wisc.edu> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: Mike Christie Cc: linux-scsi , James Bottomley , Jens Axboe , Tejun Heo , Chanho Min On 09/06/12 20:14, Mike Christie wrote: > On 09/06/2012 12:58 PM, Bart Van Assche wrote: >> _suOn 09/06/12 18:27, Michael Christie wrote: >>> On Sep 3, 2012, at 9:12 AM, Bart Van Assche wrote: >>>> If the put_device() call in scsi_request_fn() drops the sdev refcount >>>> to zero then the spin_lock() call after the put_device() call triggers >>>> a use-after-free. Avoid that by making sure that blk_cleanup_queue() >>>> can only finish after all active scsi_request_fn() calls have returned. >>> >>> If we have this patch http://marc.info/?l=linux-scsi&m=134453905402413&w=2 >>> it seems we have all the scsi layer callers of the request_fn/ >>> *blk_run_queue holding a reference to the device when they make the call. >>> Right, or are there some other places missing? >>> >>> What are the other places we can call the request_fn without already >>> holding a reference to the device? Is it the block layer? Is that why we >>> need this patch? >> >> The purpose of this patch is indeed to make *blk_run_queue() calls from >> the block layer safe. There are several direct or indirect >> *blk_run_queue() calls in the block layer where a reference on the queue >> is held but not on the sdev, e.g. in the md, dm and bsg drivers. > > Is there a race still? If some blk code is calling blk_run_queue > (waiting on the queue lock) but no IO is queued, > blk_drain_queue/blk_cleanup_queue could complete since the drain > counters are zero. Then blk_run_queue could grab the queue lock and call > the request_fn on a freed scsi_device (sdev pointed to by q->queuedata > would be freed so scsi_reuqest_fn would be freed). > > Do we need a check in __blk_run_queue for QUEUE_FLAG_DEAD (if dead then > fail IO?), or do we need a check in scsi_request_fn for this. A dead > queue check or maybe null q->queuedata in > scsi_device_dev_release_usercontext (do this with the queue lock held), > then check for null q->queuedata in scsi_request_fn? Yet another scenario is that scsi_remove_host() gets invoked and finishes after scsi_request_fn() has unlocked the queue and before it locks the queue again. That's a scenario that can't be handled by adding more checks at the start of __blk_run_queue() or scsi_request_fn(). Bart.