public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device
@ 2013-02-27 14:45 Bart Van Assche
  2013-02-27 14:47 ` [PATCH v2 1/2] block: Convert blk_run_queue() recursion into iteration Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Bart Van Assche @ 2013-02-27 14:45 UTC (permalink / raw)
  To: device-mapper development, linux-scsi, Alasdair G Kergon,
	Jens Axboe, Mike Snitzer, Tejun Heo, James Bottomley,
	Jun'ichi Nomura

This mini-series of two patches avoids that the device mapper
implementation can trigger a use-after-free during removal of a
mapped device. The two patches in this series are:
- block: Convert blk_run_queue() recursion into iteration.
- dm: Avoid running the md queue after the last dm_put().

Note: these patches are the result of source reading. As far as I know 
this issue has not (yet) caused any harm.

Changes compared to v1:
- Invoking blk_run_queue() again guarantees that the queue will be run
   sooner or later.
- Refined patch descriptions and source code comments.
- Left out the "Cc: <stable@vger.kernel.org>" tags.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] block: Convert blk_run_queue() recursion into iteration
  2013-02-27 14:45 [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device Bart Van Assche
@ 2013-02-27 14:47 ` Bart Van Assche
  2013-02-27 14:48 ` [PATCH v2 2/2] dm: Avoid running the md queue after the last dm_put() Bart Van Assche
  2013-02-28  0:42 ` [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device Jun'ichi Nomura
  2 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2013-02-27 14:47 UTC (permalink / raw)
  To: device-mapper development, linux-scsi
  Cc: Alasdair G Kergon, Jens Axboe, Mike Snitzer, Tejun Heo,
	James Bottomley, Jun'ichi Nomura

Some block drivers, e.g. dm and SCSI, need to trigger a queue
run from inside functions that may be invoked by their request_fn()
implementation. Make sure that invoking blk_run_queue() instead
of blk_run_queue_async() from such functions does not trigger
recursion.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
---
 block/blk-core.c       |   34 +++++++++++++++++++++++-----------
 include/linux/blkdev.h |    8 ++------
 2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c973249..ac4f310 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -304,22 +304,34 @@ EXPORT_SYMBOL(blk_sync_queue);
  *    This variant runs the queue whether or not the queue has been
  *    stopped. Must be called with the queue lock held and interrupts
  *    disabled. See also @blk_run_queue.
+ *
+ * Notes:
+ *    It is allowed to invoke __blk_run_queue() or blk_run_queue() (whichever
+ *    is appropriate) from inside a request_fn() implementation. Such
+ *    recursive calls are converted into iteration with the help of the
+ *    needs_rerun flag.
+ *
+ *    Some request_fn() implementations, e.g. scsi_request_fn() and
+ *    dm_request_fn(), unlock the queue lock internally. For such
+ *    implementations the request_fn_running check does not only prevent
+ *    recursion but also avoids that multiple threads execute such a
+ *    request_fn() implementation concurrently.
  */
 inline void __blk_run_queue_uncond(struct request_queue *q)
 {
 	if (unlikely(blk_queue_dead(q)))
 		return;
 
-	/*
-	 * Some request_fn implementations, e.g. scsi_request_fn(), unlock
-	 * the queue lock internally. As a result multiple threads may be
-	 * running such a request function concurrently. Keep track of the
-	 * number of active request_fn invocations such that blk_drain_queue()
-	 * can wait until all these request_fn calls have finished.
-	 */
-	q->request_fn_active++;
-	q->request_fn(q);
-	q->request_fn_active--;
+	if (!q->request_fn_running) {
+		do {
+			q->needs_rerun = false;
+			q->request_fn_running = true;
+			q->request_fn(q);
+			q->request_fn_running = false;
+		} while (q->needs_rerun);
+	} else {
+		q->needs_rerun = true;
+	}
 }
 
 /**
@@ -418,7 +430,7 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
 			__blk_run_queue(q);
 
 		drain |= q->nr_rqs_elvpriv;
-		drain |= q->request_fn_active;
+		drain |= q->request_fn_running;
 
 		/*
 		 * Unfortunately, requests are queued at and tracked from
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94bc83..20da3c8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -378,12 +378,8 @@ struct request_queue {
 
 	unsigned int		nr_sorted;
 	unsigned int		in_flight[2];
-	/*
-	 * Number of active block driver functions for which blk_drain_queue()
-	 * must wait. Must be incremented around functions that unlock the
-	 * queue_lock internally, e.g. scsi_request_fn().
-	 */
-	unsigned int		request_fn_active;
+	bool			request_fn_running;
+	bool			needs_rerun;
 
 	unsigned int		rq_timeout;
 	struct timer_list	timeout;
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] dm: Avoid running the md queue after the last dm_put()
  2013-02-27 14:45 [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device Bart Van Assche
  2013-02-27 14:47 ` [PATCH v2 1/2] block: Convert blk_run_queue() recursion into iteration Bart Van Assche
@ 2013-02-27 14:48 ` Bart Van Assche
  2013-02-28  0:42 ` [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device Jun'ichi Nomura
  2 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2013-02-27 14:48 UTC (permalink / raw)
  To: device-mapper development, linux-scsi
  Cc: Alasdair G Kergon, Jens Axboe, Mike Snitzer, Tejun Heo,
	James Bottomley, Jun'ichi Nomura

At least in theory the mapped device can disappear after
rq_completed() has invoked dm_put() and before the queue of
the mapped device has been run. Avoid this by running the
queue synchronously instead of asynchronously. Note: the
previous patch makes invoking blk_run_queue() from inside
dm_request_fn() safe.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
---
 drivers/md/dm.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 314a0e2..28b7ad4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -728,14 +728,8 @@ static void rq_completed(struct mapped_device *md, int rw, int run_queue)
 	if (!md_in_flight(md))
 		wake_up(&md->wait);
 
-	/*
-	 * Run this off this callpath, as drivers could invoke end_io while
-	 * inside their request_fn (and holding the queue lock). Calling
-	 * back into ->request_fn() could deadlock attempting to grab the
-	 * queue lock again.
-	 */
 	if (run_queue)
-		blk_run_queue_async(md->queue);
+		blk_run_queue(md->queue);
 
 	/*
 	 * dm_put() must be at the end of this function. See the comment above
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device
  2013-02-27 14:45 [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device Bart Van Assche
  2013-02-27 14:47 ` [PATCH v2 1/2] block: Convert blk_run_queue() recursion into iteration Bart Van Assche
  2013-02-27 14:48 ` [PATCH v2 2/2] dm: Avoid running the md queue after the last dm_put() Bart Van Assche
@ 2013-02-28  0:42 ` Jun'ichi Nomura
  2013-02-28 13:00   ` Bart Van Assche
  2 siblings, 1 reply; 6+ messages in thread
From: Jun'ichi Nomura @ 2013-02-28  0:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: device-mapper development, linux-scsi, Alasdair G Kergon,
	Jens Axboe, Mike Snitzer, Tejun Heo, James Bottomley

Hi Bart,

On 02/27/13 23:45, Bart Van Assche wrote:
> This mini-series of two patches avoids that the device mapper
> implementation can trigger a use-after-free during removal of a
> mapped device. The two patches in this series are:
> - block: Convert blk_run_queue() recursion into iteration.
> - dm: Avoid running the md queue after the last dm_put().
> 
> Note: these patches are the result of source reading. As far as I know this issue has not (yet) caused any harm.

Ref-counting of mapped device is like this:
  - dm depends on the fact that the block device is opened while there
    is bio/request submitted.  So dm_get/put in dm_blk_open/close is
    enough to keep mapped device while there are bios.
  - Request-based target has a tiny window between dm_blk_close()
    and the end of rq_completed() because the opener may close the device
    once the last bio completes even if request is still finishing.
    dm_get/dm_put in dm_start_request/rq_completed closes this window.
    (See comments in dm_start_request())
  - So, when dm_put() puts the last reference, there should be no
    requests in the queue.
  - If there is no reference to the mapped device, dm_destroy() may
    start tearing it down.
    It is ok if there is pending delayed work for the request queue
    because blk_cleanup_queue() is called before freeing the mapped device
    and cancels the delayed work.

So as far as blk_run_queue_async() in rq_completed() is concerned,
it is not a problem from "use-after-free" point of view.

-- 
Jun'ichi Nomura, NEC Corporation


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device
  2013-02-28  0:42 ` [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device Jun'ichi Nomura
@ 2013-02-28 13:00   ` Bart Van Assche
  2013-03-01  8:18     ` Jun'ichi Nomura
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2013-02-28 13:00 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: device-mapper development, linux-scsi, Alasdair G Kergon,
	Jens Axboe, Mike Snitzer, Tejun Heo, James Bottomley

On 02/28/13 01:42, Jun'ichi Nomura wrote:
> Hi Bart,
>
> On 02/27/13 23:45, Bart Van Assche wrote:
>> This mini-series of two patches avoids that the device mapper
>> implementation can trigger a use-after-free during removal of a
>> mapped device. The two patches in this series are:
>> - block: Convert blk_run_queue() recursion into iteration.
>> - dm: Avoid running the md queue after the last dm_put().
>>
>> Note: these patches are the result of source reading. As far as I know this issue has not (yet) caused any harm.
>
> Ref-counting of mapped device is like this:
>    - dm depends on the fact that the block device is opened while there
>      is bio/request submitted.  So dm_get/put in dm_blk_open/close is
>      enough to keep mapped device while there are bios.
>    - Request-based target has a tiny window between dm_blk_close()
>      and the end of rq_completed() because the opener may close the device
>      once the last bio completes even if request is still finishing.
>      dm_get/dm_put in dm_start_request/rq_completed closes this window.
>      (See comments in dm_start_request())
>    - So, when dm_put() puts the last reference, there should be no
>      requests in the queue.
>    - If there is no reference to the mapped device, dm_destroy() may
>      start tearing it down.
>      It is ok if there is pending delayed work for the request queue
>      because blk_cleanup_queue() is called before freeing the mapped device
>      and cancels the delayed work.
>
> So as far as blk_run_queue_async() in rq_completed() is concerned,
> it is not a problem from "use-after-free" point of view.

Hello Jun'ichi,

Thanks for the feedback. It is good to know that there is no risk of 
triggering a use-after-free with the current approach.

How about reposting these patches as a performance optimization ? With 
these patches I see a slightly lower latency and slightly higher 
throughput. With a dm-linear mapping on top of a RAM disk (brd), a 
request size of 512 bytes and 100% reads fio reports 2063K IOPS without 
these patches and 2083K IOPS with these two patches applied. That's an 
improvement of about 1%. It's not much but that comes on top of the 
advantage that these two patches make the rq_completed() implementation 
easier to understand and to reason about.

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device
  2013-02-28 13:00   ` Bart Van Assche
@ 2013-03-01  8:18     ` Jun'ichi Nomura
  0 siblings, 0 replies; 6+ messages in thread
From: Jun'ichi Nomura @ 2013-03-01  8:18 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: device-mapper development, linux-scsi, Alasdair G Kergon,
	Jens Axboe, Mike Snitzer, Tejun Heo, James Bottomley

On 02/28/13 22:00, Bart Van Assche wrote:
> How about reposting these patches as a performance optimization ?

How and when do they improve performance?

> With these patches I see a slightly lower latency and slightly higher
> throughput. With a dm-linear mapping on top of a RAM disk (brd), a
> request size of 512 bytes and 100% reads fio reports 2063K IOPS without
> these patches and 2083K IOPS with these two patches applied. That's an
> improvement of about 1%. It's not much but that comes on top of the

dm-linear + brd is not appropriate for testing your patches.
They are both bio-based drivers while your patches change request
processing.  To test the patches, you have to use request-based drivers,
e.g. dm-multipath on top of scsi.

Also, IMO, the measurement should be done not only for IO performance
but also how other processes are interrupted by IO processing.
The use of blk_run_queue_async should have reduced the foot print of
softirq handling.

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-03-01  8:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-02-27 14:45 [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device Bart Van Assche
2013-02-27 14:47 ` [PATCH v2 1/2] block: Convert blk_run_queue() recursion into iteration Bart Van Assche
2013-02-27 14:48 ` [PATCH v2 2/2] dm: Avoid running the md queue after the last dm_put() Bart Van Assche
2013-02-28  0:42 ` [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device Jun'ichi Nomura
2013-02-28 13:00   ` Bart Van Assche
2013-03-01  8:18     ` Jun'ichi Nomura

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox