linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 0/3] blk-mq: three misc patches
@ 2022-06-15  6:48 Ming Lei
  2022-06-15  6:48 ` [PATCH V2 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Ming Lei @ 2022-06-15  6:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei

Hello Guys,

The 1st two patches make referring to q->elevator more reliable, and
avoid potential use-after-free on q->elevator in two code paths.

The 3rd patch improves boot time for scsi host with lots of hw queues.

V2:
	- follow Christoph's suggestion to set QUEUE_FLAG_SQ_SCHED inside
	mq-deadline and bfq directly

Ming Lei (3):
  blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
  blk-mq: avoid to touch q->elevator without any protection
  blk-mq: don't clear flush_rq from tags->rqs[]

 block/bfq-iosched.c    |  3 +++
 block/blk-mq.c         | 27 ++++++++-------------------
 block/kyber-iosched.c  |  1 -
 block/mq-deadline.c    |  3 +++
 include/linux/blkdev.h |  4 ++--
 5 files changed, 16 insertions(+), 22 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH V2 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
  2022-06-15  6:48 [PATCH V2 0/3] blk-mq: three misc patches Ming Lei
@ 2022-06-15  6:48 ` Ming Lei
  2022-06-15  6:48 ` [PATCH V2 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
  2022-06-15  6:48 ` [PATCH V2 3/3] blk-mq: don't clear flush_rq from tags->rqs[] Ming Lei
  2 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2022-06-15  6:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei

elevator can be tore down by sysfs switch interface or disk release, so
hold ->sysfs_lock before referring to q->elevator, then potential
use-after-free can be avoided.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e9bf950983c7..22a89c758f70 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4438,12 +4438,14 @@ static bool blk_mq_elv_switch_none(struct list_head *head,
 	if (!qe)
 		return false;
 
+	/* q->elevator needs protection from ->sysfs_lock */
+	mutex_lock(&q->sysfs_lock);
+
 	INIT_LIST_HEAD(&qe->node);
 	qe->q = q;
 	qe->type = q->elevator->type;
 	list_add(&qe->node, head);
 
-	mutex_lock(&q->sysfs_lock);
 	/*
 	 * After elevator_switch_mq, the previous elevator_queue will be
 	 * released by elevator_release. The reference of the io scheduler
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V2 2/3] blk-mq: avoid to touch q->elevator without any protection
  2022-06-15  6:48 [PATCH V2 0/3] blk-mq: three misc patches Ming Lei
  2022-06-15  6:48 ` [PATCH V2 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none Ming Lei
@ 2022-06-15  6:48 ` Ming Lei
  2022-06-15  6:50   ` Christoph Hellwig
  2022-06-15  6:48 ` [PATCH V2 3/3] blk-mq: don't clear flush_rq from tags->rqs[] Ming Lei
  2 siblings, 1 reply; 6+ messages in thread
From: Ming Lei @ 2022-06-15  6:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei, Jan Kara

q->elevator is referred in blk_mq_has_sqsched() without any protection,
no .q_usage_counter is held, no queue srcu and rcu read lock is held,
so potential use-after-free may be triggered.

Fix the issue by adding one queue flag for checking if the elevator
uses single queue style dispatch. Meantime the elevator feature flag
of ELEVATOR_F_MQ_AWARE isn't needed any more.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bfq-iosched.c    |  3 +++
 block/blk-mq.c         | 18 ++----------------
 block/kyber-iosched.c  |  1 -
 block/mq-deadline.c    |  3 +++
 include/linux/blkdev.h |  4 ++--
 5 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0d46cb728bbf..caa55a5624bc 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -7188,6 +7188,9 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	bfq_init_root_group(bfqd->root_group, bfqd);
 	bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
 
+	/* We dispatch from request queue wide instead of hw queue */
+	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
+
 	wbt_disable_default(q);
 	return 0;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 22a89c758f70..112dce569192 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2140,20 +2140,6 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queue);
 
-/*
- * Is the request queue handled by an IO scheduler that does not respect
- * hardware queues when dispatching?
- */
-static bool blk_mq_has_sqsched(struct request_queue *q)
-{
-	struct elevator_queue *e = q->elevator;
-
-	if (e && e->type->ops.dispatch_request &&
-	    !(e->type->elevator_features & ELEVATOR_F_MQ_AWARE))
-		return true;
-	return false;
-}
-
 /*
  * Return prefered queue to dispatch from (if any) for non-mq aware IO
  * scheduler.
@@ -2186,7 +2172,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	unsigned long i;
 
 	sq_hctx = NULL;
-	if (blk_mq_has_sqsched(q))
+	if (blk_queue_sq_sched(q))
 		sq_hctx = blk_mq_get_sq_hctx(q);
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (blk_mq_hctx_stopped(hctx))
@@ -2214,7 +2200,7 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
 	unsigned long i;
 
 	sq_hctx = NULL;
-	if (blk_mq_has_sqsched(q))
+	if (blk_queue_sq_sched(q))
 		sq_hctx = blk_mq_get_sq_hctx(q);
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (blk_mq_hctx_stopped(hctx))
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 70ff2a599ef6..235ed324d052 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -1033,7 +1033,6 @@ static struct elevator_type kyber_sched = {
 #endif
 	.elevator_attrs = kyber_sched_attrs,
 	.elevator_name = "kyber",
-	.elevator_features = ELEVATOR_F_MQ_AWARE,
 	.elevator_owner = THIS_MODULE,
 };
 
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 6ed602b2f80a..1a9e835e816c 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -642,6 +642,9 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
 	spin_lock_init(&dd->lock);
 	spin_lock_init(&dd->zone_lock);
 
+	/* We dispatch from request queue wide instead of hw queue */
+	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
+
 	q->elevator = eq;
 	return 0;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 608d577734c2..bb6e3c31b3b7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -575,6 +575,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
+#define QUEUE_FLAG_SQ_SCHED     30	/* single queue style io dispatch */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\
@@ -616,6 +617,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_pm_only(q)	atomic_read(&(q)->pm_only)
 #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
 #define blk_queue_nowait(q)	test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
+#define blk_queue_sq_sched(q)	test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
@@ -1006,8 +1008,6 @@ void disk_set_independent_access_ranges(struct gendisk *disk,
  */
 /* Supports zoned block devices sequential write constraint */
 #define ELEVATOR_F_ZBD_SEQ_WRITE	(1U << 0)
-/* Supports scheduling on multiple hardware queues */
-#define ELEVATOR_F_MQ_AWARE		(1U << 1)
 
 extern void blk_queue_required_elevator_features(struct request_queue *q,
 						 unsigned int features);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V2 3/3] blk-mq: don't clear flush_rq from tags->rqs[]
  2022-06-15  6:48 [PATCH V2 0/3] blk-mq: three misc patches Ming Lei
  2022-06-15  6:48 ` [PATCH V2 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none Ming Lei
  2022-06-15  6:48 ` [PATCH V2 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
@ 2022-06-15  6:48 ` Ming Lei
  2 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2022-06-15  6:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei, Yu Kuai

commit 364b61818f65 ("blk-mq: clearing flush request reference in
tags->rqs[]") is added to clear the to-be-free flush request from
tags->rqs[] for avoiding use-after-free on the flush rq.

Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot time
by ~8s because running scsi probe which may create and remove lots of
unpresent LUNs on megaraid-sas which uses BLK_MQ_F_TAG_HCTX_SHARED and
each request queue has lots of hw queues.

Improve the situation by not running blk_mq_clear_flush_rq_mapping if
disk isn't added when there can't be any flush request issued.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 112dce569192..992997f6acbd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3429,8 +3429,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 	if (blk_mq_hw_queue_mapped(hctx))
 		blk_mq_tag_idle(hctx);
 
-	blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
-			set->queue_depth, flush_rq);
+	if (blk_queue_init_done(q))
+		blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
+				set->queue_depth, flush_rq);
 	if (set->ops->exit_request)
 		set->ops->exit_request(set, flush_rq, hctx_idx);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 2/3] blk-mq: avoid to touch q->elevator without any protection
  2022-06-15  6:48 ` [PATCH V2 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
@ 2022-06-15  6:50   ` Christoph Hellwig
  2022-06-15  6:56     ` Ming Lei
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2022-06-15  6:50 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block, Jan Kara

On Wed, Jun 15, 2022 at 02:48:25PM +0800, Ming Lei wrote:
> q->elevator is referred in blk_mq_has_sqsched() without any protection,
> no .q_usage_counter is held, no queue srcu and rcu read lock is held,
> so potential use-after-free may be triggered.
> 
> Fix the issue by adding one queue flag for checking if the elevator
> uses single queue style dispatch. Meantime the elevator feature flag
> of ELEVATOR_F_MQ_AWARE isn't needed any more.

We still need to clear the flag when switching elevators.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 2/3] blk-mq: avoid to touch q->elevator without any protection
  2022-06-15  6:50   ` Christoph Hellwig
@ 2022-06-15  6:56     ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2022-06-15  6:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Jan Kara

On Wed, Jun 15, 2022 at 08:50:34AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 15, 2022 at 02:48:25PM +0800, Ming Lei wrote:
> > q->elevator is referred in blk_mq_has_sqsched() without any protection,
> > no .q_usage_counter is held, no queue srcu and rcu read lock is held,
> > so potential use-after-free may be triggered.
> > 
> > Fix the issue by adding one queue flag for checking if the elevator
> > uses single queue style dispatch. Meantime the elevator feature flag
> > of ELEVATOR_F_MQ_AWARE isn't needed any more.
> 
> We still need to clear the flag when switching elevators.

Indeed, or it can be cleared in initializing none and kyber. 

Thanks,
Ming


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-06-15  6:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-06-15  6:48 [PATCH V2 0/3] blk-mq: three misc patches Ming Lei
2022-06-15  6:48 ` [PATCH V2 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none Ming Lei
2022-06-15  6:48 ` [PATCH V2 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
2022-06-15  6:50   ` Christoph Hellwig
2022-06-15  6:56     ` Ming Lei
2022-06-15  6:48 ` [PATCH V2 3/3] blk-mq: don't clear flush_rq from tags->rqs[] Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).