* [PATCH 0/3] blk-mq: three misc patches
@ 2022-06-15 2:37 Ming Lei
2022-06-15 2:37 ` [PATCH 1/3] blk-mq: protect q->elevator by ->sysfs_lock Ming Lei
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Ming Lei @ 2022-06-15 2:37 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei
Hello Guys,
The first two patches make references to q->elevator more reliable and
avoid a potential use-after-free on q->elevator in two code paths.
The third patch improves boot time for SCSI hosts with lots of hw queues.
Ming Lei (3):
blk-mq: protect q->elevator by ->sysfs_lock
blk-mq: avoid to touch q->elevator without any protection
blk-mq: don't clear flush_rq from tags->rqs[]
block/blk-mq.c | 27 ++++++++-------------------
block/elevator.c | 10 ++++++++++
include/linux/blkdev.h | 2 ++
3 files changed, 20 insertions(+), 19 deletions(-)
--
2.31.1
* [PATCH 1/3] blk-mq: protect q->elevator by ->sysfs_lock
2022-06-15 2:37 [PATCH 0/3] blk-mq: three misc patches Ming Lei
@ 2022-06-15 2:37 ` Ming Lei
2022-06-15 6:01 ` Christoph Hellwig
2022-06-15 2:37 ` [PATCH 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
2022-06-15 2:37 ` [PATCH 3/3] blk-mq: don't clear flush_rq from tags->rqs[] Ming Lei
2 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2022-06-15 2:37 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei
The elevator can be torn down by the sysfs switch interface or by disk
release, so hold ->sysfs_lock before referring to q->elevator to avoid a
potential use-after-free.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e9bf950983c7..22a89c758f70 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4438,12 +4438,14 @@ static bool blk_mq_elv_switch_none(struct list_head *head,
if (!qe)
return false;
+ /* q->elevator needs protection from ->sysfs_lock */
+ mutex_lock(&q->sysfs_lock);
+
INIT_LIST_HEAD(&qe->node);
qe->q = q;
qe->type = q->elevator->type;
list_add(&qe->node, head);
- mutex_lock(&q->sysfs_lock);
/*
* After elevator_switch_mq, the previous elevator_queue will be
* released by elevator_release. The reference of the io scheduler
--
2.31.1
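A userspace sketch of the ordering this patch establishes, for readers who want to see it outside the kernel. Everything here is an assumption made for illustration: `struct queue`, `struct elv_queue`, `queue_switch_none()` and `queue_save_type()` are hypothetical stand-ins, and a pthread mutex plays the role of ->sysfs_lock. It is not the kernel code, only the same "lock first, then dereference" pattern.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct elv_type { const char *name; };
struct elv_queue { const struct elv_type *type; };

struct queue {
	pthread_mutex_t sysfs_lock;	/* models q->sysfs_lock */
	struct elv_queue *elevator;	/* models q->elevator; a switch may free it */
};

/* Models a switch to "none": the elevator is freed under the lock. */
static void queue_switch_none(struct queue *q)
{
	assert(q != NULL);
	pthread_mutex_lock(&q->sysfs_lock);
	free(q->elevator);
	q->elevator = NULL;
	pthread_mutex_unlock(&q->sysfs_lock);
}

/*
 * Models the fixed ordering: take the lock before reading
 * q->elevator->type, so a concurrent switch cannot free the elevator
 * in between. Returns NULL when no elevator is attached.
 */
static const struct elv_type *queue_save_type(struct queue *q)
{
	const struct elv_type *type = NULL;

	assert(q != NULL);
	pthread_mutex_lock(&q->sysfs_lock);
	if (q->elevator)
		type = q->elevator->type;
	pthread_mutex_unlock(&q->sysfs_lock);
	return type;
}
```

In the patch itself the lock stays held across the rest of blk_mq_elv_switch_none(); the sketch drops it early only to keep the example short.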
* [PATCH 2/3] blk-mq: avoid to touch q->elevator without any protection
2022-06-15 2:37 [PATCH 0/3] blk-mq: three misc patches Ming Lei
2022-06-15 2:37 ` [PATCH 1/3] blk-mq: protect q->elevator by ->sysfs_lock Ming Lei
@ 2022-06-15 2:37 ` Ming Lei
2022-06-15 6:04 ` Christoph Hellwig
2022-06-15 2:37 ` [PATCH 3/3] blk-mq: don't clear flush_rq from tags->rqs[] Ming Lei
2 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2022-06-15 2:37 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei, Jan Kara
q->elevator is dereferenced in blk_mq_has_sqsched() without any protection:
.q_usage_counter is not held, and neither the queue SRCU nor the RCU read
lock is held, so a use-after-free may be triggered.
Fix the issue by adding a queue flag that records whether the elevator
uses single-queue-style dispatch.
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 18 ++----------------
block/elevator.c | 10 ++++++++++
include/linux/blkdev.h | 2 ++
3 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 22a89c758f70..112dce569192 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2140,20 +2140,6 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
}
EXPORT_SYMBOL(blk_mq_run_hw_queue);
-/*
- * Is the request queue handled by an IO scheduler that does not respect
- * hardware queues when dispatching?
- */
-static bool blk_mq_has_sqsched(struct request_queue *q)
-{
- struct elevator_queue *e = q->elevator;
-
- if (e && e->type->ops.dispatch_request &&
- !(e->type->elevator_features & ELEVATOR_F_MQ_AWARE))
- return true;
- return false;
-}
-
/*
* Return prefered queue to dispatch from (if any) for non-mq aware IO
* scheduler.
@@ -2186,7 +2172,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
unsigned long i;
sq_hctx = NULL;
- if (blk_mq_has_sqsched(q))
+ if (blk_queue_sq_sched(q))
sq_hctx = blk_mq_get_sq_hctx(q);
queue_for_each_hw_ctx(q, hctx, i) {
if (blk_mq_hctx_stopped(hctx))
@@ -2214,7 +2200,7 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
unsigned long i;
sq_hctx = NULL;
- if (blk_mq_has_sqsched(q))
+ if (blk_queue_sq_sched(q))
sq_hctx = blk_mq_get_sq_hctx(q);
queue_for_each_hw_ctx(q, hctx, i) {
if (blk_mq_hctx_stopped(hctx))
diff --git a/block/elevator.c b/block/elevator.c
index c319765892bb..a2355acd2780 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -612,6 +612,16 @@ int elevator_switch_mq(struct request_queue *q,
}
}
+ /*
+ * Is the request queue handled by an IO scheduler that does not
+ * respect hardware queues when dispatching?
+ */
+ if (new_e && new_e->ops.dispatch_request &&
+ !(new_e->elevator_features & ELEVATOR_F_MQ_AWARE))
+ blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
+ else
+ blk_queue_flag_clear(QUEUE_FLAG_SQ_SCHED, q);
+
if (new_e)
blk_add_trace_msg(q, "elv switch: %s", new_e->elevator_name);
else
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 608d577734c2..ea6ccaeba643 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -575,6 +575,7 @@ struct request_queue {
#define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */
#define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx is active */
#define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */
+#define QUEUE_FLAG_SQ_SCHED 30 /* single queue style io dispatch */
#define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_SAME_COMP) | \
@@ -616,6 +617,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
#define blk_queue_pm_only(q) atomic_read(&(q)->pm_only)
#define blk_queue_registered(q) test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
#define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
+#define blk_queue_sq_sched(q) test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
extern void blk_set_pm_only(struct request_queue *q);
extern void blk_clear_pm_only(struct request_queue *q);
--
2.31.1
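The idea in this patch, caching a property of the elevator in a queue flag at switch time so the hot path never dereferences q->elevator, can be sketched in userspace with C11 atomics. All names below (`struct queue`, `struct elv_type`, `ELV_F_MQ_AWARE`, `Q_FLAG_SQ_SCHED`, `queue_update_sq_sched()`, `queue_sq_sched()`) are hypothetical, and an `atomic_ulong` merely approximates the kernel's queue_flags bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ELV_F_MQ_AWARE	(1u << 0)	/* hypothetical feature bit */
#define Q_FLAG_SQ_SCHED	(1ul << 30)	/* bit 30, as in the patch */

/* Hypothetical, reduced view of an elevator type. */
struct elv_type {
	bool has_dispatch;	/* models ops.dispatch_request != NULL */
	unsigned int features;
};

struct queue {
	atomic_ulong flags;	/* models q->queue_flags */
};

/* Runs at switch time, where new_e is known to be valid (or NULL). */
static void queue_update_sq_sched(struct queue *q, const struct elv_type *new_e)
{
	assert(q != NULL);
	if (new_e && new_e->has_dispatch &&
	    !(new_e->features & ELV_F_MQ_AWARE))
		atomic_fetch_or(&q->flags, Q_FLAG_SQ_SCHED);
	else
		atomic_fetch_and(&q->flags, ~Q_FLAG_SQ_SCHED);
}

/* Hot path: reads only the flag word, never the elevator pointer. */
static bool queue_sq_sched(struct queue *q)
{
	return (atomic_load(&q->flags) & Q_FLAG_SQ_SCHED) != 0;
}
```

The trade-off is that the flag has to be kept in sync on every switch, which is why the patch updates it inside elevator_switch_mq().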
* [PATCH 3/3] blk-mq: don't clear flush_rq from tags->rqs[]
2022-06-15 2:37 [PATCH 0/3] blk-mq: three misc patches Ming Lei
2022-06-15 2:37 ` [PATCH 1/3] blk-mq: protect q->elevator by ->sysfs_lock Ming Lei
2022-06-15 2:37 ` [PATCH 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
@ 2022-06-15 2:37 ` Ming Lei
2022-06-15 6:22 ` Christoph Hellwig
2 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2022-06-15 2:37 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei, Yu Kuai
Commit 364b61818f65 ("blk-mq: clearing flush request reference in
tags->rqs[]") was added to clear the to-be-freed flush request from
tags->rqs[] to avoid a use-after-free on the flush rq.
Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot by
~8s, because the SCSI probe may create and remove lots of non-present LUNs
on megaraid-sas, which uses BLK_MQ_F_TAG_HCTX_SHARED and has lots of hw
queues per request queue.
Improve the situation by not running blk_mq_clear_flush_rq_mapping() if
the disk has not been added, since no flush request can have been issued
in that case.
Reported-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 112dce569192..992997f6acbd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3429,8 +3429,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
if (blk_mq_hw_queue_mapped(hctx))
blk_mq_tag_idle(hctx);
- blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
- set->queue_depth, flush_rq);
+ if (blk_queue_init_done(q))
+ blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
+ set->queue_depth, flush_rq);
if (set->ops->exit_request)
set->ops->exit_request(set, flush_rq, hctx_idx);
--
2.31.1
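To make the cost argument concrete, here is a userspace sketch of the guard this patch adds. It assumes a simplified tag table: `struct tag_table`, `clear_flush_mapping()` and `exit_hctx()` are hypothetical names, and the locking and cmpxchg used by the real blk_mq_clear_flush_rq_mapping() are left out. The point is only that the O(depth) scan is skipped entirely when the queue never finished initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a tag set: one slot per tag. */
struct tag_table {
	void **rqs;
	size_t depth;
};

/*
 * Clears every slot that still points at flush_rq.
 * Returns the number of slots visited, i.e. the cost of the scan.
 */
static size_t clear_flush_mapping(struct tag_table *tags, void *flush_rq)
{
	size_t i;

	assert(tags != NULL && tags->rqs != NULL);
	for (i = 0; i < tags->depth; i++)
		if (tags->rqs[i] == flush_rq)
			tags->rqs[i] = NULL;
	return tags->depth;
}

/*
 * Models the guarded teardown: if initialization never completed, no
 * flush request can have been issued, so the scan is skipped.
 */
static size_t exit_hctx(struct tag_table *tags, void *flush_rq, bool init_done)
{
	if (!init_done)
		return 0;
	return clear_flush_mapping(tags, flush_rq);
}
```

With many hw queues and many short-lived request queues, the skipped scan is multiplied by both counts, which is consistent with the boot-time saving reported above.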
* Re: [PATCH 1/3] blk-mq: protect q->elevator by ->sysfs_lock
2022-06-15 2:37 ` [PATCH 1/3] blk-mq: protect q->elevator by ->sysfs_lock Ming Lei
@ 2022-06-15 6:01 ` Christoph Hellwig
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2022-06-15 6:01 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Wed, Jun 15, 2022 at 10:37:10AM +0800, Ming Lei wrote:
> The elevator can be torn down by the sysfs switch interface or by disk
> release, so hold ->sysfs_lock before referring to q->elevator to avoid a
> potential use-after-free.
The subject probably should really talk about blk_mq_elv_switch_none
as we generally already protect ->elevator with ->sysfs_lock.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 2/3] blk-mq: avoid to touch q->elevator without any protection
2022-06-15 2:37 ` [PATCH 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
@ 2022-06-15 6:04 ` Christoph Hellwig
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2022-06-15 6:04 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block, Jan Kara
> +++ b/block/elevator.c
> @@ -612,6 +612,16 @@ int elevator_switch_mq(struct request_queue *q,
> }
> }
>
> + /*
> + * Is the request queue handled by an IO scheduler that does not
> + * respect hardware queues when dispatching?
> + */
> + if (new_e && new_e->ops.dispatch_request &&
> + !(new_e->elevator_features & ELEVATOR_F_MQ_AWARE))
> + blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
> + else
> + blk_queue_flag_clear(QUEUE_FLAG_SQ_SCHED, q);
Please just set the QUEUE_FLAG_SQ_SCHED flag directly from the
mq-deadline and bfq schedulers and drop the ELEVATOR_F_MQ_AWARE
flag.
Otherwise this approach looks good.
* Re: [PATCH 3/3] blk-mq: don't clear flush_rq from tags->rqs[]
2022-06-15 2:37 ` [PATCH 3/3] blk-mq: don't clear flush_rq from tags->rqs[] Ming Lei
@ 2022-06-15 6:22 ` Christoph Hellwig
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2022-06-15 6:22 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block, Yu Kuai
On Wed, Jun 15, 2022 at 10:37:12AM +0800, Ming Lei wrote:
> Commit 364b61818f65 ("blk-mq: clearing flush request reference in
> tags->rqs[]") was added to clear the to-be-freed flush request from
> tags->rqs[] to avoid a use-after-free on the flush rq.
>
> Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot by
> ~8s, because the SCSI probe may create and remove lots of non-present LUNs
> on megaraid-sas, which uses BLK_MQ_F_TAG_HCTX_SHARED and has lots of hw
> queues per request queue.
>
> Improve the situation by not running blk_mq_clear_flush_rq_mapping() if
> the disk has not been added, since no flush request can have been issued
> in that case.
This looks ok. Another optimization would be to never do this if we
don't have a write cache enabled and thus never ever use the flush_rq.
Reviewed-by: Christoph Hellwig <hch@lst.de>