* [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx'
@ 2022-02-23 11:26 Yu Kuai
2022-02-23 14:30 ` Ming Lei
2022-02-25 2:40 ` Ming Lei
0 siblings, 2 replies; 7+ messages in thread
From: Yu Kuai @ 2022-02-23 11:26 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-kernel, yukuai3, yi.zhang
blk_mq_realloc_hw_ctxs() will free the 'queue_hw_ctx'(e.g. undate
submit_queues through configfs for null_blk), while it might still be
used from other context(e.g. switch elevator to none):
t1 t2
elevator_switch
blk_mq_unquiesce_queue
blk_mq_run_hw_queues
queue_for_each_hw_ctx
// assembly code for hctx = (q)->queue_hw_ctx[i]
mov 0x48(%rbp),%rdx -> read old queue_hw_ctx
__blk_mq_update_nr_hw_queues
blk_mq_realloc_hw_ctxs
hctxs = q->queue_hw_ctx
q->queue_hw_ctx = new_hctxs
kfree(hctxs)
movslq %ebx,%rax
mov (%rdx,%rax,8),%rdi ->uaf
This problem was found by code review, and I comfirmed that the concurrent
scenarios do exist(specifically 'q->queue_hw_ctx' can be changed during
blk_mq_run_hw_queues), however, the uaf problem hasn't been repoduced yet
without hacking the kernel.
Sicne the queue is freezed in __blk_mq_update_nr_hw_queues, fix the
problem by protecting 'queue_hw_ctx' through rcu where it can be accessed
without grabbing 'q_usage_counter'.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-mq.c | 8 +++++++-
include/linux/blk-mq.h | 2 +-
include/linux/blkdev.h | 13 ++++++++++++-
3 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6c59ffe765fd..79367457d555 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3955,7 +3955,13 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
if (hctxs)
memcpy(new_hctxs, hctxs, q->nr_hw_queues *
sizeof(*hctxs));
- q->queue_hw_ctx = new_hctxs;
+
+ rcu_assign_pointer(q->queue_hw_ctx, new_hctxs);
+ /*
+ * Make sure reading the old queue_hw_ctx from other
+ * context concurrently won't trigger uaf.
+ */
+ synchronize_rcu();
kfree(hctxs);
hctxs = new_hctxs;
}
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d319ffa59354..edcf8ead76c6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -918,7 +918,7 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
#define queue_for_each_hw_ctx(q, hctx, i) \
for ((i) = 0; (i) < (q)->nr_hw_queues && \
- ({ hctx = (q)->queue_hw_ctx[i]; 1; }); (i)++)
+ ({ hctx = queue_hctx((q), i); 1; }); (i)++)
#define hctx_for_each_ctx(hctx, ctx, i) \
for ((i) = 0; (i) < (hctx)->nr_ctx && \
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3bfc75a2a450..2018a4dd2028 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -354,7 +354,7 @@ struct request_queue {
unsigned int queue_depth;
/* hw dispatch queues */
- struct blk_mq_hw_ctx **queue_hw_ctx;
+ struct blk_mq_hw_ctx __rcu **queue_hw_ctx;
unsigned int nr_hw_queues;
/*
@@ -622,6 +622,17 @@ static inline bool queue_is_mq(struct request_queue *q)
return q->mq_ops;
}
+static inline struct blk_mq_hw_ctx *queue_hctx(struct request_queue *q, int id)
+{
+ struct blk_mq_hw_ctx *hctx;
+
+ rcu_read_lock();
+ hctx = *(rcu_dereference(q->queue_hw_ctx) + id);
+ rcu_read_unlock();
+
+ return hctx;
+}
+
#ifdef CONFIG_PM
static inline enum rpm_status queue_rpm_status(struct request_queue *q)
{
--
2.31.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx' 2022-02-23 11:26 [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx' Yu Kuai @ 2022-02-23 14:30 ` Ming Lei 2022-02-24 1:29 ` yukuai (C) 2022-02-25 2:40 ` Ming Lei 1 sibling, 1 reply; 7+ messages in thread From: Ming Lei @ 2022-02-23 14:30 UTC (permalink / raw) To: Yu Kuai; +Cc: axboe, linux-block, linux-kernel, yi.zhang On Wed, Feb 23, 2022 at 07:26:01PM +0800, Yu Kuai wrote: > blk_mq_realloc_hw_ctxs() will free the 'queue_hw_ctx'(e.g. undate > submit_queues through configfs for null_blk), while it might still be > used from other context(e.g. switch elevator to none): > > t1 t2 > elevator_switch > blk_mq_unquiesce_queue > blk_mq_run_hw_queues > queue_for_each_hw_ctx > // assembly code for hctx = (q)->queue_hw_ctx[i] > mov 0x48(%rbp),%rdx -> read old queue_hw_ctx > > __blk_mq_update_nr_hw_queues > blk_mq_realloc_hw_ctxs > hctxs = q->queue_hw_ctx > q->queue_hw_ctx = new_hctxs > kfree(hctxs) > movslq %ebx,%rax > mov (%rdx,%rax,8),%rdi ->uaf > Not only uaf on queue_hw_ctx, but also other similar issue on other structures, and I think the correct and easy fix is to quiesce request queue during updating nr_hw_queues, something like the following patch: diff --git a/block/blk-mq.c b/block/blk-mq.c index a05ce7725031..d8e7c3cce0dd 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -4467,8 +4467,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, if (set->nr_maps == 1 && nr_hw_queues == set->nr_hw_queues) return; - list_for_each_entry(q, &set->tag_list, tag_set_list) + list_for_each_entry(q, &set->tag_list, tag_set_list) { blk_mq_freeze_queue(q); + blk_mq_quiesce_queue(q); + } /* * Switch IO scheduler to 'none', cleaning up the data associated * with the previous scheduler. We will switch back once we are done @@ -4518,8 +4520,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, list_for_each_entry(q, &set->tag_list, tag_set_list) blk_mq_elv_switch_back(&head, q); - list_for_each_entry(q, &set->tag_list, tag_set_list) + list_for_each_entry(q, &set->tag_list, tag_set_list) { + blk_mq_unquiesce_queue(q); blk_mq_unfreeze_queue(q); + } } void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues) Thanks, Ming ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx' 2022-02-23 14:30 ` Ming Lei @ 2022-02-24 1:29 ` yukuai (C) 2022-02-24 2:15 ` Ming Lei 0 siblings, 1 reply; 7+ messages in thread From: yukuai (C) @ 2022-02-24 1:29 UTC (permalink / raw) To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, yi.zhang 在 2022/02/23 22:30, Ming Lei 写道: > On Wed, Feb 23, 2022 at 07:26:01PM +0800, Yu Kuai wrote: >> blk_mq_realloc_hw_ctxs() will free the 'queue_hw_ctx'(e.g. undate >> submit_queues through configfs for null_blk), while it might still be >> used from other context(e.g. switch elevator to none): >> >> t1 t2 >> elevator_switch >> blk_mq_unquiesce_queue >> blk_mq_run_hw_queues >> queue_for_each_hw_ctx >> // assembly code for hctx = (q)->queue_hw_ctx[i] >> mov 0x48(%rbp),%rdx -> read old queue_hw_ctx >> >> __blk_mq_update_nr_hw_queues >> blk_mq_realloc_hw_ctxs >> hctxs = q->queue_hw_ctx >> q->queue_hw_ctx = new_hctxs >> kfree(hctxs) >> movslq %ebx,%rax >> mov (%rdx,%rax,8),%rdi ->uaf >> > > Not only uaf on queue_hw_ctx, but also other similar issue on other > structures, and I think the correct and easy fix is to quiesce request > queue during updating nr_hw_queues, something like the following patch: > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index a05ce7725031..d8e7c3cce0dd 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -4467,8 +4467,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > if (set->nr_maps == 1 && nr_hw_queues == set->nr_hw_queues) > return; > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > blk_mq_freeze_queue(q); > + blk_mq_quiesce_queue(q); > + } > /* > * Switch IO scheduler to 'none', cleaning up the data associated > * with the previous scheduler. We will switch back once we are done > @@ -4518,8 +4520,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > list_for_each_entry(q, &set->tag_list, tag_set_list) > blk_mq_elv_switch_back(&head, q); > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > + blk_mq_unquiesce_queue(q); > blk_mq_unfreeze_queue(q); > + } > } > > void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues) Hi, Ming If blk_mq_quiesce_queue() is called from __blk_mq_update_nr_hw_queues() first, and then swithing elevator to none won't trigger the problem. However, what if blk_mq_unquiesce_queue() from switching elevator decrease quiesce_depth to 0 first, and then blk_mq_quiesce_queue() is called from __blk_mq_update_nr_hw_queues(), it seems to me such concurrent scenarios still exist. Thanks, Kuai > > > > Thanks, > Ming > > . > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx' 2022-02-24 1:29 ` yukuai (C) @ 2022-02-24 2:15 ` Ming Lei 2022-02-24 2:43 ` yukuai (C) 0 siblings, 1 reply; 7+ messages in thread From: Ming Lei @ 2022-02-24 2:15 UTC (permalink / raw) To: yukuai (C); +Cc: axboe, linux-block, linux-kernel, yi.zhang On Thu, Feb 24, 2022 at 09:29:09AM +0800, yukuai (C) wrote: > 在 2022/02/23 22:30, Ming Lei 写道: > > On Wed, Feb 23, 2022 at 07:26:01PM +0800, Yu Kuai wrote: > > > blk_mq_realloc_hw_ctxs() will free the 'queue_hw_ctx'(e.g. undate > > > submit_queues through configfs for null_blk), while it might still be > > > used from other context(e.g. switch elevator to none): > > > > > > t1 t2 > > > elevator_switch > > > blk_mq_unquiesce_queue > > > blk_mq_run_hw_queues > > > queue_for_each_hw_ctx > > > // assembly code for hctx = (q)->queue_hw_ctx[i] > > > mov 0x48(%rbp),%rdx -> read old queue_hw_ctx > > > > > > __blk_mq_update_nr_hw_queues > > > blk_mq_realloc_hw_ctxs > > > hctxs = q->queue_hw_ctx > > > q->queue_hw_ctx = new_hctxs > > > kfree(hctxs) > > > movslq %ebx,%rax > > > mov (%rdx,%rax,8),%rdi ->uaf > > > > > > > Not only uaf on queue_hw_ctx, but also other similar issue on other > > structures, and I think the correct and easy fix is to quiesce request > > queue during updating nr_hw_queues, something like the following patch: > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index a05ce7725031..d8e7c3cce0dd 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -4467,8 +4467,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > > if (set->nr_maps == 1 && nr_hw_queues == set->nr_hw_queues) > > return; > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > > blk_mq_freeze_queue(q); > > + blk_mq_quiesce_queue(q); > > + } > > /* > > * Switch IO scheduler to 'none', cleaning up the data associated > > * with the previous scheduler. We will switch back once we are done > > @@ -4518,8 +4520,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > > list_for_each_entry(q, &set->tag_list, tag_set_list) > > blk_mq_elv_switch_back(&head, q); > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > > + blk_mq_unquiesce_queue(q); > > blk_mq_unfreeze_queue(q); > > + } > > } > > void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues) > Hi, Ming > > If blk_mq_quiesce_queue() is called from __blk_mq_update_nr_hw_queues() > first, and then swithing elevator to none won't trigger the problem. > However, what if blk_mq_unquiesce_queue() from switching elevator > decrease quiesce_depth to 0 first, and then blk_mq_quiesce_queue() is > called from __blk_mq_update_nr_hw_queues(), it seems to me such > concurrent scenarios still exist. No, the scenario won't exist, once blk_mq_quiesce_queue() returns, it is guaranteed that: - in-progress run queue is drained - no new run queue can be started Thanks, Ming ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx' 2022-02-24 2:15 ` Ming Lei @ 2022-02-24 2:43 ` yukuai (C) 0 siblings, 0 replies; 7+ messages in thread From: yukuai (C) @ 2022-02-24 2:43 UTC (permalink / raw) To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, yi.zhang 在 2022/02/24 10:15, Ming Lei 写道: >> Hi, Ming >> >> If blk_mq_quiesce_queue() is called from __blk_mq_update_nr_hw_queues() >> first, and then swithing elevator to none won't trigger the problem. >> However, what if blk_mq_unquiesce_queue() from switching elevator >> decrease quiesce_depth to 0 first, and then blk_mq_quiesce_queue() is >> called from __blk_mq_update_nr_hw_queues(), it seems to me such >> concurrent scenarios still exist. > > No, the scenario won't exist, once blk_mq_quiesce_queue() returns, it is > guaranteed that: > > - in-progress run queue is drained > - no new run queue can be started I understand that... What I mean about the concurrent scenario is that reading queue_hw_ctx in blk_mq_run_hw_queues(), not the actual run queue blk_mq_run_hw_queue(): t1 t2 elevator_switch blk_mq_quiesce_queue -> quiesce_depth = 1 blk_mq_unquiesce_queue-> quiesce_depth = 0 blk_mq_run_hw_queues __blk_mq_update_nr_hw_queues blk_mq_quiesce_queue queue_for_each_hw_ctx -> quiesce_queue can't prevent reading queue_hw_ctx blk_mq_run_hw_queue //need_run is always false, nothing to do Am I missing something about blk_mq_quiesce_queue ? Thanks, Kuai ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx' 2022-02-23 11:26 [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx' Yu Kuai 2022-02-23 14:30 ` Ming Lei @ 2022-02-25 2:40 ` Ming Lei 2022-02-25 3:15 ` yukuai (C) 1 sibling, 1 reply; 7+ messages in thread From: Ming Lei @ 2022-02-25 2:40 UTC (permalink / raw) To: Yu Kuai; +Cc: axboe, linux-block, linux-kernel, yi.zhang On Wed, Feb 23, 2022 at 07:26:01PM +0800, Yu Kuai wrote: > blk_mq_realloc_hw_ctxs() will free the 'queue_hw_ctx'(e.g. undate > submit_queues through configfs for null_blk), while it might still be > used from other context(e.g. switch elevator to none): > > t1 t2 > elevator_switch > blk_mq_unquiesce_queue > blk_mq_run_hw_queues > queue_for_each_hw_ctx > // assembly code for hctx = (q)->queue_hw_ctx[i] > mov 0x48(%rbp),%rdx -> read old queue_hw_ctx > > __blk_mq_update_nr_hw_queues > blk_mq_realloc_hw_ctxs > hctxs = q->queue_hw_ctx > q->queue_hw_ctx = new_hctxs > kfree(hctxs) > movslq %ebx,%rax > mov (%rdx,%rax,8),%rdi ->uaf > > This problem was found by code review, and I comfirmed that the concurrent > scenarios do exist(specifically 'q->queue_hw_ctx' can be changed during > blk_mq_run_hw_queues), however, the uaf problem hasn't been repoduced yet > without hacking the kernel. > > Sicne the queue is freezed in __blk_mq_update_nr_hw_queues, fix the > problem by protecting 'queue_hw_ctx' through rcu where it can be accessed > without grabbing 'q_usage_counter'. > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > block/blk-mq.c | 8 +++++++- > include/linux/blk-mq.h | 2 +- > include/linux/blkdev.h | 13 ++++++++++++- > 3 files changed, 20 insertions(+), 3 deletions(-) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 6c59ffe765fd..79367457d555 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -3955,7 +3955,13 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, > if (hctxs) > memcpy(new_hctxs, hctxs, q->nr_hw_queues * > sizeof(*hctxs)); > - q->queue_hw_ctx = new_hctxs; > + > + rcu_assign_pointer(q->queue_hw_ctx, new_hctxs); > + /* > + * Make sure reading the old queue_hw_ctx from other > + * context concurrently won't trigger uaf. > + */ > + synchronize_rcu(); > kfree(hctxs); > hctxs = new_hctxs; > } > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h > index d319ffa59354..edcf8ead76c6 100644 > --- a/include/linux/blk-mq.h > +++ b/include/linux/blk-mq.h > @@ -918,7 +918,7 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq) > > #define queue_for_each_hw_ctx(q, hctx, i) \ > for ((i) = 0; (i) < (q)->nr_hw_queues && \ > - ({ hctx = (q)->queue_hw_ctx[i]; 1; }); (i)++) > + ({ hctx = queue_hctx((q), i); 1; }); (i)++) > > #define hctx_for_each_ctx(hctx, ctx, i) \ > for ((i) = 0; (i) < (hctx)->nr_ctx && \ > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 3bfc75a2a450..2018a4dd2028 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -354,7 +354,7 @@ struct request_queue { > unsigned int queue_depth; > > /* hw dispatch queues */ > - struct blk_mq_hw_ctx **queue_hw_ctx; > + struct blk_mq_hw_ctx __rcu **queue_hw_ctx; > unsigned int nr_hw_queues; > > /* > @@ -622,6 +622,17 @@ static inline bool queue_is_mq(struct request_queue *q) > return q->mq_ops; > } > > +static inline struct blk_mq_hw_ctx *queue_hctx(struct request_queue *q, int id) > +{ > + struct blk_mq_hw_ctx *hctx; > + > + rcu_read_lock(); > + hctx = *(rcu_dereference(q->queue_hw_ctx) + id); > + rcu_read_unlock(); > + > + return hctx; > +} queue_hctx() should be moved into linux/blk-mq.h, otherwise feel free to add: Reviewed-by: Ming Lei <ming.lei@redhat.com> Also it should be fine to implement queue_for_each_hw_ctx() as list, then we can avoid the allocation for q->queue_hw_ctx without extra cost. I will work toward that direction for improving the code. Thanks, Ming ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx' 2022-02-25 2:40 ` Ming Lei @ 2022-02-25 3:15 ` yukuai (C) 0 siblings, 0 replies; 7+ messages in thread From: yukuai (C) @ 2022-02-25 3:15 UTC (permalink / raw) To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, yi.zhang 在 2022/02/25 10:40, Ming Lei 写道: >> +static inline struct blk_mq_hw_ctx *queue_hctx(struct request_queue *q, int id) >> +{ >> + struct blk_mq_hw_ctx *hctx; >> + >> + rcu_read_lock(); >> + hctx = *(rcu_dereference(q->queue_hw_ctx) + id); >> + rcu_read_unlock(); >> + >> + return hctx; >> +} > > queue_hctx() should be moved into linux/blk-mq.h, otherwise feel free to > add: > > Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks for the review, I will send a new patch and move queue_hctx. Kuai > > Also it should be fine to implement queue_for_each_hw_ctx() as list, then we > can avoid the allocation for q->queue_hw_ctx without extra cost. I will work > toward that direction for improving the code. > > Thanks, > Ming > > . > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-02-25 3:15 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-02-23 11:26 [PATCH RFC] blk-mq: fix potential uaf for 'queue_hw_ctx' Yu Kuai 2022-02-23 14:30 ` Ming Lei 2022-02-24 1:29 ` yukuai (C) 2022-02-24 2:15 ` Ming Lei 2022-02-24 2:43 ` yukuai (C) 2022-02-25 2:40 ` Ming Lei 2022-02-25 3:15 ` yukuai (C)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox