* [PATCH v3] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash
@ 2018-04-10 23:02 Bart Van Assche
From: Bart Van Assche @ 2018-04-10 23:02 UTC
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
Joseph Qi
Because blkcg_exit_queue() is now called from inside blk_cleanup_queue(),
it is no longer safe to access cgroup information during or after the
blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
call with blk_queue_enter() / blk_queue_exit().
Reported-by: Ming Lei <ming.lei@redhat.com>
Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
---
Changes compared to v2: converted two ternary expressions into if-statements.
Changes compared to v1: guarded the blk_queue_exit() inside the loop with "if (q)".
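A minimal single-threaded sketch of the ordering the commit message relies on, assuming a simplified model: the type `model_queue` and the helpers `model_queue_enter()`, `model_queue_exit()`, `model_mark_dying()` and `model_free_blkcg()` are hypothetical (not kernel APIs), a plain counter stands in for the per-CPU q_usage_counter, and queue freezing and REQ_NOWAIT are left out. Roughly as blk_cleanup_queue() marks the queue dying and drains users before blkcg_exit_queue(), the model refuses new submitters once the queue is dying and releases the blkcg state only after every submitter already inside the gate has left it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request queue; not the kernel structure. */
struct model_queue {
	unsigned int usage;	/* stands in for q->q_usage_counter */
	bool dying;		/* stands in for blk_queue_dying(q) */
	int *blkcg_data;	/* stands in for the per-queue blkcg state */
};

/* Like blk_queue_enter() without freezing: refuse entry once dying. */
static int model_queue_enter(struct model_queue *q)
{
	if (q->dying)
		return -ENODEV;
	q->usage++;
	return 0;
}

/* Like blk_queue_exit(): drop the reference taken by a successful enter. */
static void model_queue_exit(struct model_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* First cleanup step: stop new submitters from entering the queue. */
static void model_mark_dying(struct model_queue *q)
{
	q->dying = true;
}

/*
 * Second cleanup step: release the blkcg state only once nobody is
 * inside the gate. The kernel sleeps until the usage count drops to
 * zero; this single-threaded model returns -EBUSY instead.
 */
static int model_free_blkcg(struct model_queue *q)
{
	if (q->usage != 0)
		return -EBUSY;
	free(q->blkcg_data);
	q->blkcg_data = NULL;
	return 0;
}
```

Code that reads `blkcg_data` between a successful `model_queue_enter()` and the matching `model_queue_exit()` therefore cannot observe freed memory in this model, which is the property the patch extends to generic_make_request_checks().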
block/blk-core.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 34e2f2227fd9..39308e874ffa 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2386,8 +2386,20 @@ blk_qc_t generic_make_request(struct bio *bio)
 	 * yet.
 	 */
 	struct bio_list bio_list_on_stack[2];
+	blk_mq_req_flags_t flags = 0;
+	struct request_queue *q = bio->bi_disk->queue;
 	blk_qc_t ret = BLK_QC_T_NONE;
 
+	if (bio->bi_opf & REQ_NOWAIT)
+		flags = BLK_MQ_REQ_NOWAIT;
+	if (blk_queue_enter(q, flags) < 0) {
+		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
+			bio_wouldblock_error(bio);
+		else
+			bio_io_error(bio);
+		return ret;
+	}
+
 	if (!generic_make_request_checks(bio))
 		goto out;
 
@@ -2424,11 +2436,22 @@ blk_qc_t generic_make_request(struct bio *bio)
 	bio_list_init(&bio_list_on_stack[0]);
 	current->bio_list = bio_list_on_stack;
 	do {
-		struct request_queue *q = bio->bi_disk->queue;
-		blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
-			BLK_MQ_REQ_NOWAIT : 0;
+		bool enter_succeeded = true;
+
+		if (unlikely(q != bio->bi_disk->queue)) {
+			if (q)
+				blk_queue_exit(q);
+			q = bio->bi_disk->queue;
+			flags = 0;
+			if (bio->bi_opf & REQ_NOWAIT)
+				flags = BLK_MQ_REQ_NOWAIT;
+			if (blk_queue_enter(q, flags) < 0) {
+				enter_succeeded = false;
+				q = NULL;
+			}
+		}
 
-		if (likely(blk_queue_enter(q, flags) == 0)) {
+		if (enter_succeeded) {
 			struct bio_list lower, same;
 
 			/* Create a fresh bio_list for all subordinate requests */
@@ -2436,8 +2459,6 @@ blk_qc_t generic_make_request(struct bio *bio)
 			bio_list_init(&bio_list_on_stack[0]);
 			ret = q->make_request_fn(q, bio);
 
-			blk_queue_exit(q);
-
 			/* sort new bios into those for a lower level
 			 * and those for the same level
 			 */
@@ -2464,6 +2485,8 @@ blk_qc_t generic_make_request(struct bio *bio)
 	current->bio_list = NULL; /* deactivate */
 
 out:
+	if (q)
+		blk_queue_exit(q);
 	return ret;
 }
 EXPORT_SYMBOL(generic_make_request);
--
2.16.2
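The reworked loop holds at most one queue reference at a time: it keeps the reference taken for the previous bio, drops it and enters the new queue only when the next bio targets a different queue, and releases whatever it still holds once, at the `out:` label. The sketch below models only that bookkeeping, assuming hypothetical `model_queue`/`model_bio` types and helpers (not kernel APIs). Unlike the patch, it folds the initial blk_queue_enter() that precedes generic_make_request_checks() into the first loop iteration, and it does not model the bio lists or make_request_fn().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct request_queue / bio. */
struct model_queue {
	bool dying;
	int usage;	/* references currently held */
	int enters;	/* successful enter calls, for inspection */
};

struct model_bio {
	struct model_queue *queue;	/* stands in for bio->bi_disk->queue */
	bool completed_ok;
};

static int model_queue_enter(struct model_queue *q)
{
	if (q->dying)
		return -1;
	q->usage++;
	q->enters++;
	return 0;
}

static void model_queue_exit(struct model_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/*
 * Same shape as the patched loop: switch queues (exit old, enter new)
 * only when the bio's queue differs from the one already held.
 * q == NULL means "no reference held", e.g. after a failed enter.
 */
static void model_submit(struct model_bio *bios, size_t n)
{
	struct model_queue *q = NULL;

	for (size_t i = 0; i < n; i++) {
		struct model_bio *bio = &bios[i];
		bool enter_succeeded = true;

		if (q != bio->queue) {
			if (q)
				model_queue_exit(q);
			q = bio->queue;
			if (model_queue_enter(q) < 0) {
				enter_succeeded = false;
				q = NULL;
			}
		}
		/* Stands in for make_request_fn() versus failing the bio. */
		bio->completed_ok = enter_succeeded;
	}
	if (q)	/* the single release done at the "out:" label */
		model_queue_exit(q);
}
```

For the sequence a, a, b, a, queue a is entered twice and b once, and every reference is released by the end. This is the balance that the "if (q)" guards in the loop and at `out:` preserve when an enter fails.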
* Re: [PATCH v3] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash
From: Jens Axboe @ 2018-04-10 23:46 UTC
To: Bart Van Assche; +Cc: linux-block, Christoph Hellwig, Ming Lei, Joseph Qi
On 4/10/18 5:02 PM, Bart Van Assche wrote:
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
> call with blk_queue_enter() / blk_queue_exit().
Looks good, applied.
--
Jens Axboe
* Re: [PATCH v3] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash
From: Christoph Hellwig @ 2018-04-12 6:27 UTC
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Ming Lei, Joseph Qi
On Tue, Apr 10, 2018 at 05:02:40PM -0600, Bart Van Assche wrote:
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
> call with blk_queue_enter() / blk_queue_exit().
I think the problem is that blkcg does weird things from
blk_cleanup_queue. I'd rather fix that root cause than work around it.
* Re: [PATCH v3] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash
From: Bart Van Assche @ 2018-04-12 12:14 UTC
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Ming Lei, Joseph Qi
On 04/12/18 00:27, Christoph Hellwig wrote:
> On Tue, Apr 10, 2018 at 05:02:40PM -0600, Bart Van Assche wrote:
>> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
>> it is no longer safe to access cgroup information during or after the
>> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
>> call with blk_queue_enter() / blk_queue_exit().
>
> I think the problem is that blkcg does weird things from
> blk_cleanup_queue. I'd rather fix that root cause than working around it.
Hello Christoph,
Can you clarify your comment? generic_make_request_checks() calls
blkcg_bio_issue_check(), and that function in turn calls blkg_lookup()
and other blkcg functions. Hence this patch, which prevents blkcg code
from being called concurrently with the removal of a request queue from blkcg.
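To illustrate why generic_make_request_checks() has to run inside the gate, here is a hedged single-threaded sketch in which `*q->blkg` stands in for the blkcg state that blkg_lookup() reads, and the failure path follows the new error handling in the patch: a REQ_NOWAIT submitter on a queue that is not dying gets a would-block status, everything else gets an I/O error. All names are hypothetical. It assumes, based on blk_queue_enter() in kernels of that era, that a NOWAIT caller whose reference attempt fails gets -EBUSY and a blocking caller that finds the queue dying gets -ENODEV; the sleep a blocking caller would do on a frozen queue is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical completion codes, loosely modelled on blk_status_t. */
enum model_status { MODEL_OK, MODEL_AGAIN, MODEL_IOERR };

/* Hypothetical stand-in for a request queue; not the kernel structure. */
struct model_queue {
	bool dying;
	bool frozen;	/* entry temporarily blocked, e.g. during a freeze */
	int usage;	/* stands in for q->q_usage_counter */
	int *blkg;	/* stands in for the state blkg_lookup() reads */
};

/* Like blk_queue_enter(): NOWAIT callers get -EBUSY instead of sleeping. */
static int model_queue_enter(struct model_queue *q, bool nowait)
{
	if (!q->dying && !q->frozen) {
		q->usage++;
		return 0;
	}
	if (nowait)
		return -EBUSY;
	/*
	 * The kernel sleeps here until the freeze ends or the queue dies.
	 * This model cannot sleep, so it only supports a blocking caller
	 * on a queue that is already dying.
	 */
	assert(q->dying);
	return -ENODEV;
}

static enum model_status model_submit(struct model_queue *q, bool nowait,
				      int *seen)
{
	if (model_queue_enter(q, nowait) < 0) {
		/* Same decision as the new error path in the patch. */
		if (!q->dying && nowait)
			return MODEL_AGAIN;	/* like bio_wouldblock_error() */
		return MODEL_IOERR;		/* like bio_io_error() */
	}
	/* Like the blkcg lookup done by the checks: the gate is held. */
	*seen = *q->blkg;
	q->usage--;	/* like blk_queue_exit() */
	return MODEL_OK;
}
```

Once cleanup has released the blkcg state (modelled by setting `blkg` to NULL on a dying queue), every submission fails before the lookup instead of dereferencing the released pointer.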
Thanks,
Bart.
* Re: [PATCH v3] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash
From: Joseph Qi @ 2018-04-13 2:43 UTC
To: Bart Van Assche, Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei
On 18/4/11 07:02, Bart Van Assche wrote:
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
> call with blk_queue_enter() / blk_queue_exit().
>
> Reported-by: Ming Lei <ming.lei@redhat.com>
> Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller")
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
I've tested using the following steps:
1) start a fio job with buffered writes;
2) then remove the SCSI device that fio writes to:
echo "scsi remove-single-device ${dev}" > /proc/scsi/scsi
After applying this patch, the reported oops is gone.
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>