From: Damien Le Moal <dlemoal@kernel.org>
To: Jens Axboe <axboe@kernel.dk>, Bart Van Assche <bvanassche@acm.org>
Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
Yu Kuai <yukuai1@huaweicloud.com>, Ming Lei <ming.lei@redhat.com>,
stable@vger.kernel.org
Subject: Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
Date: Fri, 23 May 2025 10:10:30 +0200 [thread overview]
Message-ID: <3cd139d0-5fe0-4ce1-b7a7-36da4fad6eff@kernel.org> (raw)
In-Reply-To: <b1ea4120-e16a-47c8-b10c-ff6c9d5feb69@kernel.dk>
On 5/22/25 19:38, Jens Axboe wrote:
> On 5/22/25 11:14 AM, Bart Van Assche wrote:
>> blk_mq_freeze_queue() never terminates if one or more bios are on the plug
>> list and if the block device driver defines a .submit_bio() method.
>> This is the case for device mapper drivers. The deadlock happens because
>> blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because
>> a queue reference is held by bios on the plug list and because the
>> __bio_queue_enter() call in __submit_bio() waits for the queue to be
>> unfrozen.
>>
>> This patch fixes the following deadlock:
>>
>> Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work
>> Call trace:
>> __schedule+0xb08/0x1160
>> schedule+0x48/0xc8
>> __bio_queue_enter+0xcc/0x1d0
>> __submit_bio+0x100/0x1b0
>> submit_bio_noacct_nocheck+0x230/0x49c
>> blk_zone_wplug_bio_work+0x168/0x250
>> process_one_work+0x26c/0x65c
>> worker_thread+0x33c/0x498
>> kthread+0x110/0x134
>> ret_from_fork+0x10/0x20
>>
>> Call trace:
>> __switch_to+0x230/0x410
>> __schedule+0xb08/0x1160
>> schedule+0x48/0xc8
>> blk_mq_freeze_queue_wait+0x78/0xb8
>> blk_mq_freeze_queue+0x90/0xa4
>> queue_attr_store+0x7c/0xf0
>> sysfs_kf_write+0x98/0xc8
>> kernfs_fop_write_iter+0x12c/0x1d4
>> vfs_write+0x340/0x3ac
>> ksys_write+0x78/0xe8
>>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Damien Le Moal <dlemoal@kernel.org>
>> Cc: Yu Kuai <yukuai1@huaweicloud.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: stable@vger.kernel.org
>> Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>
>> Changes compared to v1: fixed a race condition. Call bio_zone_write_plugging()
>> only before submitting the bio and not after it has been submitted.
>>
>> block/blk-core.c | 18 ++++++++++++++++--
>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index b862c66018f2..713fb3865260 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
>> return BLK_STS_OK;
>> }
>>
>> +/*
>> + * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
>> + * set because this causes blk_mq_freeze_queue() to deadlock if
>> + * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
>> + * on the plug list is not necessary since a q_usage_counter reference is held
>> + * while a bio is on the plug list.
>> + */
>> static void __submit_bio(struct bio *bio)
>> {
>> /* If plug is not used, add new plug here to cache nsecs time. */
>> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
>>
>> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
>> blk_mq_submit_bio(bio);
>> - } else if (likely(bio_queue_enter(bio) == 0)) {
>> + } else {
>> struct gendisk *disk = bio->bi_bdev->bd_disk;
>> + bool zwp = bio_zone_write_plugging(bio);
>> +
>> + if (unlikely(!zwp && bio_queue_enter(bio) != 0))
>> + goto finish_plug;
>>
>> if ((bio->bi_opf & REQ_POLLED) &&
>> !(disk->queue->limits.features & BLK_FEAT_POLL)) {
>> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
>> } else {
>> disk->fops->submit_bio(bio);
>> }
>> - blk_queue_exit(disk->queue);
>> +
>> + if (!zwp)
>> + blk_queue_exit(disk->queue);
>> }
>
> This is pretty ugly, and I honestly absolutely hate how there's quite a
> bit of zoned_whatever sprinkling throughout the core code. What's the
> reason for not unplugging here, unaligned writes? Because you should
> presumable have the exact same issues on non-zoned devices if they have
> IO stuck in a plug (and doesn't get unplugged) while someone is waiting
> on a freeze.
>
> A somewhat similar case was solved for IOPOLL and queue entering. That
> would be another thing to look at. Maybe a live enter could work if the
> plug itself pins it?
What about this patch, completely untested...
diff --git a/block/blk-core.c b/block/blk-core.c
index e8cc270a453f..3d2dec088a54 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -621,6 +621,32 @@ static inline blk_status_t blk_check_zone_append(struct
request_queue *q,
return BLK_STS_OK;
}
+static inline void disk_submit_bio(struct bio *bio)
+{
+ struct gendisk *disk = bio->bi_bdev->bd_disk;
+ bool need_queue_enter = !bio_zone_write_plugging(bio);
+
+ /*
+ * BIOs that are issued from a zone write plug already hold a reference
+ * on the device queue usage counter.
+ */
+ if (need_queue_enter) {
+ if (unlikely(bio_queue_enter(bio)))
+ return;
+ }
+
+ if ((bio->bi_opf & REQ_POLLED) &&
+ !(disk->queue->limits.features & BLK_FEAT_POLL)) {
+ bio->bi_status = BLK_STS_NOTSUPP;
+ bio_endio(bio);
+ } else {
+ disk->fops->submit_bio(bio);
+ }
+
+ if (need_queue_enter)
+ blk_queue_exit(disk->queue);
+}
+
static void __submit_bio(struct bio *bio)
{
/* If plug is not used, add new plug here to cache nsecs time. */
@@ -631,20 +657,10 @@ static void __submit_bio(struct bio *bio)
blk_start_plug(&plug);
- if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
+ if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO))
blk_mq_submit_bio(bio);
- } else if (likely(bio_queue_enter(bio) == 0)) {
- struct gendisk *disk = bio->bi_bdev->bd_disk;
-
- if ((bio->bi_opf & REQ_POLLED) &&
- !(disk->queue->limits.features & BLK_FEAT_POLL)) {
- bio->bi_status = BLK_STS_NOTSUPP;
- bio_endio(bio);
- } else {
- disk->fops->submit_bio(bio);
- }
- blk_queue_exit(disk->queue);
- }
+ else
+ disk_submit_bio(bio)
blk_finish_plug(&plug);
}
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2025-05-23 8:10 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-22 17:14 [PATCH] block: Fix a deadlock related freezing zoned storage devices Bart Van Assche
2025-05-22 17:38 ` Jens Axboe
2025-05-22 18:32 ` Bart Van Assche
2025-05-23 2:10 ` Ming Lei
2025-05-23 6:06 ` Damien Le Moal
2025-05-23 5:53 ` Damien Le Moal
2025-05-23 8:10 ` Damien Le Moal [this message]
2025-05-23 8:20 ` Damien Le Moal
2025-05-23 8:22 ` Christoph Hellwig
2025-05-23 8:20 ` Christoph Hellwig
2025-05-23 11:00 ` Damien Le Moal
2025-05-26 7:41 ` Damien Le Moal
2025-05-27 21:49 ` Bart Van Assche
2025-05-23 12:36 ` Jens Axboe
2025-05-23 3:10 ` Christoph Hellwig
2025-05-23 16:08 ` Bart Van Assche
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3cd139d0-5fe0-4ce1-b7a7-36da4fad6eff@kernel.org \
--to=dlemoal@kernel.org \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=stable@vger.kernel.org \
--cc=yukuai1@huaweicloud.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox