From: Jens Axboe <axboe@kernel.dk>
To: Bart Van Assche <bvanassche@acm.org>
Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
Damien Le Moal <dlemoal@kernel.org>,
Yu Kuai <yukuai1@huaweicloud.com>, Ming Lei <ming.lei@redhat.com>,
stable@vger.kernel.org
Subject: Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
Date: Thu, 22 May 2025 11:38:51 -0600 [thread overview]
Message-ID: <b1ea4120-e16a-47c8-b10c-ff6c9d5feb69@kernel.dk> (raw)
In-Reply-To: <20250522171405.3239141-1-bvanassche@acm.org>
On 5/22/25 11:14 AM, Bart Van Assche wrote:
> blk_mq_freeze_queue() never terminates if one or more bios are on the plug
> list and if the block device driver defines a .submit_bio() method.
> This is the case for device mapper drivers. The deadlock happens because
> blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because
> a queue reference is held by bios on the plug list and because the
> __bio_queue_enter() call in __submit_bio() waits for the queue to be
> unfrozen.
>
> This patch fixes the following deadlock:
>
> Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work
> Call trace:
> __schedule+0xb08/0x1160
> schedule+0x48/0xc8
> __bio_queue_enter+0xcc/0x1d0
> __submit_bio+0x100/0x1b0
> submit_bio_noacct_nocheck+0x230/0x49c
> blk_zone_wplug_bio_work+0x168/0x250
> process_one_work+0x26c/0x65c
> worker_thread+0x33c/0x498
> kthread+0x110/0x134
> ret_from_fork+0x10/0x20
>
> Call trace:
> __switch_to+0x230/0x410
> __schedule+0xb08/0x1160
> schedule+0x48/0xc8
> blk_mq_freeze_queue_wait+0x78/0xb8
> blk_mq_freeze_queue+0x90/0xa4
> queue_attr_store+0x7c/0xf0
> sysfs_kf_write+0x98/0xc8
> kernfs_fop_write_iter+0x12c/0x1d4
> vfs_write+0x340/0x3ac
> ksys_write+0x78/0xe8
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Yu Kuai <yukuai1@huaweicloud.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: stable@vger.kernel.org
> Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>
> Changes compared to v1: fixed a race condition. Call bio_zone_write_plugging()
> only before submitting the bio and not after it has been submitted.
>
> block/blk-core.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b862c66018f2..713fb3865260 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
> return BLK_STS_OK;
> }
>
> +/*
> + * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
> + * set because this causes blk_mq_freeze_queue() to deadlock if
> + * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
> + * on the plug list is not necessary since a q_usage_counter reference is held
> + * while a bio is on the plug list.
> + */
> static void __submit_bio(struct bio *bio)
> {
> /* If plug is not used, add new plug here to cache nsecs time. */
> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
>
> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
> blk_mq_submit_bio(bio);
> - } else if (likely(bio_queue_enter(bio) == 0)) {
> + } else {
> struct gendisk *disk = bio->bi_bdev->bd_disk;
> + bool zwp = bio_zone_write_plugging(bio);
> +
> + if (unlikely(!zwp && bio_queue_enter(bio) != 0))
> + goto finish_plug;
>
> if ((bio->bi_opf & REQ_POLLED) &&
> !(disk->queue->limits.features & BLK_FEAT_POLL)) {
> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
> } else {
> disk->fops->submit_bio(bio);
> }
> - blk_queue_exit(disk->queue);
> +
> + if (!zwp)
> + blk_queue_exit(disk->queue);
> }
This is pretty ugly, and I honestly absolutely hate how there's quite a
bit of zoned_whatever sprinkling throughout the core code. What's the
reason for not unplugging here, unaligned writes? Because you should
presumable have the exact same issues on non-zoned devices if they have
IO stuck in a plug (and doesn't get unplugged) while someone is waiting
on a freeze.
A somewhat similar case was solved for IOPOLL and queue entering. That
would be another thing to look at. Maybe a live enter could work if the
plug itself pins it?
--
Jens Axboe
next prev parent reply other threads:[~2025-05-22 17:38 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-22 17:14 [PATCH] block: Fix a deadlock related freezing zoned storage devices Bart Van Assche
2025-05-22 17:38 ` Jens Axboe [this message]
2025-05-22 18:32 ` Bart Van Assche
2025-05-23 2:10 ` Ming Lei
2025-05-23 6:06 ` Damien Le Moal
2025-05-23 5:53 ` Damien Le Moal
2025-05-23 8:10 ` Damien Le Moal
2025-05-23 8:20 ` Damien Le Moal
2025-05-23 8:22 ` Christoph Hellwig
2025-05-23 8:20 ` Christoph Hellwig
2025-05-23 11:00 ` Damien Le Moal
2025-05-26 7:41 ` Damien Le Moal
2025-05-27 21:49 ` Bart Van Assche
2025-05-23 12:36 ` Jens Axboe
2025-05-23 3:10 ` Christoph Hellwig
2025-05-23 16:08 ` Bart Van Assche
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b1ea4120-e16a-47c8-b10c-ff6c9d5feb69@kernel.dk \
--to=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=dlemoal@kernel.org \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=stable@vger.kernel.org \
--cc=yukuai1@huaweicloud.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.