* [PATCH] block: Fix a deadlock related freezing zoned storage devices
@ 2025-05-22 17:14 Bart Van Assche
2025-05-22 17:38 ` Jens Axboe
2025-05-23 3:10 ` Christoph Hellwig
0 siblings, 2 replies; 16+ messages in thread
From: Bart Van Assche @ 2025-05-22 17:14 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
Yu Kuai, Ming Lei, stable
blk_mq_freeze_queue() never terminates if one or more bios are on the plug
list and if the block device driver defines a .submit_bio() method.
This is the case for device mapper drivers. The deadlock happens because
blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because
a queue reference is held by bios on the plug list and because the
__bio_queue_enter() call in __submit_bio() waits for the queue to be
unfrozen.
This patch fixes the following deadlock:
Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work
Call trace:
__schedule+0xb08/0x1160
schedule+0x48/0xc8
__bio_queue_enter+0xcc/0x1d0
__submit_bio+0x100/0x1b0
submit_bio_noacct_nocheck+0x230/0x49c
blk_zone_wplug_bio_work+0x168/0x250
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
Call trace:
__switch_to+0x230/0x410
__schedule+0xb08/0x1160
schedule+0x48/0xc8
blk_mq_freeze_queue_wait+0x78/0xb8
blk_mq_freeze_queue+0x90/0xa4
queue_attr_store+0x7c/0xf0
sysfs_kf_write+0x98/0xc8
kernfs_fop_write_iter+0x12c/0x1d4
vfs_write+0x340/0x3ac
ksys_write+0x78/0xe8
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: stable@vger.kernel.org
Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
Changes compared to v1: fixed a race condition. Call bio_zone_write_plugging()
only before submitting the bio and not after it has been submitted.
block/blk-core.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index b862c66018f2..713fb3865260 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
return BLK_STS_OK;
}
+/*
+ * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
+ * set because this causes blk_mq_freeze_queue() to deadlock if
+ * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
+ * on the plug list is not necessary since a q_usage_counter reference is held
+ * while a bio is on the plug list.
+ */
static void __submit_bio(struct bio *bio)
{
/* If plug is not used, add new plug here to cache nsecs time. */
@@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
blk_mq_submit_bio(bio);
- } else if (likely(bio_queue_enter(bio) == 0)) {
+ } else {
struct gendisk *disk = bio->bi_bdev->bd_disk;
+ bool zwp = bio_zone_write_plugging(bio);
+
+ if (unlikely(!zwp && bio_queue_enter(bio) != 0))
+ goto finish_plug;
if ((bio->bi_opf & REQ_POLLED) &&
!(disk->queue->limits.features & BLK_FEAT_POLL)) {
@@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
} else {
disk->fops->submit_bio(bio);
}
- blk_queue_exit(disk->queue);
+
+ if (!zwp)
+ blk_queue_exit(disk->queue);
}
+finish_plug:
blk_finish_plug(&plug);
}
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-22 17:14 [PATCH] block: Fix a deadlock related freezing zoned storage devices Bart Van Assche
@ 2025-05-22 17:38 ` Jens Axboe
2025-05-22 18:32 ` Bart Van Assche
2025-05-23 8:10 ` Damien Le Moal
2025-05-23 3:10 ` Christoph Hellwig
1 sibling, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2025-05-22 17:38 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Yu Kuai, Ming Lei,
stable
On 5/22/25 11:14 AM, Bart Van Assche wrote:
> blk_mq_freeze_queue() never terminates if one or more bios are on the plug
> list and if the block device driver defines a .submit_bio() method.
> This is the case for device mapper drivers. The deadlock happens because
> blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because
> a queue reference is held by bios on the plug list and because the
> __bio_queue_enter() call in __submit_bio() waits for the queue to be
> unfrozen.
>
> This patch fixes the following deadlock:
>
> Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work
> Call trace:
> __schedule+0xb08/0x1160
> schedule+0x48/0xc8
> __bio_queue_enter+0xcc/0x1d0
> __submit_bio+0x100/0x1b0
> submit_bio_noacct_nocheck+0x230/0x49c
> blk_zone_wplug_bio_work+0x168/0x250
> process_one_work+0x26c/0x65c
> worker_thread+0x33c/0x498
> kthread+0x110/0x134
> ret_from_fork+0x10/0x20
>
> Call trace:
> __switch_to+0x230/0x410
> __schedule+0xb08/0x1160
> schedule+0x48/0xc8
> blk_mq_freeze_queue_wait+0x78/0xb8
> blk_mq_freeze_queue+0x90/0xa4
> queue_attr_store+0x7c/0xf0
> sysfs_kf_write+0x98/0xc8
> kernfs_fop_write_iter+0x12c/0x1d4
> vfs_write+0x340/0x3ac
> ksys_write+0x78/0xe8
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Yu Kuai <yukuai1@huaweicloud.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: stable@vger.kernel.org
> Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>
> Changes compared to v1: fixed a race condition. Call bio_zone_write_plugging()
> only before submitting the bio and not after it has been submitted.
>
> block/blk-core.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b862c66018f2..713fb3865260 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
> return BLK_STS_OK;
> }
>
> +/*
> + * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
> + * set because this causes blk_mq_freeze_queue() to deadlock if
> + * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
> + * on the plug list is not necessary since a q_usage_counter reference is held
> + * while a bio is on the plug list.
> + */
> static void __submit_bio(struct bio *bio)
> {
> /* If plug is not used, add new plug here to cache nsecs time. */
> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
>
> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
> blk_mq_submit_bio(bio);
> - } else if (likely(bio_queue_enter(bio) == 0)) {
> + } else {
> struct gendisk *disk = bio->bi_bdev->bd_disk;
> + bool zwp = bio_zone_write_plugging(bio);
> +
> + if (unlikely(!zwp && bio_queue_enter(bio) != 0))
> + goto finish_plug;
>
> if ((bio->bi_opf & REQ_POLLED) &&
> !(disk->queue->limits.features & BLK_FEAT_POLL)) {
> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
> } else {
> disk->fops->submit_bio(bio);
> }
> - blk_queue_exit(disk->queue);
> +
> + if (!zwp)
> + blk_queue_exit(disk->queue);
> }
This is pretty ugly, and I honestly absolutely hate how there's quite a
bit of zoned_whatever sprinkling throughout the core code. What's the
reason for not unplugging here, unaligned writes? Because you should
presumable have the exact same issues on non-zoned devices if they have
IO stuck in a plug (and doesn't get unplugged) while someone is waiting
on a freeze.
A somewhat similar case was solved for IOPOLL and queue entering. That
would be another thing to look at. Maybe a live enter could work if the
plug itself pins it?
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-22 17:38 ` Jens Axboe
@ 2025-05-22 18:32 ` Bart Van Assche
2025-05-23 2:10 ` Ming Lei
2025-05-23 5:53 ` Damien Le Moal
2025-05-23 8:10 ` Damien Le Moal
1 sibling, 2 replies; 16+ messages in thread
From: Bart Van Assche @ 2025-05-22 18:32 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Yu Kuai, Ming Lei,
stable
On 5/22/25 10:38 AM, Jens Axboe wrote:
> On 5/22/25 11:14 AM, Bart Van Assche wrote:
>> static void __submit_bio(struct bio *bio)
>> {
>> /* If plug is not used, add new plug here to cache nsecs time. */
>> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
>>
>> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
>> blk_mq_submit_bio(bio);
>> - } else if (likely(bio_queue_enter(bio) == 0)) {
>> + } else {
>> struct gendisk *disk = bio->bi_bdev->bd_disk;
>> + bool zwp = bio_zone_write_plugging(bio);
>> +
>> + if (unlikely(!zwp && bio_queue_enter(bio) != 0))
>> + goto finish_plug;
>>
>> if ((bio->bi_opf & REQ_POLLED) &&
>> !(disk->queue->limits.features & BLK_FEAT_POLL)) {
>> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
>> } else {
>> disk->fops->submit_bio(bio);
>> }
>> - blk_queue_exit(disk->queue);
>> +
>> + if (!zwp)
>> + blk_queue_exit(disk->queue);
>> }
>
> This is pretty ugly, and I honestly absolutely hate how there's quite a
> bit of zoned_whatever sprinkling throughout the core code. What's the
> reason for not unplugging here, unaligned writes? Because you should
> presumable have the exact same issues on non-zoned devices if they have
> IO stuck in a plug (and doesn't get unplugged) while someone is waiting
> on a freeze.
>
> A somewhat similar case was solved for IOPOLL and queue entering. That
> would be another thing to look at. Maybe a live enter could work if the
> plug itself pins it?
Hi Jens,
q->q_usage_counter is not increased for bios on current->plug_list.
q->q_usage_counter is increased before a bio is added to the zoned
pluglist. So these two cases are different.
I think it is important to hold a q->q_usage_counter reference for bios
on the zoned plug list because bios are added to that list after bio
splitting happened. Hence, request queue limits must not change while
any bio is on the zoned plug list.
Damien, do you think it would be possible to add a bio to the zoned plug
list before it is split and not to hold q->q_usage_counter for bios on
the zoned plug list?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-22 18:32 ` Bart Van Assche
@ 2025-05-23 2:10 ` Ming Lei
2025-05-23 6:06 ` Damien Le Moal
2025-05-23 5:53 ` Damien Le Moal
1 sibling, 1 reply; 16+ messages in thread
From: Ming Lei @ 2025-05-23 2:10 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
Yu Kuai, stable
On Thu, May 22, 2025 at 11:32:58AM -0700, Bart Van Assche wrote:
> On 5/22/25 10:38 AM, Jens Axboe wrote:
> > On 5/22/25 11:14 AM, Bart Van Assche wrote:
> > > static void __submit_bio(struct bio *bio)
> > > {
> > > /* If plug is not used, add new plug here to cache nsecs time. */
> > > @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
> > > if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
> > > blk_mq_submit_bio(bio);
> > > - } else if (likely(bio_queue_enter(bio) == 0)) {
> > > + } else {
> > > struct gendisk *disk = bio->bi_bdev->bd_disk;
> > > + bool zwp = bio_zone_write_plugging(bio);
> > > +
> > > + if (unlikely(!zwp && bio_queue_enter(bio) != 0))
> > > + goto finish_plug;
> > >
> > > if ((bio->bi_opf & REQ_POLLED) &&
> > > !(disk->queue->limits.features & BLK_FEAT_POLL)) {
> > > @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
> > > } else {
> > > disk->fops->submit_bio(bio);
> > > }
> > > - blk_queue_exit(disk->queue);
> > > +
> > > + if (!zwp)
> > > + blk_queue_exit(disk->queue);
> > > }
> >
> > This is pretty ugly, and I honestly absolutely hate how there's quite a
> > bit of zoned_whatever sprinkling throughout the core code. What's the
> > reason for not unplugging here, unaligned writes? Because you should
> > presumable have the exact same issues on non-zoned devices if they have
> > IO stuck in a plug (and doesn't get unplugged) while someone is waiting
> > on a freeze.
> >
> > A somewhat similar case was solved for IOPOLL and queue entering. That
> > would be another thing to look at. Maybe a live enter could work if the
> > plug itself pins it?
>
> Hi Jens,
>
> q->q_usage_counter is not increased for bios on current->plug_list.
> q->q_usage_counter is increased before a bio is added to the zoned pluglist.
> So these two cases are different.
>
> I think it is important to hold a q->q_usage_counter reference for bios
> on the zoned plug list because bios are added to that list after bio
> splitting happened. Hence, request queue limits must not change while
> any bio is on the zoned plug list.
Hi Bart,
Can you share why request queue limit can't be changed after bio is added
to zoned plug list?
If it is really true, we may have to drain zoned plug list when freezing
queue.
Thanks,
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-23 2:10 ` Ming Lei
@ 2025-05-23 6:06 ` Damien Le Moal
0 siblings, 0 replies; 16+ messages in thread
From: Damien Le Moal @ 2025-05-23 6:06 UTC (permalink / raw)
To: Ming Lei, Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Yu Kuai, stable
On 5/23/25 04:10, Ming Lei wrote:
> On Thu, May 22, 2025 at 11:32:58AM -0700, Bart Van Assche wrote:
>> On 5/22/25 10:38 AM, Jens Axboe wrote:
>>> On 5/22/25 11:14 AM, Bart Van Assche wrote:
>>>> static void __submit_bio(struct bio *bio)
>>>> {
>>>> /* If plug is not used, add new plug here to cache nsecs time. */
>>>> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
>>>> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
>>>> blk_mq_submit_bio(bio);
>>>> - } else if (likely(bio_queue_enter(bio) == 0)) {
>>>> + } else {
>>>> struct gendisk *disk = bio->bi_bdev->bd_disk;
>>>> + bool zwp = bio_zone_write_plugging(bio);
>>>> +
>>>> + if (unlikely(!zwp && bio_queue_enter(bio) != 0))
>>>> + goto finish_plug;
>>>>
>>>> if ((bio->bi_opf & REQ_POLLED) &&
>>>> !(disk->queue->limits.features & BLK_FEAT_POLL)) {
>>>> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
>>>> } else {
>>>> disk->fops->submit_bio(bio);
>>>> }
>>>> - blk_queue_exit(disk->queue);
>>>> +
>>>> + if (!zwp)
>>>> + blk_queue_exit(disk->queue);
>>>> }
>>>
>>> This is pretty ugly, and I honestly absolutely hate how there's quite a
>>> bit of zoned_whatever sprinkling throughout the core code. What's the
>>> reason for not unplugging here, unaligned writes? Because you should
>>> presumable have the exact same issues on non-zoned devices if they have
>>> IO stuck in a plug (and doesn't get unplugged) while someone is waiting
>>> on a freeze.
>>>
>>> A somewhat similar case was solved for IOPOLL and queue entering. That
>>> would be another thing to look at. Maybe a live enter could work if the
>>> plug itself pins it?
>>
>> Hi Jens,
>>
>> q->q_usage_counter is not increased for bios on current->plug_list.
>> q->q_usage_counter is increased before a bio is added to the zoned pluglist.
>> So these two cases are different.
>>
>> I think it is important to hold a q->q_usage_counter reference for bios
>> on the zoned plug list because bios are added to that list after bio
>> splitting happened. Hence, request queue limits must not change while
>> any bio is on the zoned plug list.
>
> Hi Bart,
>
> Can you share why request queue limit can't be changed after bio is added
> to zoned plug list?
Because BIOs on a zone write plug list have already been split according to the
current request queue limits. So until these BIOs are executed, we cannot change
the limits as that potentially would require again splitting and that would
completely messup the zone write pointer tracking of zone write plugging.
> If it is really true, we may have to drain zoned plug list when freezing
> queue.
Yes, that is what we need. But we currently endup deadlocking on a queue freeze
because the internal issuing of plugged zone write BIOs uses
submit_bio_noacct_nocheck() which calls __submit_bio() and that function will
call blk_queue_enter() again for DM device BIOs. We need to somehow cleanly
avoid calling that queue enter without sprinkling around lots of zone stuff in
this core block layer submission path.
>
>
> Thanks,
> Ming
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-22 18:32 ` Bart Van Assche
2025-05-23 2:10 ` Ming Lei
@ 2025-05-23 5:53 ` Damien Le Moal
1 sibling, 0 replies; 16+ messages in thread
From: Damien Le Moal @ 2025-05-23 5:53 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Yu Kuai, Ming Lei, stable
On 5/22/25 20:32, Bart Van Assche wrote:
> On 5/22/25 10:38 AM, Jens Axboe wrote:
>> On 5/22/25 11:14 AM, Bart Van Assche wrote:
>>> static void __submit_bio(struct bio *bio)
>>> {
>>> /* If plug is not used, add new plug here to cache nsecs time. */
>>> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
>>>
>>> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
>>> blk_mq_submit_bio(bio);
>>> - } else if (likely(bio_queue_enter(bio) == 0)) {
>>> + } else {
>>> struct gendisk *disk = bio->bi_bdev->bd_disk;
>>> + bool zwp = bio_zone_write_plugging(bio);
>>> +
>>> + if (unlikely(!zwp && bio_queue_enter(bio) != 0))
>>> + goto finish_plug;
>>>
>>> if ((bio->bi_opf & REQ_POLLED) &&
>>> !(disk->queue->limits.features & BLK_FEAT_POLL)) {
>>> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
>>> } else {
>>> disk->fops->submit_bio(bio);
>>> }
>>> - blk_queue_exit(disk->queue);
>>> +
>>> + if (!zwp)
>>> + blk_queue_exit(disk->queue);
>>> }
>>
>> This is pretty ugly, and I honestly absolutely hate how there's quite a
>> bit of zoned_whatever sprinkling throughout the core code. What's the
>> reason for not unplugging here, unaligned writes? Because you should
>> presumable have the exact same issues on non-zoned devices if they have
>> IO stuck in a plug (and doesn't get unplugged) while someone is waiting
>> on a freeze.
>>
>> A somewhat similar case was solved for IOPOLL and queue entering. That
>> would be another thing to look at. Maybe a live enter could work if the
>> plug itself pins it?
>
> Hi Jens,
>
> q->q_usage_counter is not increased for bios on current->plug_list.
> q->q_usage_counter is increased before a bio is added to the zoned
> pluglist. So these two cases are different.
>
> I think it is important to hold a q->q_usage_counter reference for bios
> on the zoned plug list because bios are added to that list after bio
> splitting happened. Hence, request queue limits must not change while
> any bio is on the zoned plug list.
>
> Damien, do you think it would be possible to add a bio to the zoned plug
> list before it is split and not to hold q->q_usage_counter for bios on
> the zoned plug list?
I do not think this is the right thing to do as that will completely mess up the
queue usage counter value with regard to submit_bio() calls. That value is
incremented when a BIO is submitted, and that should stay the same for zoned
write BIOs that endup in a zone write plug instead of going straigth to the device.
The issue comes from the fact that blk_zone_wplug_bio_work() calls
submit_bio_noacct_nocheck() which eventually calls __submit_bio() and that
function calls blk_queue_enter() for DM devices. We still need to preserve that
for the first submission of the BIO but should not/do not need to do the
blk_queue_enter() call again when a BIO is submitted via the zone write plug
work. I am not sure yet how to best deal with that.
I am traveling today and will not have time to cook something. Will take a look
over the weekend.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-22 17:38 ` Jens Axboe
2025-05-22 18:32 ` Bart Van Assche
@ 2025-05-23 8:10 ` Damien Le Moal
2025-05-23 8:20 ` Damien Le Moal
` (2 more replies)
1 sibling, 3 replies; 16+ messages in thread
From: Damien Le Moal @ 2025-05-23 8:10 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche
Cc: linux-block, Christoph Hellwig, Yu Kuai, Ming Lei, stable
On 5/22/25 19:38, Jens Axboe wrote:
> On 5/22/25 11:14 AM, Bart Van Assche wrote:
>> blk_mq_freeze_queue() never terminates if one or more bios are on the plug
>> list and if the block device driver defines a .submit_bio() method.
>> This is the case for device mapper drivers. The deadlock happens because
>> blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because
>> a queue reference is held by bios on the plug list and because the
>> __bio_queue_enter() call in __submit_bio() waits for the queue to be
>> unfrozen.
>>
>> This patch fixes the following deadlock:
>>
>> Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work
>> Call trace:
>> __schedule+0xb08/0x1160
>> schedule+0x48/0xc8
>> __bio_queue_enter+0xcc/0x1d0
>> __submit_bio+0x100/0x1b0
>> submit_bio_noacct_nocheck+0x230/0x49c
>> blk_zone_wplug_bio_work+0x168/0x250
>> process_one_work+0x26c/0x65c
>> worker_thread+0x33c/0x498
>> kthread+0x110/0x134
>> ret_from_fork+0x10/0x20
>>
>> Call trace:
>> __switch_to+0x230/0x410
>> __schedule+0xb08/0x1160
>> schedule+0x48/0xc8
>> blk_mq_freeze_queue_wait+0x78/0xb8
>> blk_mq_freeze_queue+0x90/0xa4
>> queue_attr_store+0x7c/0xf0
>> sysfs_kf_write+0x98/0xc8
>> kernfs_fop_write_iter+0x12c/0x1d4
>> vfs_write+0x340/0x3ac
>> ksys_write+0x78/0xe8
>>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Damien Le Moal <dlemoal@kernel.org>
>> Cc: Yu Kuai <yukuai1@huaweicloud.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: stable@vger.kernel.org
>> Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>
>> Changes compared to v1: fixed a race condition. Call bio_zone_write_plugging()
>> only before submitting the bio and not after it has been submitted.
>>
>> block/blk-core.c | 18 ++++++++++++++++--
>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index b862c66018f2..713fb3865260 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
>> return BLK_STS_OK;
>> }
>>
>> +/*
>> + * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
>> + * set because this causes blk_mq_freeze_queue() to deadlock if
>> + * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
>> + * on the plug list is not necessary since a q_usage_counter reference is held
>> + * while a bio is on the plug list.
>> + */
>> static void __submit_bio(struct bio *bio)
>> {
>> /* If plug is not used, add new plug here to cache nsecs time. */
>> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
>>
>> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
>> blk_mq_submit_bio(bio);
>> - } else if (likely(bio_queue_enter(bio) == 0)) {
>> + } else {
>> struct gendisk *disk = bio->bi_bdev->bd_disk;
>> + bool zwp = bio_zone_write_plugging(bio);
>> +
>> + if (unlikely(!zwp && bio_queue_enter(bio) != 0))
>> + goto finish_plug;
>>
>> if ((bio->bi_opf & REQ_POLLED) &&
>> !(disk->queue->limits.features & BLK_FEAT_POLL)) {
>> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
>> } else {
>> disk->fops->submit_bio(bio);
>> }
>> - blk_queue_exit(disk->queue);
>> +
>> + if (!zwp)
>> + blk_queue_exit(disk->queue);
>> }
>
> This is pretty ugly, and I honestly absolutely hate how there's quite a
> bit of zoned_whatever sprinkling throughout the core code. What's the
> reason for not unplugging here, unaligned writes? Because you should
> presumable have the exact same issues on non-zoned devices if they have
> IO stuck in a plug (and doesn't get unplugged) while someone is waiting
> on a freeze.
>
> A somewhat similar case was solved for IOPOLL and queue entering. That
> would be another thing to look at. Maybe a live enter could work if the
> plug itself pins it?
What about this patch, completely untested...
diff --git a/block/blk-core.c b/block/blk-core.c
index e8cc270a453f..3d2dec088a54 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -621,6 +621,32 @@ static inline blk_status_t blk_check_zone_append(struct
request_queue *q,
return BLK_STS_OK;
}
+static inline void disk_submit_bio(struct bio *bio)
+{
+ struct gendisk *disk = bio->bi_bdev->bd_disk;
+ bool need_queue_enter = !bio_zone_write_plugging(bio);
+
+ /*
+ * BIOs that are issued from a zone write plug already hold a reference
+ * on the device queue usage counter.
+ */
+ if (need_queue_enter) {
+ if (unlikely(bio_queue_enter(bio)))
+ return;
+ }
+
+ if ((bio->bi_opf & REQ_POLLED) &&
+ !(disk->queue->limits.features & BLK_FEAT_POLL)) {
+ bio->bi_status = BLK_STS_NOTSUPP;
+ bio_endio(bio);
+ } else {
+ disk->fops->submit_bio(bio);
+ }
+
+ if (need_queue_enter)
+ blk_queue_exit(disk->queue);
+}
+
static void __submit_bio(struct bio *bio)
{
/* If plug is not used, add new plug here to cache nsecs time. */
@@ -631,20 +657,10 @@ static void __submit_bio(struct bio *bio)
blk_start_plug(&plug);
- if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
+ if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO))
blk_mq_submit_bio(bio);
- } else if (likely(bio_queue_enter(bio) == 0)) {
- struct gendisk *disk = bio->bi_bdev->bd_disk;
-
- if ((bio->bi_opf & REQ_POLLED) &&
- !(disk->queue->limits.features & BLK_FEAT_POLL)) {
- bio->bi_status = BLK_STS_NOTSUPP;
- bio_endio(bio);
- } else {
- disk->fops->submit_bio(bio);
- }
- blk_queue_exit(disk->queue);
- }
+ else
+ disk_submit_bio(bio)
blk_finish_plug(&plug);
}
--
Damien Le Moal
Western Digital Research
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-23 8:10 ` Damien Le Moal
@ 2025-05-23 8:20 ` Damien Le Moal
2025-05-23 8:22 ` Christoph Hellwig
2025-05-23 8:20 ` Christoph Hellwig
2025-05-23 12:36 ` Jens Axboe
2 siblings, 1 reply; 16+ messages in thread
From: Damien Le Moal @ 2025-05-23 8:20 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche
Cc: linux-block, Christoph Hellwig, Yu Kuai, Ming Lei, stable
On 5/23/25 10:10, Damien Le Moal wrote:
> What about this patch, completely untested...
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index e8cc270a453f..3d2dec088a54 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -621,6 +621,32 @@ static inline blk_status_t blk_check_zone_append(struct
> request_queue *q,
> return BLK_STS_OK;
> }
>
> +static inline void disk_submit_bio(struct bio *bio)
> +{
> + struct gendisk *disk = bio->bi_bdev->bd_disk;
> + bool need_queue_enter = !bio_zone_write_plugging(bio);
> +
> + /*
> + * BIOs that are issued from a zone write plug already hold a reference
> + * on the device queue usage counter.
> + */
> + if (need_queue_enter) {
> + if (unlikely(bio_queue_enter(bio)))
> + return;
> + }
> +
> + if ((bio->bi_opf & REQ_POLLED) &&
> + !(disk->queue->limits.features & BLK_FEAT_POLL)) {
> + bio->bi_status = BLK_STS_NOTSUPP;
> + bio_endio(bio);
> + } else {
> + disk->fops->submit_bio(bio);
> + }
> +
> + if (need_queue_enter)
> + blk_queue_exit(disk->queue);
> +}
> +
> static void __submit_bio(struct bio *bio)
> {
> /* If plug is not used, add new plug here to cache nsecs time. */
> @@ -631,20 +657,10 @@ static void __submit_bio(struct bio *bio)
>
> blk_start_plug(&plug);
>
> - if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
> + if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO))
> blk_mq_submit_bio(bio);
> - } else if (likely(bio_queue_enter(bio) == 0)) {
> - struct gendisk *disk = bio->bi_bdev->bd_disk;
> -
> - if ((bio->bi_opf & REQ_POLLED) &&
> - !(disk->queue->limits.features & BLK_FEAT_POLL)) {
> - bio->bi_status = BLK_STS_NOTSUPP;
> - bio_endio(bio);
> - } else {
> - disk->fops->submit_bio(bio);
> - }
> - blk_queue_exit(disk->queue);
> - }
> + else
> + disk_submit_bio(bio)
Missing ";" here.
>
> blk_finish_plug(&plug);
> }
Looking into this deeper, the regular mq path actually likely has the same issue
since it will call blk_queue_enter() if we do not have a cached request. So this
solution is only partial and not good enough. We need something else.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-23 8:20 ` Damien Le Moal
@ 2025-05-23 8:22 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2025-05-23 8:22 UTC (permalink / raw)
To: Damien Le Moal
Cc: Jens Axboe, Bart Van Assche, linux-block, Christoph Hellwig,
Yu Kuai, Ming Lei, stable
On Fri, May 23, 2025 at 10:20:35AM +0200, Damien Le Moal wrote:
>
> Looking into this deeper, the regular mq path actually likely has the same issue
> since it will call blk_queue_enter() if we do not have a cached request. So this
> solution is only partial and not good enough. We need something else.
The bio_zone_write_plugging case in blk_mq_submit_bio always
reused the existing queue reference.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-23 8:10 ` Damien Le Moal
2025-05-23 8:20 ` Damien Le Moal
@ 2025-05-23 8:20 ` Christoph Hellwig
2025-05-23 11:00 ` Damien Le Moal
` (2 more replies)
2025-05-23 12:36 ` Jens Axboe
2 siblings, 3 replies; 16+ messages in thread
From: Christoph Hellwig @ 2025-05-23 8:20 UTC (permalink / raw)
To: Damien Le Moal
Cc: Jens Axboe, Bart Van Assche, linux-block, Christoph Hellwig,
Yu Kuai, Ming Lei, stable
On Fri, May 23, 2025 at 10:10:30AM +0200, Damien Le Moal wrote:
> > This is pretty ugly, and I honestly absolutely hate how there's quite a
> > bit of zoned_whatever sprinkling throughout the core code. What's the
> > reason for not unplugging here, unaligned writes? Because you should
> > presumable have the exact same issues on non-zoned devices if they have
> > IO stuck in a plug (and doesn't get unplugged) while someone is waiting
> > on a freeze.
> >
> > A somewhat similar case was solved for IOPOLL and queue entering. That
> > would be another thing to look at. Maybe a live enter could work if the
> > plug itself pins it?
>
> What about this patch, completely untested...
This still looks extremely backwards as it messed up common core
code for something that it shouldn't. I'd still love to see an
actual reproducer ahead of me, but I think the problem is that
blk_zone_wplug_bio_work calls into the still fairly high-level
submit_bio_noacct_nocheck for bios that already went through
the submit_bio machinery.
Something like this completely untested patch:
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 8f15d1aa6eb8..6841af8a989c 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -1306,16 +1306,18 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
spin_unlock_irqrestore(&zwplug->lock, flags);
bdev = bio->bi_bdev;
- submit_bio_noacct_nocheck(bio);
-
/*
* blk-mq devices will reuse the extra reference on the request queue
* usage counter we took when the BIO was plugged, but the submission
* path for BIO-based devices will not do that. So drop this extra
* reference here.
*/
- if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO))
+ if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO)) {
+ bdev->bd_disk->fops->submit_bio(bio);
blk_queue_exit(bdev->bd_disk->queue);
+ } else {
+ blk_mq_submit_bio(bio);
+ }
put_zwplug:
/* Drop the reference we took in disk_zone_wplug_schedule_bio_work(). */
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-23 8:20 ` Christoph Hellwig
@ 2025-05-23 11:00 ` Damien Le Moal
2025-05-26 7:41 ` Damien Le Moal
2025-05-27 21:49 ` Bart Van Assche
2 siblings, 0 replies; 16+ messages in thread
From: Damien Le Moal @ 2025-05-23 11:00 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, Yu Kuai, Ming Lei,
stable
On 5/23/25 10:20, Christoph Hellwig wrote:
> On Fri, May 23, 2025 at 10:10:30AM +0200, Damien Le Moal wrote:
>>> This is pretty ugly, and I honestly absolutely hate how there's quite a
>>> bit of zoned_whatever sprinkling throughout the core code. What's the
>>> reason for not unplugging here, unaligned writes? Because you should
>>> presumable have the exact same issues on non-zoned devices if they have
>>> IO stuck in a plug (and doesn't get unplugged) while someone is waiting
>>> on a freeze.
>>>
>>> A somewhat similar case was solved for IOPOLL and queue entering. That
>>> would be another thing to look at. Maybe a live enter could work if the
>>> plug itself pins it?
>>
>> What about this patch, completely untested...
>
> This still looks extremely backwards as it messed up common core
> code for something that it shouldn't. I'd still love to see an
> actual reproducer ahead of me, but I think the problem is that
> blk_zone_wplug_bio_work calls into the still fairly high-level
> submit_bio_noacct_nocheck for bios that already went through
> the submit_bio machinery.
>
> Something like this completely untested patch:
>
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 8f15d1aa6eb8..6841af8a989c 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -1306,16 +1306,18 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
> spin_unlock_irqrestore(&zwplug->lock, flags);
>
> bdev = bio->bi_bdev;
> - submit_bio_noacct_nocheck(bio);
> -
> /*
> * blk-mq devices will reuse the extra reference on the request queue
> * usage counter we took when the BIO was plugged, but the submission
> * path for BIO-based devices will not do that. So drop this extra
> * reference here.
> */
> - if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO))
> + if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO)) {
> + bdev->bd_disk->fops->submit_bio(bio);
> blk_queue_exit(bdev->bd_disk->queue);
> + } else {
> + blk_mq_submit_bio(bio);
> + }
Indeed, this looks better and simpler. Will give it a spin to check.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-23 8:20 ` Christoph Hellwig
2025-05-23 11:00 ` Damien Le Moal
@ 2025-05-26 7:41 ` Damien Le Moal
2025-05-27 21:49 ` Bart Van Assche
2 siblings, 0 replies; 16+ messages in thread
From: Damien Le Moal @ 2025-05-26 7:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, Yu Kuai, Ming Lei,
stable
On 5/23/25 10:20, Christoph Hellwig wrote:
> Something like this completely untested patch:
>
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 8f15d1aa6eb8..6841af8a989c 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -1306,16 +1306,18 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
> spin_unlock_irqrestore(&zwplug->lock, flags);
>
> bdev = bio->bi_bdev;
> - submit_bio_noacct_nocheck(bio);
> -
> /*
> * blk-mq devices will reuse the extra reference on the request queue
> * usage counter we took when the BIO was plugged, but the submission
> * path for BIO-based devices will not do that. So drop this extra
> * reference here.
> */
> - if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO))
> + if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO)) {
> + bdev->bd_disk->fops->submit_bio(bio);
> blk_queue_exit(bdev->bd_disk->queue);
> + } else {
> + blk_mq_submit_bio(bio);
> + }
>
> put_zwplug:
> /* Drop the reference we took in disk_zone_wplug_schedule_bio_work(). */
I ran xfs on an SMR drive with this and I had no issues. Will do more test with
a device mapper added.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-23 8:20 ` Christoph Hellwig
2025-05-23 11:00 ` Damien Le Moal
2025-05-26 7:41 ` Damien Le Moal
@ 2025-05-27 21:49 ` Bart Van Assche
2 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2025-05-27 21:49 UTC (permalink / raw)
To: Christoph Hellwig, Damien Le Moal
Cc: Jens Axboe, linux-block, Yu Kuai, Ming Lei, stable
On 5/23/25 1:20 AM, Christoph Hellwig wrote:
> Something like this completely untested patch:
>
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 8f15d1aa6eb8..6841af8a989c 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -1306,16 +1306,18 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
> spin_unlock_irqrestore(&zwplug->lock, flags);
>
> bdev = bio->bi_bdev;
> - submit_bio_noacct_nocheck(bio);
> -
> /*
> * blk-mq devices will reuse the extra reference on the request queue
> * usage counter we took when the BIO was plugged, but the submission
> * path for BIO-based devices will not do that. So drop this extra
> * reference here.
> */
> - if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO))
> + if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO)) {
> + bdev->bd_disk->fops->submit_bio(bio);
> blk_queue_exit(bdev->bd_disk->queue);
> + } else {
> + blk_mq_submit_bio(bio);
> + }
>
> put_zwplug:
> /* Drop the reference we took in disk_zone_wplug_schedule_bio_work(). */
This patch works fine on my test setup.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-23 8:10 ` Damien Le Moal
2025-05-23 8:20 ` Damien Le Moal
2025-05-23 8:20 ` Christoph Hellwig
@ 2025-05-23 12:36 ` Jens Axboe
2 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2025-05-23 12:36 UTC (permalink / raw)
To: Damien Le Moal, Bart Van Assche
Cc: linux-block, Christoph Hellwig, Yu Kuai, Ming Lei, stable
On 5/23/25 2:10 AM, Damien Le Moal wrote:
> On 5/22/25 19:38, Jens Axboe wrote:
>> On 5/22/25 11:14 AM, Bart Van Assche wrote:
>>> blk_mq_freeze_queue() never terminates if one or more bios are on the plug
>>> list and if the block device driver defines a .submit_bio() method.
>>> This is the case for device mapper drivers. The deadlock happens because
>>> blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because
>>> a queue reference is held by bios on the plug list and because the
>>> __bio_queue_enter() call in __submit_bio() waits for the queue to be
>>> unfrozen.
>>>
>>> This patch fixes the following deadlock:
>>>
>>> Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work
>>> Call trace:
>>> __schedule+0xb08/0x1160
>>> schedule+0x48/0xc8
>>> __bio_queue_enter+0xcc/0x1d0
>>> __submit_bio+0x100/0x1b0
>>> submit_bio_noacct_nocheck+0x230/0x49c
>>> blk_zone_wplug_bio_work+0x168/0x250
>>> process_one_work+0x26c/0x65c
>>> worker_thread+0x33c/0x498
>>> kthread+0x110/0x134
>>> ret_from_fork+0x10/0x20
>>>
>>> Call trace:
>>> __switch_to+0x230/0x410
>>> __schedule+0xb08/0x1160
>>> schedule+0x48/0xc8
>>> blk_mq_freeze_queue_wait+0x78/0xb8
>>> blk_mq_freeze_queue+0x90/0xa4
>>> queue_attr_store+0x7c/0xf0
>>> sysfs_kf_write+0x98/0xc8
>>> kernfs_fop_write_iter+0x12c/0x1d4
>>> vfs_write+0x340/0x3ac
>>> ksys_write+0x78/0xe8
>>>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Damien Le Moal <dlemoal@kernel.org>
>>> Cc: Yu Kuai <yukuai1@huaweicloud.com>
>>> Cc: Ming Lei <ming.lei@redhat.com>
>>> Cc: stable@vger.kernel.org
>>> Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>> ---
>>>
>>> Changes compared to v1: fixed a race condition. Call bio_zone_write_plugging()
>>> only before submitting the bio and not after it has been submitted.
>>>
>>> block/blk-core.c | 18 ++++++++++++++++--
>>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index b862c66018f2..713fb3865260 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
>>> return BLK_STS_OK;
>>> }
>>>
>>> +/*
>>> + * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
>>> + * set because this causes blk_mq_freeze_queue() to deadlock if
>>> + * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
>>> + * on the plug list is not necessary since a q_usage_counter reference is held
>>> + * while a bio is on the plug list.
>>> + */
>>> static void __submit_bio(struct bio *bio)
>>> {
>>> /* If plug is not used, add new plug here to cache nsecs time. */
>>> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio)
>>>
>>> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
>>> blk_mq_submit_bio(bio);
>>> - } else if (likely(bio_queue_enter(bio) == 0)) {
>>> + } else {
>>> struct gendisk *disk = bio->bi_bdev->bd_disk;
>>> + bool zwp = bio_zone_write_plugging(bio);
>>> +
>>> + if (unlikely(!zwp && bio_queue_enter(bio) != 0))
>>> + goto finish_plug;
>>>
>>> if ((bio->bi_opf & REQ_POLLED) &&
>>> !(disk->queue->limits.features & BLK_FEAT_POLL)) {
>>> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio)
>>> } else {
>>> disk->fops->submit_bio(bio);
>>> }
>>> - blk_queue_exit(disk->queue);
>>> +
>>> + if (!zwp)
>>> + blk_queue_exit(disk->queue);
>>> }
>>
>> This is pretty ugly, and I honestly absolutely hate how there's quite a
>> bit of zoned_whatever sprinkling throughout the core code. What's the
>> reason for not unplugging here, unaligned writes? Because you should
>> presumable have the exact same issues on non-zoned devices if they have
>> IO stuck in a plug (and doesn't get unplugged) while someone is waiting
>> on a freeze.
>>
>> A somewhat similar case was solved for IOPOLL and queue entering. That
>> would be another thing to look at. Maybe a live enter could work if the
>> plug itself pins it?
>
> What about this patch, completely untested...
It's exactly the same thing, more zoned sprinkling in code that should
not need to care. Only difference is that it moved to a function.
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Fix a deadlock related freezing zoned storage devices
2025-05-22 17:14 [PATCH] block: Fix a deadlock related freezing zoned storage devices Bart Van Assche
2025-05-22 17:38 ` Jens Axboe
@ 2025-05-23 3:10 ` Christoph Hellwig
2025-05-23 16:08 ` Bart Van Assche
1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2025-05-23 3:10 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
Yu Kuai, Ming Lei, stable
On Thu, May 22, 2025 at 10:14:05AM -0700, Bart Van Assche wrote:
> Changes compared to v1: fixed a race condition. Call bio_zone_write_plugging()
> only before submitting the bio and not after it has been submitted.
And just like I told you for v1 we can't just bypass queue freezing.
And I'll ask you again to provide a reproducer and explain your findings.
Without that you we are just waisting a lot of time and effort.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-05-27 21:49 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-22 17:14 [PATCH] block: Fix a deadlock related freezing zoned storage devices Bart Van Assche
2025-05-22 17:38 ` Jens Axboe
2025-05-22 18:32 ` Bart Van Assche
2025-05-23 2:10 ` Ming Lei
2025-05-23 6:06 ` Damien Le Moal
2025-05-23 5:53 ` Damien Le Moal
2025-05-23 8:10 ` Damien Le Moal
2025-05-23 8:20 ` Damien Le Moal
2025-05-23 8:22 ` Christoph Hellwig
2025-05-23 8:20 ` Christoph Hellwig
2025-05-23 11:00 ` Damien Le Moal
2025-05-26 7:41 ` Damien Le Moal
2025-05-27 21:49 ` Bart Van Assche
2025-05-23 12:36 ` Jens Axboe
2025-05-23 3:10 ` Christoph Hellwig
2025-05-23 16:08 ` Bart Van Assche
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox