* [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
[not found] <20250514202937.2058598-1-bvanassche@acm.org>
@ 2025-05-14 20:29 ` Bart Van Assche
2025-05-15 7:19 ` Niklas Cassel
2025-05-16 4:47 ` Christoph Hellwig
2025-05-14 20:29 ` [PATCH 2/2] block: Fix a deadlock related freezing zoned storage devices Bart Van Assche
1 sibling, 2 replies; 14+ messages in thread
From: Bart Van Assche @ 2025-05-14 20:29 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
Yu Kuai, Ming Lei, stable
submit_bio() may be called recursively. To limit the stack depth, recursive
calls result in bios being added to a list (current->bio_list).
__submit_bio_noacct() sets up that list and maintains two lists with
requests:
* bio_list_on_stack[0] is the list with bios submitted by recursive
submit_bio() calls from inside the latest __submit_bio() call.
* bio_list_on_stack[1] is the list with bios submitted by recursive
submit_bio() calls from inside previous __submit_bio() calls.
Make sure that bios are submitted to lower devices in the order these
have been submitted by submit_bio() by adding new bios at the end of the
list instead of at the front.
This patch fixes unaligned write errors that I encountered with F2FS
submitting zoned writes to a dm driver stacked on top of a zoned UFS
device.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index b862c66018f2..4b728fa1c138 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -704,9 +704,9 @@ static void __submit_bio_noacct(struct bio *bio)
/*
* Now assemble so we handle the lowest level first.
*/
+ bio_list_on_stack[0] = bio_list_on_stack[1];
bio_list_merge(&bio_list_on_stack[0], &lower);
bio_list_merge(&bio_list_on_stack[0], &same);
- bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
} while ((bio = bio_list_pop(&bio_list_on_stack[0])));
current->bio_list = NULL;
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/2] block: Fix a deadlock related freezing zoned storage devices
[not found] <20250514202937.2058598-1-bvanassche@acm.org>
2025-05-14 20:29 ` [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order Bart Van Assche
@ 2025-05-14 20:29 ` Bart Van Assche
2025-05-16 4:51 ` Christoph Hellwig
1 sibling, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2025-05-14 20:29 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
Yu Kuai, Ming Lei, stable
blk_mq_freeze_queue() never terminates if one or more bios are on the plug
list and if the block device driver defines a .submit_bio() method.
This is the case for device mapper drivers. The deadlock happens because
blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because
a queue reference is held by bios on the plug list and because the
__bio_queue_enter() call in __submit_bio() waits for the queue to be
unfrozen.
This patch fixes the following deadlock:
Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work
Call trace:
__schedule+0xb08/0x1160
schedule+0x48/0xc8
__bio_queue_enter+0xcc/0x1d0
__submit_bio+0x100/0x1b0
submit_bio_noacct_nocheck+0x230/0x49c
blk_zone_wplug_bio_work+0x168/0x250
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
Call trace:
__switch_to+0x230/0x410
__schedule+0xb08/0x1160
schedule+0x48/0xc8
blk_mq_freeze_queue_wait+0x78/0xb8
blk_mq_freeze_queue+0x90/0xa4
queue_attr_store+0x7c/0xf0
sysfs_kf_write+0x98/0xc8
kernfs_fop_write_iter+0x12c/0x1d4
vfs_write+0x340/0x3ac
ksys_write+0x78/0xe8
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: stable@vger.kernel.org
Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-core.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 4b728fa1c138..e961896a8717 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
return BLK_STS_OK;
}
+/*
+ * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
+ * set because this causes blk_mq_freeze_queue() to deadlock if
+ * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
+ * on the plug list is not necessary since a q_usage_counter reference is held
+ * while a bio is on the plug list.
+ */
static void __submit_bio(struct bio *bio)
{
/* If plug is not used, add new plug here to cache nsecs time. */
@@ -633,7 +640,8 @@ static void __submit_bio(struct bio *bio)
if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
blk_mq_submit_bio(bio);
- } else if (likely(bio_queue_enter(bio) == 0)) {
+ } else if (likely(bio_zone_write_plugging(bio) ||
+ bio_queue_enter(bio) == 0)) {
struct gendisk *disk = bio->bi_bdev->bd_disk;
if ((bio->bi_opf & REQ_POLLED) &&
@@ -643,7 +651,8 @@ static void __submit_bio(struct bio *bio)
} else {
disk->fops->submit_bio(bio);
}
- blk_queue_exit(disk->queue);
+ if (!bio_zone_write_plugging(bio))
+ blk_queue_exit(disk->queue);
}
blk_finish_plug(&plug);
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
2025-05-14 20:29 ` [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order Bart Van Assche
@ 2025-05-15 7:19 ` Niklas Cassel
2025-05-15 15:58 ` Bart Van Assche
2025-05-16 4:47 ` Christoph Hellwig
1 sibling, 1 reply; 14+ messages in thread
From: Niklas Cassel @ 2025-05-15 7:19 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
Yu Kuai, Ming Lei, stable
Hello Bart,
On Wed, May 14, 2025 at 01:29:36PM -0700, Bart Van Assche wrote:
> submit_bio() may be called recursively. To limit the stack depth, recursive
> calls result in bios being added to a list (current->bio_list).
> __submit_bio_noacct() sets up that list and maintains two lists with
> requests:
> * bio_list_on_stack[0] is the list with bios submitted by recursive
> submit_bio() calls from inside the latest __submit_bio() call.
> * bio_list_on_stack[1] is the list with bios submitted by recursive
> submit_bio() calls from inside previous __submit_bio() calls.
>
> Make sure that bios are submitted to lower devices in the order these
> have been submitted by submit_bio() by adding new bios at the end of the
> list instead of at the front.
>
> This patch fixes unaligned write errors that I encountered with F2FS
> submitting zoned writes to a dm driver stacked on top of a zoned UFS
> device.
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Yu Kuai <yukuai1@huaweicloud.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: stable@vger.kernel.org
Here you add stable to Cc, but you don't specify either
1) a minimum version e.g.
stable@vger.kernel.org # v6.8+
or
2) a Fixes tag.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
2025-05-15 7:19 ` Niklas Cassel
@ 2025-05-15 15:58 ` Bart Van Assche
0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2025-05-15 15:58 UTC (permalink / raw)
To: Niklas Cassel, NeilBrown
Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
Yu Kuai, Ming Lei, stable
On 5/15/25 12:19 AM, Niklas Cassel wrote:
> Hello Bart,
>
> On Wed, May 14, 2025 at 01:29:36PM -0700, Bart Van Assche wrote:
>> submit_bio() may be called recursively. To limit the stack depth, recursive
>> calls result in bios being added to a list (current->bio_list).
>> __submit_bio_noacct() sets up that list and maintains two lists with
>> requests:
>> * bio_list_on_stack[0] is the list with bios submitted by recursive
>> submit_bio() calls from inside the latest __submit_bio() call.
>> * bio_list_on_stack[1] is the list with bios submitted by recursive
>> submit_bio() calls from inside previous __submit_bio() calls.
>>
>> Make sure that bios are submitted to lower devices in the order these
>> have been submitted by submit_bio() by adding new bios at the end of the
>> list instead of at the front.
>>
>> This patch fixes unaligned write errors that I encountered with F2FS
>> submitting zoned writes to a dm driver stacked on top of a zoned UFS
>> device.
>>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Damien Le Moal <dlemoal@kernel.org>
>> Cc: Yu Kuai <yukuai1@huaweicloud.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: stable@vger.kernel.org
>
> Here you add stable to Cc, but you don't specify either
> 1) a minimum version e.g.
> stable@vger.kernel.org # v6.8+
> or
> 2) a Fixes tag.
Hi Niklas,
Let's add the following to this patch:
Fixes: 79bd99596b73 ("blk: improve order of bio handling in
generic_make_request()")
Neil, since that commit was authored by you: the commit message is
elaborate but the names of the drivers that needed that commit have
not been mentioned. Which drivers needed that change? Additionally,
can you please help with reviewing this patch:
https://lore.kernel.org/linux-block/20250514202937.2058598-2-bvanassche@acm.org/
Thanks,
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
2025-05-14 20:29 ` [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order Bart Van Assche
2025-05-15 7:19 ` Niklas Cassel
@ 2025-05-16 4:47 ` Christoph Hellwig
2025-05-19 22:12 ` Bart Van Assche
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-05-16 4:47 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
Yu Kuai, Ming Lei, stable
On Wed, May 14, 2025 at 01:29:36PM -0700, Bart Van Assche wrote:
> /*
> * Now assemble so we handle the lowest level first.
> */
> + bio_list_on_stack[0] = bio_list_on_stack[1];
> bio_list_merge(&bio_list_on_stack[0], &lower);
> bio_list_merge(&bio_list_on_stack[0], &same);
> - bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
If I read this code correctly, this means that we no keep processing bios
that already were on bio_list_on_stack[0] and the beginning of the loop
in the next iteration(s) instead of finishing off the ones created by
this iteration, which could lead to exhaustion of resources like mempool.
Note that this is a big if - the code is really hard to read, it should
really grow a data structure for the on-stack list that has named members
for both lists instead of the array magic.. :(
I'm still trying to understand your problem given that it wasn't
described much. What I could think it is that bio_split_to_limits through
bio_submit_split first re-submits the remainder bio using
submit_bio_noacct, which the above should place on the same list and then
later the stacking block drivers also submit the bio split off at the
beginning, unlike blk-mq drivers that process it directly. But given
that this resubmission should be on the lower list above I don't
see how it causes problems.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] block: Fix a deadlock related freezing zoned storage devices
2025-05-14 20:29 ` [PATCH 2/2] block: Fix a deadlock related freezing zoned storage devices Bart Van Assche
@ 2025-05-16 4:51 ` Christoph Hellwig
2025-05-19 22:22 ` Bart Van Assche
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-05-16 4:51 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
Yu Kuai, Ming Lei, stable
On Wed, May 14, 2025 at 01:29:37PM -0700, Bart Van Assche wrote:
> +/*
> + * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
> + * set because this causes blk_mq_freeze_queue() to deadlock if
> + * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
> + * on the plug list is not necessary since a q_usage_counter reference is held
> + * while a bio is on the plug list.
> + */
How do we end up with BIO_ZONE_WRITE_PLUGGING set here? If that flag
was set in a stacking driver we need to clear it before resubmitting
the bio I think.
Can you provide a null_blk based reproducer for your testcase to see
what happens here?
Either way we can't just simply skip taking q_usage_count.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
2025-05-16 4:47 ` Christoph Hellwig
@ 2025-05-19 22:12 ` Bart Van Assche
2025-05-20 13:56 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2025-05-19 22:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block, Damien Le Moal, Yu Kuai, Ming Lei,
stable
On 5/15/25 9:47 PM, Christoph Hellwig wrote:
> On Wed, May 14, 2025 at 01:29:36PM -0700, Bart Van Assche wrote:
>> /*
>> * Now assemble so we handle the lowest level first.
>> */
>> + bio_list_on_stack[0] = bio_list_on_stack[1];
>> bio_list_merge(&bio_list_on_stack[0], &lower);
>> bio_list_merge(&bio_list_on_stack[0], &same);
>> - bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
>
> If I read this code correctly, this means that we no keep processing bios
> that already were on bio_list_on_stack[0] and the beginning of the loop
> in the next iteration(s) instead of finishing off the ones created by
> this iteration, which could lead to exhaustion of resources like mempool.
>
> Note that this is a big if - the code is really hard to read, it should
> really grow a data structure for the on-stack list that has named members
> for both lists instead of the array magic.. :(
>
> I'm still trying to understand your problem given that it wasn't
> described much. What I could think it is that bio_split_to_limits through
> bio_submit_split first re-submits the remainder bio using
> submit_bio_noacct, which the above should place on the same list and then
> later the stacking block drivers also submit the bio split off at the
> beginning, unlike blk-mq drivers that process it directly. But given
> that this resubmission should be on the lower list above I don't
> see how it causes problems.
Agreed that this should be root-caused. To my own frustration I do not
yet have a full root-cause analysis. What I have done to obtain more
information is to make the kernel issue a warning the first time a bio
is added out-of-order at the end of the bio list. The following output
appeared (sde is the zoned block device at the bottom of the stack):
[ 71.312492][ T1] bio_list_insert_sorted: inserting in the middle
of a bio list
[ 71.313483][ T1] print_bio_list(sde) sector 0x1b7520 size 0x10
[ 71.313034][ T1] bio_list_insert_sorted(sde) sector 0x1b7120 size
0x400
[ ... ]
[ 71.368117][ T163] WARNING: CPU: 4 PID: 163 at block/blk-core.c:725
bio_list_insert_sorted+0x144/0x18c
[ 71.386664][ T163] Workqueue: writeback wb_workfn (flush-253:49)
[ 71.387110][ T163] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT
-SSBS BTYPE=--)
[ 71.393772][ T163] Call trace:
[ 71.393988][ T163] bio_list_insert_sorted+0x144/0x18c
[ 71.394338][ T163] submit_bio_noacct_nocheck+0xd8/0x4f4
[ 71.394696][ T163] submit_bio_noacct+0x32c/0x50c
[ 71.395017][ T163] bio_submit_split+0xf0/0x1f8
[ 71.395349][ T163] bio_split_rw+0xdc/0xf0
[ 71.395631][ T163] blk_mq_submit_bio+0x320/0x940
[ 71.395970][ T163] __submit_bio+0xa4/0x1c4
[ 71.396260][ T163] submit_bio_noacct_nocheck+0x1c0/0x4f4
[ 71.396623][ T163] submit_bio_noacct+0x32c/0x50c
[ 71.396942][ T163] submit_bio+0x17c/0x198
[ 71.397222][ T163] f2fs_submit_write_bio+0x94/0x154
[ 71.397604][ T163] __submit_merged_bio+0x80/0x204
[ 71.397933][ T163] __submit_merged_write_cond+0xd0/0x1fc
[ 71.398297][ T163] f2fs_submit_merged_write+0x24/0x30
[ 71.398646][ T163] f2fs_sync_node_pages+0x5ec/0x64c
[ 71.398999][ T163] f2fs_write_node_pages+0xe8/0x1dc
[ 71.399338][ T163] do_writepages+0xe4/0x2f8
[ 71.399673][ T163] __writeback_single_inode+0x84/0x6e4
[ 71.400036][ T163] writeback_sb_inodes+0x2cc/0x5c0
[ 71.400369][ T163] wb_writeback+0x134/0x550
[ 71.400662][ T163] wb_workfn+0x154/0x588
[ 71.400937][ T163] process_one_work+0x26c/0x65c
[ 71.401271][ T163] worker_thread+0x33c/0x498
[ 71.401575][ T163] kthread+0x110/0x134
[ 71.401844][ T163] ret_from_fork+0x10/0x20
I think that the above call stack indicates the following:
f2fs_submit_write_bio() submits a bio to a dm driver, that the dm driver
submitted a bio for the lower driver (SCSI core), that the bio for the
lower driver is split by bio_split_rw(), and that the second half of the
split bio triggers the above out-of-order warning.
This new patch should address the concerns brought up in your latest
email:
diff --git a/block/blk-core.c b/block/blk-core.c
index 411f005e6b1f..aa270588272a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -649,6 +649,26 @@ static void __submit_bio(struct bio *bio)
blk_finish_plug(&plug);
}
+/*
+ * Insert a bio in LBA order. If no bio for the same bdev with a higher
LBA is
+ * found, append at the end.
+ */
+static void bio_list_insert_sorted(struct bio_list *bl, struct bio *bio)
+{
+ struct block_device *bdev = bio->bi_bdev;
+ struct bio **pprev = &bl->head, *next;
+ sector_t sector = bio->bi_iter.bi_sector;
+
+ for (next = *pprev; next; pprev = &next->bi_next, next = next->bi_next)
+ if (next->bi_bdev == bdev && sector < next->bi_iter.bi_sector)
+ break;
+
+ bio->bi_next = next;
+ *pprev = bio;
+ if (!next)
+ bl->tail = bio;
+}
+
/*
* The loop in this function may be a bit non-obvious, and so deserves
some
* explanation:
@@ -706,7 +726,8 @@ static void __submit_bio_noacct(struct bio *bio)
*/
bio_list_merge(&bio_list_on_stack[0], &lower);
bio_list_merge(&bio_list_on_stack[0], &same);
- bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
+ while ((bio = bio_list_pop(&bio_list_on_stack[1])))
+ bio_list_insert_sorted(&bio_list_on_stack[0], bio);
} while ((bio = bio_list_pop(&bio_list_on_stack[0])));
current->bio_list = NULL;
@@ -746,7 +767,7 @@ void submit_bio_noacct_nocheck(struct bio *bio)
* it is active, and then process them after it returned.
*/
if (current->bio_list)
- bio_list_add(¤t->bio_list[0], bio);
+ bio_list_insert_sorted(¤t->bio_list[0], bio);
else if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO))
__submit_bio_noacct_mq(bio);
else
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] block: Fix a deadlock related freezing zoned storage devices
2025-05-16 4:51 ` Christoph Hellwig
@ 2025-05-19 22:22 ` Bart Van Assche
2025-05-20 13:57 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2025-05-19 22:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block, Damien Le Moal, Yu Kuai, Ming Lei,
stable
On 5/15/25 9:51 PM, Christoph Hellwig wrote:
> On Wed, May 14, 2025 at 01:29:37PM -0700, Bart Van Assche wrote:
>> +/*
>> + * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
>> + * set because this causes blk_mq_freeze_queue() to deadlock if
>> + * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
>> + * on the plug list is not necessary since a q_usage_counter reference is held
>> + * while a bio is on the plug list.
>> + */
>
> How do we end up with BIO_ZONE_WRITE_PLUGGING set here? If that flag
> was set in a stacking driver we need to clear it before resubmitting
> the bio I think.
Hmm ... my understanding is that BIO_ZONE_WRITE_PLUGGING, if set, must
remain set until the bio has completed. Here is an example of code in
the bio completion path that tests this flag:
static inline void blk_zone_bio_endio(struct bio *bio)
{
/*
* For write BIOs to zoned devices, signal the completion of the BIO so
* that the next write BIO can be submitted by zone write plugging.
*/
if (bio_zone_write_plugging(bio))
blk_zone_write_plug_bio_endio(bio);
}
The bio_zone_write_plugging() definition is as follows:
static inline bool bio_zone_write_plugging(struct bio *bio)
{
return bio_flagged(bio, BIO_ZONE_WRITE_PLUGGING);
}
> Can you provide a null_blk based reproducer for your testcase to see
> what happens here?
My attempts so far to build a reproduce for the blktests framework have
been unsuccessful. This test script runs fine in the VM that I use for
kernel testing:
https://github.com/bvanassche/blktests/blob/master/tests/zbd/013
> Either way we can't just simply skip taking q_usage_count.
Why not? If BIO_ZONE_WRITE_PLUGGING is set, that guarantees that a
queue reference count is held and will remain held across the entire
disk->fops->submit_bio() invocation, isn't it? From
blk_zone_wplug_bio_work(), below the submit_bio_noacct_nocheck(bio)
call:
if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO))
blk_queue_exit(bdev->bd_disk->queue);
Thanks,
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
2025-05-19 22:12 ` Bart Van Assche
@ 2025-05-20 13:56 ` Christoph Hellwig
2025-05-20 18:09 ` Bart Van Assche
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-05-20 13:56 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, linux-block, Damien Le Moal,
Yu Kuai, Ming Lei, stable
On Mon, May 19, 2025 at 03:12:11PM -0700, Bart Van Assche wrote:
> This new patch should address the concerns brought up in your latest
> email:
No, we should never need to do a sort, as mentioned we need to fix
how stackable drivers split the I/O. Or maybe even get them out of
all the splits that aren't required.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] block: Fix a deadlock related freezing zoned storage devices
2025-05-19 22:22 ` Bart Van Assche
@ 2025-05-20 13:57 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-05-20 13:57 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, linux-block, Damien Le Moal,
Yu Kuai, Ming Lei, stable
On Mon, May 19, 2025 at 03:22:42PM -0700, Bart Van Assche wrote:
> Hmm ... my understanding is that BIO_ZONE_WRITE_PLUGGING, if set, must
> remain set until the bio has completed. Here is an example of code in
> the bio completion path that tests this flag:
True. Well, we'll need some other flag that to tell lower levels to
ignore the flag because it is owned by the higher level.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
2025-05-20 13:56 ` Christoph Hellwig
@ 2025-05-20 18:09 ` Bart Van Assche
2025-05-21 5:53 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2025-05-20 18:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block, Damien Le Moal, Yu Kuai, Ming Lei,
stable
On 5/20/25 6:56 AM, Christoph Hellwig wrote:
> On Mon, May 19, 2025 at 03:12:11PM -0700, Bart Van Assche wrote:
>> This new patch should address the concerns brought up in your latest
>> email:
>
> No, we should never need to do a sort, as mentioned we need to fix
> how stackable drivers split the I/O. Or maybe even get them out of
> all the splits that aren't required.
If the sequential write bios are split by the device mapper, sorting
bios in the block layer is not necessary. Christoph and Damien, do you
agree to replace the bio sorting code in my previous email with the
patch below?
Thanks,
Bart.
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3d419fd2be57..c41ab294987e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1792,9 +1792,12 @@ static inline bool dm_zone_bio_needs_split(struct
mapped_device *md,
{
/*
* For mapped device that need zone append emulation, we must
- * split any large BIO that straddles zone boundaries.
+ * split any large BIO that straddles zone boundaries. Additionally,
+ * split sequential writes to prevent that splitting lower in the stack
+ * causes reordering.
*/
- return dm_emulate_zone_append(md) && bio_straddles_zones(bio) &&
+ return ((dm_emulate_zone_append(md) && bio_straddles_zones(bio)) ||
+ bio_op(bio) == REQ_OP_WRITE) &&
!bio_flagged(bio, BIO_ZONE_WRITE_PLUGGING);
}
static inline bool dm_zone_plug_bio(struct mapped_device *md, struct
bio *bio)
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
2025-05-20 18:09 ` Bart Van Assche
@ 2025-05-21 5:53 ` Christoph Hellwig
2025-05-21 21:18 ` Bart Van Assche
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-05-21 5:53 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, linux-block, Damien Le Moal,
Yu Kuai, Ming Lei, stable
On Tue, May 20, 2025 at 11:09:15AM -0700, Bart Van Assche wrote:
> If the sequential write bios are split by the device mapper, sorting
> bios in the block layer is not necessary. Christoph and Damien, do you
> agree to replace the bio sorting code in my previous email with the
> patch below?
No. First please create a reproducer for your issue using null_blk
or scsi_debug, otherwise we have no way to understand what is going
on here, and will regress in the future.
Second should very much be able to fix the splitting in dm to place
the bios in the right order. As mentioned before I have a theory
of how to do it, but we really need a proper reproducer to test this
and then to write it up to blktests first.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
2025-05-21 5:53 ` Christoph Hellwig
@ 2025-05-21 21:18 ` Bart Van Assche
2025-05-23 4:21 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2025-05-21 21:18 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block, Damien Le Moal, Yu Kuai, Ming Lei,
stable
On 5/20/25 10:53 PM, Christoph Hellwig wrote:
> On Tue, May 20, 2025 at 11:09:15AM -0700, Bart Van Assche wrote:
>> If the sequential write bios are split by the device mapper, sorting
>> bios in the block layer is not necessary. Christoph and Damien, do you
>> agree to replace the bio sorting code in my previous email with the
>> patch below?
>
> No. First please create a reproducer for your issue using null_blk
> or scsi_debug, otherwise we have no way to understand what is going
> on here, and will regress in the future.
>
> Second should very much be able to fix the splitting in dm to place
> the bios in the right order. As mentioned before I have a theory
> of how to do it, but we really need a proper reproducer to test this
> and then to write it up to blktests first.
Hi Christoph,
The following pull request includes a test that triggers the deadlock
fixed by patch 2/2 reliably:
https://github.com/osandov/blktests/pull/171
I do not yet have a reproducer for the bio reordering but I'm still
working on this.
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order
2025-05-21 21:18 ` Bart Van Assche
@ 2025-05-23 4:21 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-05-23 4:21 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, linux-block, Damien Le Moal,
Yu Kuai, Ming Lei, stable
On Wed, May 21, 2025 at 02:18:12PM -0700, Bart Van Assche wrote:
> The following pull request includes a test that triggers the deadlock
> fixed by patch 2/2 reliably:
>
> https://github.com/osandov/blktests/pull/171
Can you post the pathc please? I've not found any obvious way
to download a patch from the above website.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-05-23 4:21 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250514202937.2058598-1-bvanassche@acm.org>
2025-05-14 20:29 ` [PATCH 1/2] block: Make __submit_bio_noacct() preserve the bio submission order Bart Van Assche
2025-05-15 7:19 ` Niklas Cassel
2025-05-15 15:58 ` Bart Van Assche
2025-05-16 4:47 ` Christoph Hellwig
2025-05-19 22:12 ` Bart Van Assche
2025-05-20 13:56 ` Christoph Hellwig
2025-05-20 18:09 ` Bart Van Assche
2025-05-21 5:53 ` Christoph Hellwig
2025-05-21 21:18 ` Bart Van Assche
2025-05-23 4:21 ` Christoph Hellwig
2025-05-14 20:29 ` [PATCH 2/2] block: Fix a deadlock related freezing zoned storage devices Bart Van Assche
2025-05-16 4:51 ` Christoph Hellwig
2025-05-19 22:22 ` Bart Van Assche
2025-05-20 13:57 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox