linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] Submit split bios in order
@ 2025-05-09  0:42 Bart Van Assche
  2025-05-09  1:15 ` Damien Le Moal
  0 siblings, 1 reply; 3+ messages in thread
From: Bart Van Assche @ 2025-05-09  0:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Christoph Hellwig, Damien Le Moal

If a bio is split, the bio fragments are added in reverse LBA order into
the plug list. This triggers write errors with zoned storage and
sequential writes. Fix this by preserving the LBA order when inserting in
the plug list.

This patch has been posted as an RFC because this patch changes the
complexity of inserting in the plug list from O(1) into O(n).

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 796baeccd37b..e1311264a337 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1386,6 +1386,30 @@ static inline unsigned short blk_plug_max_rq_count(struct blk_plug *plug)
 	return BLK_MAX_REQUEST_COUNT;
 }
 
+/*
+ * If a bio is split, the bio fragments are submitted in opposite order. Hence
+ * this function that inserts in LBA order in the plug list.
+ */
+static inline void rq_list_insert_sorted(struct rq_list *rl, struct request *rq)
+{
+	sector_t rq_pos = rq->bio->bi_iter.bi_sector;
+	struct request *next, *prev;
+
+	for (prev = NULL, next = rl->head; next;
+	     prev = next, next = next->rq_next)
+		if (next->q == rq->q && rq_pos < next->bio->bi_iter.bi_sector)
+			break;
+
+	if (!prev) {
+		rq_list_add_head(rl, rq);
+	} else if (!next) {
+		rq_list_add_tail(rl, rq);
+	} else {
+		prev->rq_next = rq;
+		rq->rq_next = next;
+	}
+}
+
 static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
 {
 	struct request *last = rq_list_peek(&plug->mq_list);
@@ -1408,7 +1432,7 @@ static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
 	 */
 	if (!plug->has_elevator && (rq->rq_flags & RQF_SCHED_TAGS))
 		plug->has_elevator = true;
-	rq_list_add_tail(&plug->mq_list, rq);
+	rq_list_insert_sorted(&plug->mq_list, rq);
 	plug->rq_count++;
 }
 

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC] Submit split bios in order
  2025-05-09  0:42 [PATCH RFC] Submit split bios in order Bart Van Assche
@ 2025-05-09  1:15 ` Damien Le Moal
  2025-05-09 13:46   ` Bart Van Assche
  0 siblings, 1 reply; 3+ messages in thread
From: Damien Le Moal @ 2025-05-09  1:15 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe; +Cc: linux-block, Christoph Hellwig

On 5/9/25 9:42 AM, Bart Van Assche wrote:
> If a bio is split, the bio fragments are added in reverse LBA order into
> the plug list. This triggers write errors with zoned storage and
> sequential writes. Fix this by preserving the LBA order when inserting in
> the plug list.

Preserving the order of the fragment would be a good thing for all block
devices. But what I fail to see here is how this lack of ordering affects zoned
block device writes since zone write plugging will split large BIOs when a
write BIO goes through zone write plugging. That happens before we have a
request, so we should never endup needing to split a zone write request.

> 
> This patch has been posted as an RFC because this patch changes the
> complexity of inserting in the plug list from O(1) into O(n).
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 796baeccd37b..e1311264a337 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1386,6 +1386,30 @@ static inline unsigned short blk_plug_max_rq_count(struct blk_plug *plug)
>  	return BLK_MAX_REQUEST_COUNT;
>  }
>  
> +/*
> + * If a bio is split, the bio fragments are submitted in opposite order. Hence
> + * this function that inserts in LBA order in the plug list.
> + */
> +static inline void rq_list_insert_sorted(struct rq_list *rl, struct request *rq)
> +{
> +	sector_t rq_pos = rq->bio->bi_iter.bi_sector;
> +	struct request *next, *prev;
> +
> +	for (prev = NULL, next = rl->head; next;
> +	     prev = next, next = next->rq_next)
> +		if (next->q == rq->q && rq_pos < next->bio->bi_iter.bi_sector)
> +			break;
> +
> +	if (!prev) {
> +		rq_list_add_head(rl, rq);
> +	} else if (!next) {
> +		rq_list_add_tail(rl, rq);
> +	} else {
> +		prev->rq_next = rq;
> +		rq->rq_next = next;
> +	}
> +}
> +
>  static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
>  {
>  	struct request *last = rq_list_peek(&plug->mq_list);
> @@ -1408,7 +1432,7 @@ static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
>  	 */
>  	if (!plug->has_elevator && (rq->rq_flags & RQF_SCHED_TAGS))
>  		plug->has_elevator = true;
> -	rq_list_add_tail(&plug->mq_list, rq);
> +	rq_list_insert_sorted(&plug->mq_list, rq);
>  	plug->rq_count++;
>  }
>  


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC] Submit split bios in order
  2025-05-09  1:15 ` Damien Le Moal
@ 2025-05-09 13:46   ` Bart Van Assche
  0 siblings, 0 replies; 3+ messages in thread
From: Bart Van Assche @ 2025-05-09 13:46 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe; +Cc: linux-block, Christoph Hellwig

On 5/8/25 6:15 PM, Damien Le Moal wrote:
> On 5/9/25 9:42 AM, Bart Van Assche wrote:
>> If a bio is split, the bio fragments are added in reverse LBA order into
>> the plug list. This triggers write errors with zoned storage and
>> sequential writes. Fix this by preserving the LBA order when inserting in
>> the plug list.
> 
> Preserving the order of the fragment would be a good thing for all block
> devices. But what I fail to see here is how this lack of ordering affects zoned
> block device writes since zone write plugging will split large BIOs when a
> write BIO goes through zone write plugging. That happens before we have a
> request, so we should never endup needing to split a zone write request.
Hi Damien,

 From a comment in source file block/blk-zoned.c:
"We always receive BIOs after they are split and ready to be issued."

Does this mean that splitting bios happens before these are passed to 
the code in block/blk-zoned.c?

Note: as you probably know Christoph's patch series "don't reorder
requests passed to ->queue_rqs" affects bio splitting. The bio splitting
code pushes bio fragments onto the plug list in decreasing LBA order.
Before that patch series the requests pushed onto the plug list last
were processed first and hence bio fragments are submitted in LBA order.
Since that patch series the requests pushed onto the plug list first are
processed first and hence bio fragments are submitted in decreasing LBA
order.

See also 
https://lore.kernel.org/linux-block/20241113152050.157179-1-hch@lst.de/.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-05-09 13:46 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-09  0:42 [PATCH RFC] Submit split bios in order Bart Van Assche
2025-05-09  1:15 ` Damien Le Moal
2025-05-09 13:46   ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).