linux-block.vger.kernel.org archive mirror
* [PATCH 0/2] Fix the mq-deadline async_depth implementation
@ 2024-04-03 21:23 Bart Van Assche
  2024-04-03 21:23 ` [PATCH 1/2] block: Call .limit_depth() after .hctx has been set Bart Van Assche
  2024-04-03 21:23 ` [PATCH 2/2] block/mq-deadline: Fix the tag reservation code Bart Van Assche
  0 siblings, 2 replies; 7+ messages in thread
From: Bart Van Assche @ 2024-04-03 21:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche

Hi Jens,

The mq-deadline 'async_depth' sysfs attribute doesn't behave as intended. This
patch series fixes the implementation of that attribute.

Please consider this patch series for the next merge window.

Thanks,

Bart.

Bart Van Assche (2):
  block: Call .limit_depth() after .hctx has been set
  block/mq-deadline: Fix the tag reservation code

 block/blk-mq.c      |  8 +++++---
 block/mq-deadline.c | 20 +++++++++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] block: Call .limit_depth() after .hctx has been set
  2024-04-03 21:23 [PATCH 0/2] Fix the mq-deadline async_depth implementation Bart Van Assche
@ 2024-04-03 21:23 ` Bart Van Assche
  2024-04-05  8:46   ` Christoph Hellwig
  2024-04-03 21:23 ` [PATCH 2/2] block/mq-deadline: Fix the tag reservation code Bart Van Assche
  1 sibling, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2024-04-03 21:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
	Zhiguo Niu

Call .limit_depth() after data->hctx has been set such that data->hctx can
be used in .limit_depth() implementations.
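
For illustration, a hypothetical .limit_depth() implementation that
needs data->hctx (a sketch for this commit message only, not part of
the patch; the real user is dd_limit_depth() in patch 2/2):

	static void example_limit_depth(blk_opf_t opf,
					struct blk_mq_alloc_data *data)
	{
		/* Dereferences data->hctx, so .hctx must be set first. */
		struct sbitmap_queue *bt =
			&data->hctx->sched_tags->bitmap_tags;

		/* Hypothetical policy: at most half the bits of one word. */
		data->shallow_depth = max(1U, (1U << bt->sb.shift) / 2);
	}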

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Zhiguo Niu <zhiguo.niu@unisoc.com>
Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 34060d885c5a..bcaa722896a0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -434,6 +434,7 @@ __blk_mq_alloc_requests_batch(struct blk_mq_alloc_data *data)
 
 static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 {
+	void (*limit_depth)(blk_opf_t, struct blk_mq_alloc_data *) = NULL;
 	struct request_queue *q = data->q;
 	u64 alloc_time_ns = 0;
 	struct request *rq;
@@ -459,13 +460,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 		 */
 		if ((data->cmd_flags & REQ_OP_MASK) != REQ_OP_FLUSH &&
 		    !blk_op_is_passthrough(data->cmd_flags)) {
-			struct elevator_mq_ops *ops = &q->elevator->type->ops;
+			limit_depth = q->elevator->type->ops.limit_depth;
 
 			WARN_ON_ONCE(data->flags & BLK_MQ_REQ_RESERVED);
 
 			data->rq_flags |= RQF_USE_SCHED;
-			if (ops->limit_depth)
-				ops->limit_depth(data->cmd_flags, data);
 		}
 	}
 
@@ -478,6 +477,9 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 	if (data->flags & BLK_MQ_REQ_RESERVED)
 		data->rq_flags |= RQF_RESV;
 
+	if (limit_depth)
+		limit_depth(data->cmd_flags, data);
+
 	/*
 	 * Try batched alloc if we want more than 1 tag.
 	 */

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] block/mq-deadline: Fix the tag reservation code
  2024-04-03 21:23 [PATCH 0/2] Fix the mq-deadline async_depth implementation Bart Van Assche
  2024-04-03 21:23 ` [PATCH 1/2] block: Call .limit_depth() after .hctx has been set Bart Van Assche
@ 2024-04-03 21:23 ` Bart Van Assche
  1 sibling, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2024-04-03 21:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
	Zhiguo Niu

The current tag reservation code is based on a misunderstanding of the
meaning of data->shallow_depth. Fix the tag reservation code as follows:
* By default, do not reserve any tags for synchronous requests because
  for certain use cases reserving tags reduces performance. See also
  Harshit Mogalapalli, [bug-report] Performance regression with fio
  sequential-write on a multipath setup, 2024-03-07
  (https://lore.kernel.org/linux-block/5ce2ae5d-61e2-4ede-ad55-551112602401@oracle.com/)
* Reduce min_shallow_depth to one because min_shallow_depth must be less
  than or equal to any shallow_depth value.
* Scale dd->async_depth from the range [1, nr_requests] to [1,
  bits_per_sbitmap_word] (see the worked example below this list).
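
As a worked example of that scaling (illustrative numbers only, not
taken from this patch): with q->nr_requests = 256 and bt->sb.shift = 6,
one sbitmap word holds 64 bits, so an async_depth of 192 (75% of
nr_requests) maps to

	((192 << 6) + 256 - 1) / 256 = 12543 / 256 = 48

i.e. 48 of the 64 bits in each word, preserving the 75% ratio.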

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Zhiguo Niu <zhiguo.niu@unisoc.com>
Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 02a916ba62ee..78a8aa204c15 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -621,6 +621,20 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	return rq;
 }
 
+/*
+ * 'depth' is a number in the range 1..INT_MAX representing a number of
+ * requests. Scale it with a factor (1 << bt->sb.shift) / q->nr_requests since
+ * 1..(1 << bt->sb.shift) is the range expected by sbitmap_get_shallow().
+ * Values larger than q->nr_requests have the same effect as q->nr_requests.
+ */
+static int dd_to_word_depth(struct blk_mq_hw_ctx *hctx, unsigned int qdepth)
+{
+	struct sbitmap_queue *bt = &hctx->sched_tags->bitmap_tags;
+	const unsigned int nrr = hctx->queue->nr_requests;
+
+	return ((qdepth << bt->sb.shift) + nrr - 1) / nrr;
+}
+
 /*
  * Called by __blk_mq_alloc_request(). The shallow_depth value set by this
  * function is used by __blk_mq_get_tag().
@@ -637,7 +651,7 @@ static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
 	 * Throttle asynchronous requests and writes such that these requests
 	 * do not block the allocation of synchronous requests.
 	 */
-	data->shallow_depth = dd->async_depth;
+	data->shallow_depth = dd_to_word_depth(data->hctx, dd->async_depth);
 }
 
 /* Called by blk_mq_update_nr_requests(). */
@@ -647,9 +661,9 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct deadline_data *dd = q->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
 
-	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
+	dd->async_depth = q->nr_requests;
 
-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
+	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
 }
 
 /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
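
For context, a simplified sketch of the consumer side (paraphrasing
__blk_mq_get_tag(); the sbitmap helper names below are an assumption on
my part, not part of this patch). A data->shallow_depth of zero means
"no limit", so the smallest meaningful limit is one, which is why
sbitmap_queue_min_shallow_depth() is now called with 1:

	if (data->shallow_depth)
		tag = sbitmap_queue_get_shallow(bt, data->shallow_depth);
	else
		tag = __sbitmap_queue_get(bt);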

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] block: Call .limit_depth() after .hctx has been set
  2024-04-03 21:23 ` [PATCH 1/2] block: Call .limit_depth() after .hctx has been set Bart Van Assche
@ 2024-04-05  8:46   ` Christoph Hellwig
  2024-04-05 20:05     ` Bart Van Assche
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2024-04-05  8:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
	Zhiguo Niu

Calling limit_depth with the hctx set makes sense, but the way it's done
looks odd.  Why not something like this?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b8dbfed8b28be1..88886fd93b1a9c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -448,6 +448,10 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 	if (data->cmd_flags & REQ_NOWAIT)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
 
+retry:
+	data->ctx = blk_mq_get_ctx(q);
+	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
+
 	if (q->elevator) {
 		/*
 		 * All requests use scheduler tags when an I/O scheduler is
@@ -469,13 +473,9 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 			if (ops->limit_depth)
 				ops->limit_depth(data->cmd_flags, data);
 		}
-	}
-
-retry:
-	data->ctx = blk_mq_get_ctx(q);
-	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
-	if (!(data->rq_flags & RQF_SCHED_TAGS))
+	} else {
 		blk_mq_tag_busy(data->hctx);
+	}
 
 	if (data->flags & BLK_MQ_REQ_RESERVED)
 		data->rq_flags |= RQF_RESV;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] block: Call .limit_depth() after .hctx has been set
  2024-04-05  8:46   ` Christoph Hellwig
@ 2024-04-05 20:05     ` Bart Van Assche
  2024-05-08  2:28       ` Re: " 牛志国 (Zhiguo Niu)
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2024-04-05 20:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Damien Le Moal, Zhiguo Niu

On 4/5/24 01:46, Christoph Hellwig wrote:
> Calling limit_depth with the hctx set makes sense, but the way it's done
> looks odd.  Why not something like this?
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b8dbfed8b28be1..88886fd93b1a9c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -448,6 +448,10 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>   	if (data->cmd_flags & REQ_NOWAIT)
>   		data->flags |= BLK_MQ_REQ_NOWAIT;
>   
> +retry:
> +	data->ctx = blk_mq_get_ctx(q);
> +	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
> +
>   	if (q->elevator) {
>   		/*
>   		 * All requests use scheduler tags when an I/O scheduler is
> @@ -469,13 +473,9 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>   			if (ops->limit_depth)
>   				ops->limit_depth(data->cmd_flags, data);
>   		}
> -	}
> -
> -retry:
> -	data->ctx = blk_mq_get_ctx(q);
> -	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
> -	if (!(data->rq_flags & RQF_SCHED_TAGS))
> +	} else {
>   		blk_mq_tag_busy(data->hctx);
> +	}
>   
>   	if (data->flags & BLK_MQ_REQ_RESERVED)
>   		data->rq_flags |= RQF_RESV;

Hi Christoph,

The above patch looks good to me and I'm fine with replacing patch 1/2
with it. Do you want me to add your Signed-off-by to it?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] block: Call .limit_depth() after .hctx has been set
  2024-04-05 20:05     ` Bart Van Assche
@ 2024-05-08  2:28       ` 牛志国 (Zhiguo Niu)
  2024-05-08  4:52         ` Bart Van Assche
  0 siblings, 1 reply; 7+ messages in thread
From: 牛志国 (Zhiguo Niu) @ 2024-05-08  2:28 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, Christoph Hellwig, linux-block@vger.kernel.org,
	Damien Le Moal, 王科 (Ke Wang)

Hi Jens Axboe,

Do you have any comments on this patch set from Bart Van Assche? We are
hitting the async_depth "warning issue" described in more detail at:
https://lore.kernel.org/all/CAHJ8P3KEOC_DXQmZK3u7PHgZFmWpMVzPa6pgkOgpyoH7wgT5nw@mail.gmail.com/
Please consider applying it, since it resolves the warning above.
Thanks.

The message below from Bart Van Assche is attached for reference:
https://lore.kernel.org/all/20240403212354.523925-1-bvanassche@acm.org/#R

-----Original Message-----
From: Bart Van Assche <bvanassche@acm.org>
Sent: April 6, 2024 04:05
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org; Damien Le Moal <dlemoal@kernel.org>; 牛志国 (Zhiguo Niu) <Zhiguo.Niu@unisoc.com>
Subject: Re: [PATCH 1/2] block: Call .limit_depth() after .hctx has been set

On 4/5/24 01:46, Christoph Hellwig wrote:
> Calling limit_depth with the hctx set makes sense, but the way it's
> done looks odd.  Why not something like this?
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c index 
> b8dbfed8b28be1..88886fd93b1a9c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -448,6 +448,10 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>       if (data->cmd_flags & REQ_NOWAIT)
>               data->flags |= BLK_MQ_REQ_NOWAIT;
>
> +retry:
> +     data->ctx = blk_mq_get_ctx(q);
> +     data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
> +
>       if (q->elevator) {
>               /*
>                * All requests use scheduler tags when an I/O scheduler 
> is @@ -469,13 +473,9 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>                       if (ops->limit_depth)
>                               ops->limit_depth(data->cmd_flags, data);
>               }
> -     }
> -
> -retry:
> -     data->ctx = blk_mq_get_ctx(q);
> -     data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
> -     if (!(data->rq_flags & RQF_SCHED_TAGS))
> +     } else {
>               blk_mq_tag_busy(data->hctx);
> +     }
>
>       if (data->flags & BLK_MQ_REQ_RESERVED)
>               data->rq_flags |= RQF_RESV;

Hi Christoph,

The above patch looks good to me and I'm fine with replacing patch 1/2 with it. Do you want me to add your Signed-off-by to it?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] block: Call .limit_depth() after .hctx has been set
  2024-05-08  2:28       ` Re: " 牛志国 (Zhiguo Niu)
@ 2024-05-08  4:52         ` Bart Van Assche
  0 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2024-05-08  4:52 UTC (permalink / raw)
  To: 牛志国 (Zhiguo Niu), Jens Axboe
  Cc: Christoph Hellwig, linux-block@vger.kernel.org, Damien Le Moal,
	王科 (Ke Wang)

On 5/7/24 19:28, 牛志国 (Zhiguo Niu) wrote:
> Do you have any comments on this patch set from Bart Van Assche? We
> are hitting the async_depth "warning issue" described in more detail
> at:
> https://lore.kernel.org/all/CAHJ8P3KEOC_DXQmZK3u7PHgZFmWpMVzPa6pgkOgpyoH7wgT5nw@mail.gmail.com/
> Please consider applying it, since it resolves the warning above.

Since Christoph posted a comment, it's up to me to address it. I plan
to repost this patch series next week; I'm currently OoO.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread (newest: 2024-05-08  4:52 UTC)

Thread overview: 7+ messages
2024-04-03 21:23 [PATCH 0/2] Fix the mq-deadline async_depth implementation Bart Van Assche
2024-04-03 21:23 ` [PATCH 1/2] block: Call .limit_depth() after .hctx has been set Bart Van Assche
2024-04-05  8:46   ` Christoph Hellwig
2024-04-05 20:05     ` Bart Van Assche
2024-05-08  2:28       ` Re: " 牛志国 (Zhiguo Niu)
2024-05-08  4:52         ` Bart Van Assche
2024-04-03 21:23 ` [PATCH 2/2] block/mq-deadline: Fix the tag reservation code Bart Van Assche
