linux-block.vger.kernel.org archive mirror
* [PATCH 0/7] blk-mq: introduce new queue attribute async_depth
@ 2025-09-30  7:11 Yu Kuai
  2025-09-30  7:11 ` [PATCH 1/7] block: convert nr_requests to unsigned int Yu Kuai
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Yu Kuai @ 2025-09-30  7:11 UTC (permalink / raw)
  To: axboe, bvanassche, ming.lei, nilay
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Background and motivation:

While testing our downstream kernel, we found a performance regression
from 5.10 to 6.6 (described in patch 5); the regression is related to
async_depth in mq-deadline.

While trying to fix this regression, Bart suggested adding a new attribute
to request_queue, and I think this is a good idea because all elevators
have similar logic; however, only mq-deadline allows the user to configure
async_depth.

Patches 1-3 add the new queue attribute async_depth;
patch 4 converts kyber to use request_queue->async_depth;
patch 5 converts mq-deadline to use request_queue->async_depth, which
also fixes the performance regression;
patch 6 converts bfq to use request_queue->async_depth;
patch 7 documents the new attribute.

Yu Kuai (7):
  block: convert nr_requests to unsigned int
  blk-mq-sched: unify elevators checking for async requests
  blk-mq: add a new queue sysfs attribute async_depth
  kyber: convert to use request_queue->async_depth
  mq-deadline: convert to use request_queue->async_depth
  block, bfq: convert to use request_queue->async_depth
  blk-mq: add documentation for new queue attribute async_depth

 Documentation/ABI/stable/sysfs-block | 10 ++++++
 block/bfq-iosched.c                  | 45 +++++++++++---------------
 block/blk-core.c                     |  1 +
 block/blk-mq-sched.h                 |  5 +++
 block/blk-mq.c                       |  4 +++
 block/blk-sysfs.c                    | 47 ++++++++++++++++++++++++++++
 block/elevator.c                     |  1 +
 block/kyber-iosched.c                | 36 ++-------------------
 block/mq-deadline.c                  | 42 ++-----------------------
 include/linux/blkdev.h               |  3 +-
 10 files changed, 94 insertions(+), 100 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/7] block: convert nr_requests to unsigned int
  2025-09-30  7:11 [PATCH 0/7] blk-mq: introduce new queue attribute async_depth Yu Kuai
@ 2025-09-30  7:11 ` Yu Kuai
  2025-10-02 15:13   ` Nilay Shroff
  2025-09-30  7:11 ` [PATCH 2/7] blk-mq-sched: unify elevators checking for async requests Yu Kuai
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2025-09-30  7:11 UTC (permalink / raw)
  To: axboe, bvanassche, ming.lei, nilay
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

This value represents the number of requests for elevator tags, or driver
tags if the elevator is none. The max value for elevator tags is 2048, and
drivers use at most 16 bits for the tag, so unsigned int is sufficient.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 include/linux/blkdev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 066e5309bd45..02c006fb94c5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -541,7 +541,7 @@ struct request_queue {
 	/*
 	 * queue settings
 	 */
-	unsigned long		nr_requests;	/* Max # of requests */
+	unsigned int		nr_requests;	/* Max # of requests */
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
 	struct blk_crypto_profile *crypto_profile;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/7] blk-mq-sched: unify elevators checking for async requests
  2025-09-30  7:11 [PATCH 0/7] blk-mq: introduce new queue attribute async_depth Yu Kuai
  2025-09-30  7:11 ` [PATCH 1/7] block: convert nr_requests to unsigned int Yu Kuai
@ 2025-09-30  7:11 ` Yu Kuai
  2025-09-30  7:11 ` [PATCH 3/7] blk-mq: add a new queue sysfs attribute async_depth Yu Kuai
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Yu Kuai @ 2025-09-30  7:11 UTC (permalink / raw)
  To: axboe, bvanassche, ming.lei, nilay
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

bfq and mq-deadline treat sync writes as async requests and only
reserve tags for sync reads via async_depth; kyber, however, does not
treat sync writes as async requests for now.

Consider the case where there are lots of dirty pages and the user calls
fsync to flush them. In this case sched_tags can be exhausted by sync
writes, and sync reads can get stuck waiting for a tag. Hence let kyber
follow what mq-deadline and bfq do, and unify the async request check for
all elevators.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bfq-iosched.c   | 2 +-
 block/blk-mq-sched.h  | 5 +++++
 block/kyber-iosched.c | 2 +-
 block/mq-deadline.c   | 2 +-
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 4ffbe4383dd2..704900675949 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -697,7 +697,7 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
 	unsigned int limit, act_idx;
 
 	/* Sync reads have full depth available */
-	if (op_is_sync(opf) && !op_is_write(opf))
+	if (blk_mq_sched_sync_request(opf))
 		limit = data->q->nr_requests;
 	else
 		limit = bfqd->async_depths[!!bfqd->wr_busy_queues][op_is_sync(opf)];
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 8e21a6b1415d..ae747f9053c7 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -103,4 +103,9 @@ static inline void blk_mq_set_min_shallow_depth(struct request_queue *q,
 						depth);
 }
 
+static inline bool blk_mq_sched_sync_request(blk_opf_t opf)
+{
+	return op_is_sync(opf) && !op_is_write(opf);
+}
+
 #endif
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 18efd6ef2a2b..cf243a457175 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -544,7 +544,7 @@ static void kyber_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
 	 * We use the scheduler tags as per-hardware queue queueing tokens.
 	 * Async requests can be limited at this stage.
 	 */
-	if (!op_is_sync(opf)) {
+	if (!blk_mq_sched_sync_request(opf)) {
 		struct kyber_queue_data *kqd = data->q->elevator->elevator_data;
 
 		data->shallow_depth = kqd->async_depth;
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 3e741d33142d..592dd853f6e5 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -492,7 +492,7 @@ static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
 	struct deadline_data *dd = data->q->elevator->elevator_data;
 
 	/* Do not throttle synchronous reads. */
-	if (op_is_sync(opf) && !op_is_write(opf))
+	if (blk_mq_sched_sync_request(opf))
 		return;
 
 	/*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/7] blk-mq: add a new queue sysfs attribute async_depth
  2025-09-30  7:11 [PATCH 0/7] blk-mq: introduce new queue attribute async_depth Yu Kuai
  2025-09-30  7:11 ` [PATCH 1/7] block: convert nr_requests to unsigned int Yu Kuai
  2025-09-30  7:11 ` [PATCH 2/7] blk-mq-sched: unify elevators checking for async requests Yu Kuai
@ 2025-09-30  7:11 ` Yu Kuai
  2025-10-02 15:10   ` Nilay Shroff
  2025-09-30  7:11 ` [PATCH 4/7] kyber: convert to use request_queue->async_depth Yu Kuai
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2025-09-30  7:11 UTC (permalink / raw)
  To: axboe, bvanassche, ming.lei, nilay
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Add a new field async_depth to request_queue and related APIs. It is
currently unused; following patches will convert the elevators to use it
instead of their internal async_depth.
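
For reference, the new field takes effect through data->shallow_depth; a
simplified sketch of the tag allocation path (editorial note, based on
__blk_mq_get_tag() in block/blk-mq-tag.c, details may differ by version):

	if (data->shallow_depth)
		/* async request: allocation is bounded by shallow_depth,
		 * leaving the remaining tags for sync reads */
		return sbitmap_queue_get_shallow(bt, data->shallow_depth);
	else
		/* sync read, or no limit configured: full depth */
		return __sbitmap_queue_get(bt);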

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-core.c       |  1 +
 block/blk-mq.c         |  4 ++++
 block/blk-sysfs.c      | 47 ++++++++++++++++++++++++++++++++++++++++++
 block/elevator.c       |  1 +
 include/linux/blkdev.h |  1 +
 5 files changed, 54 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index dd39ff651095..76df70cfc103 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -463,6 +463,7 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
 	fs_reclaim_release(GFP_KERNEL);
 
 	q->nr_requests = BLKDEV_DEFAULT_RQ;
+	q->async_depth = BLKDEV_DEFAULT_RQ;
 
 	return q;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 09f579414161..260e54fa48f0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -529,6 +529,8 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 			data->rq_flags |= RQF_USE_SCHED;
 			if (ops->limit_depth)
 				ops->limit_depth(data->cmd_flags, data);
+			else if (!blk_mq_sched_sync_request(data->cmd_flags))
+				data->shallow_depth = q->async_depth;
 		}
 	} else {
 		blk_mq_tag_busy(data->hctx);
@@ -4605,6 +4607,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	spin_lock_init(&q->requeue_lock);
 
 	q->nr_requests = set->queue_depth;
+	q->async_depth = set->queue_depth;
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
 	blk_mq_map_swqueue(q);
@@ -4972,6 +4975,7 @@ struct elevator_tags *blk_mq_update_nr_requests(struct request_queue *q,
 	}
 
 	q->nr_requests = nr;
+	q->async_depth = nr;
 	if (q->elevator && q->elevator->type->ops.depth_updated)
 		q->elevator->type->ops.depth_updated(q);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 76c47fe9b8d6..9553cc022c7e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -127,6 +127,51 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
 	return ret;
 }
 
+static ssize_t queue_async_depth_show(struct gendisk *disk, char *page)
+{
+	ssize_t ret;
+
+	mutex_lock(&disk->queue->elevator_lock);
+	ret = queue_var_show(disk->queue->async_depth, page);
+	mutex_unlock(&disk->queue->elevator_lock);
+	return ret;
+}
+
+static ssize_t
+queue_async_depth_store(struct gendisk *disk, const char *page, size_t count)
+{
+	struct request_queue *q = disk->queue;
+	unsigned int memflags;
+	unsigned long nr;
+	int ret;
+
+	if (!queue_is_mq(q))
+		return -EINVAL;
+
+	ret = queue_var_store(&nr, page, count);
+	if (ret < 0)
+		return ret;
+
+	if (nr == 0)
+		return -EINVAL;
+
+	memflags = blk_mq_freeze_queue(q);
+	mutex_lock(&q->elevator_lock);
+
+	if (q->elevator) {
+		q->async_depth = min(q->nr_requests, nr);
+		if (q->elevator->type->ops.depth_updated)
+			q->elevator->type->ops.depth_updated(q);
+	} else {
+		ret = -EINVAL;
+	}
+
+	mutex_unlock(&q->elevator_lock);
+	blk_mq_unfreeze_queue(q, memflags);
+
+	return ret;
+}
+
 static ssize_t queue_ra_show(struct gendisk *disk, char *page)
 {
 	ssize_t ret;
@@ -542,6 +587,7 @@ static struct queue_sysfs_entry _prefix##_entry = {	\
 }
 
 QUEUE_RW_ENTRY(queue_requests, "nr_requests");
+QUEUE_RW_ENTRY(queue_async_depth, "async_depth");
 QUEUE_RW_ENTRY(queue_ra, "read_ahead_kb");
 QUEUE_LIM_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
 QUEUE_LIM_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
@@ -764,6 +810,7 @@ static struct attribute *blk_mq_queue_attrs[] = {
 	 */
 	&elv_iosched_entry.attr,
 	&queue_requests_entry.attr,
+	&queue_async_depth_entry.attr,
 #ifdef CONFIG_BLK_WBT
 	&queue_wb_lat_entry.attr,
 #endif
diff --git a/block/elevator.c b/block/elevator.c
index e2ebfbf107b3..8f510cb881ba 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -601,6 +601,7 @@ static int elevator_switch(struct request_queue *q, struct elv_change_ctx *ctx)
 		blk_queue_flag_clear(QUEUE_FLAG_SQ_SCHED, q);
 		q->elevator = NULL;
 		q->nr_requests = q->tag_set->queue_depth;
+		q->async_depth = q->tag_set->queue_depth;
 	}
 	blk_add_trace_msg(q, "elv switch: %s", ctx->name);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02c006fb94c5..1d470ac71c64 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -542,6 +542,7 @@ struct request_queue {
 	 * queue settings
 	 */
 	unsigned int		nr_requests;	/* Max # of requests */
+	unsigned int		async_depth;	/* Max # of async requests */
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
 	struct blk_crypto_profile *crypto_profile;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/7] kyber: convert to use request_queue->async_depth
  2025-09-30  7:11 [PATCH 0/7] blk-mq: introduce new queue attribute async_depth Yu Kuai
                   ` (2 preceding siblings ...)
  2025-09-30  7:11 ` [PATCH 3/7] blk-mq: add a new queue sysfs attribute async_depth Yu Kuai
@ 2025-09-30  7:11 ` Yu Kuai
  2025-09-30  7:11 ` [PATCH 5/7] mq-deadline: " Yu Kuai
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Yu Kuai @ 2025-09-30  7:11 UTC (permalink / raw)
  To: axboe, bvanassche, ming.lei, nilay
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Use request_queue->async_depth instead of the internal async_depth: remove
kqd->async_depth and related helpers, and also remove the limit_depth()
method, which is now useless.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/kyber-iosched.c | 36 +++---------------------------------
 1 file changed, 3 insertions(+), 33 deletions(-)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index cf243a457175..8bb73e5833a0 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -47,9 +47,8 @@ enum {
 	 * asynchronous requests, we reserve 25% of requests for synchronous
 	 * operations.
 	 */
-	KYBER_ASYNC_PERCENT = 75,
+	KYBER_DEFAULT_ASYNC_PERCENT = 75,
 };
-
 /*
  * Maximum device-wide depth for each scheduling domain.
  *
@@ -157,9 +156,6 @@ struct kyber_queue_data {
 	 */
 	struct sbitmap_queue domain_tokens[KYBER_NUM_DOMAINS];
 
-	/* Number of allowed async requests. */
-	unsigned int async_depth;
-
 	struct kyber_cpu_latency __percpu *cpu_latency;
 
 	/* Timer for stats aggregation and adjusting domain tokens. */
@@ -401,10 +397,7 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 
 static void kyber_depth_updated(struct request_queue *q)
 {
-	struct kyber_queue_data *kqd = q->elevator->elevator_data;
-
-	kqd->async_depth = q->nr_requests * KYBER_ASYNC_PERCENT / 100U;
-	blk_mq_set_min_shallow_depth(q, kqd->async_depth);
+	blk_mq_set_min_shallow_depth(q, q->async_depth);
 }
 
 static int kyber_init_sched(struct request_queue *q, struct elevator_queue *eq)
@@ -421,6 +414,7 @@ static int kyber_init_sched(struct request_queue *q, struct elevator_queue *eq)
 
 	eq->elevator_data = kqd;
 	q->elevator = eq;
+	q->async_depth = q->nr_requests * KYBER_DEFAULT_ASYNC_PERCENT / 100;
 	kyber_depth_updated(q);
 
 	return 0;
@@ -538,19 +532,6 @@ static void rq_clear_domain_token(struct kyber_queue_data *kqd,
 	}
 }
 
-static void kyber_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
-{
-	/*
-	 * We use the scheduler tags as per-hardware queue queueing tokens.
-	 * Async requests can be limited at this stage.
-	 */
-	if (!blk_mq_sched_sync_request(opf)) {
-		struct kyber_queue_data *kqd = data->q->elevator->elevator_data;
-
-		data->shallow_depth = kqd->async_depth;
-	}
-}
-
 static bool kyber_bio_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs)
 {
@@ -944,15 +925,6 @@ KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_DISCARD, discard)
 KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_OTHER, other)
 #undef KYBER_DEBUGFS_DOMAIN_ATTRS
 
-static int kyber_async_depth_show(void *data, struct seq_file *m)
-{
-	struct request_queue *q = data;
-	struct kyber_queue_data *kqd = q->elevator->elevator_data;
-
-	seq_printf(m, "%u\n", kqd->async_depth);
-	return 0;
-}
-
 static int kyber_cur_domain_show(void *data, struct seq_file *m)
 {
 	struct blk_mq_hw_ctx *hctx = data;
@@ -978,7 +950,6 @@ static const struct blk_mq_debugfs_attr kyber_queue_debugfs_attrs[] = {
 	KYBER_QUEUE_DOMAIN_ATTRS(write),
 	KYBER_QUEUE_DOMAIN_ATTRS(discard),
 	KYBER_QUEUE_DOMAIN_ATTRS(other),
-	{"async_depth", 0400, kyber_async_depth_show},
 	{},
 };
 #undef KYBER_QUEUE_DOMAIN_ATTRS
@@ -1004,7 +975,6 @@ static struct elevator_type kyber_sched = {
 		.exit_sched = kyber_exit_sched,
 		.init_hctx = kyber_init_hctx,
 		.exit_hctx = kyber_exit_hctx,
-		.limit_depth = kyber_limit_depth,
 		.bio_merge = kyber_bio_merge,
 		.prepare_request = kyber_prepare_request,
 		.insert_requests = kyber_insert_requests,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/7] mq-deadline: convert to use request_queue->async_depth
  2025-09-30  7:11 [PATCH 0/7] blk-mq: introduce new queue attribute async_depth Yu Kuai
                   ` (3 preceding siblings ...)
  2025-09-30  7:11 ` [PATCH 4/7] kyber: convert to use request_queue->async_depth Yu Kuai
@ 2025-09-30  7:11 ` Yu Kuai
  2025-10-02 18:56   ` Jeff Moyer
  2025-09-30  7:11 ` [PATCH 6/7] block, bfq: convert " Yu Kuai
  2025-09-30  7:11 ` [PATCH 7/7] blk-mq: add documentation for new queue attribute async_depth Yu Kuai
  6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2025-09-30  7:11 UTC (permalink / raw)
  To: axboe, bvanassche, ming.lei, nilay
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

In a downstream kernel, we tested mq-deadline with many fio workloads and
found a performance regression after commit 39823b47bbd4
("block/mq-deadline: Fix the tag reservation code") with the following test:

[global]
rw=randread
direct=1
ramp_time=1
ioengine=libaio
iodepth=1024
numjobs=24
bs=1024k
group_reporting=1
runtime=60

[job1]
filename=/dev/sda

The root cause is that mq-deadline now supports configuring async_depth.
Although the default value is nr_requests, the minimal value is 1, hence
min_shallow_depth is set to 1, causing wake_batch to be 1. As a
consequence, the sbitmap_queue is woken up after each IO instead of after
every 8 IOs.

In this test case, sda is an HDD and max_sectors is 128k, hence each
submitted 1M IO is split into 8 sequential 128k requests. However, because
there are 24 jobs and all tags are exhausted, the 8 requests are unlikely
to be dispatched sequentially, and changing wake_batch to 1 makes this much
worse: accounting the blktrace D stage, the percentage of sequential IO
decreases from 8% to 0.8%.
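
As a rough worked example (editorial note; the numbers follow
sbq_calc_wake_batch() in lib/sbitmap.c and are only approximate), assume
nr_requests = 256 and 64-bit sbitmap words:

	/*
	 * min_shallow_depth == nr_requests: usable depth ~ 256,
	 *   wake_batch = clamp(256 / SBQ_WAIT_QUEUES, 1, SBQ_WAKE_BATCH)
	 *              = clamp(32, 1, 8) = 8
	 *
	 * min_shallow_depth == 1: usable depth collapses to roughly one bit
	 *   per word, i.e. 256 / 64 = 4, so wake_batch = clamp(4 / 8, 1, 8) = 1
	 *
	 * Waiters are then woken after every completion instead of in batches
	 * of 8, which is what degrades the sequential dispatch described above.
	 */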

Fix this problem by converting to request_queue->async_depth, where
min_shallow_depth is set each time async_depth is updated.

Fixes: 39823b47bbd4 ("block/mq-deadline: Fix the tag reservation code")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/mq-deadline.c | 42 +++---------------------------------------
 1 file changed, 3 insertions(+), 39 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 592dd853f6e5..74585c67e47b 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -98,7 +98,6 @@ struct deadline_data {
 	int fifo_batch;
 	int writes_starved;
 	int front_merges;
-	u32 async_depth;
 	int prio_aging_expire;
 
 	spinlock_t lock;
@@ -483,32 +482,10 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	return rq;
 }
 
-/*
- * Called by __blk_mq_alloc_request(). The shallow_depth value set by this
- * function is used by __blk_mq_get_tag().
- */
-static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
-{
-	struct deadline_data *dd = data->q->elevator->elevator_data;
-
-	/* Do not throttle synchronous reads. */
-	if (blk_mq_sched_sync_request(opf))
-		return;
-
-	/*
-	 * Throttle asynchronous requests and writes such that these requests
-	 * do not block the allocation of synchronous requests.
-	 */
-	data->shallow_depth = dd->async_depth;
-}
-
-/* Called by blk_mq_update_nr_requests(). */
+/* Called by blk_mq_init_sched() and blk_mq_update_nr_requests(). */
 static void dd_depth_updated(struct request_queue *q)
 {
-	struct deadline_data *dd = q->elevator->elevator_data;
-
-	dd->async_depth = q->nr_requests;
-	blk_mq_set_min_shallow_depth(q, 1);
+	blk_mq_set_min_shallow_depth(q, q->async_depth);
 }
 
 static void dd_exit_sched(struct elevator_queue *e)
@@ -573,6 +550,7 @@ static int dd_init_sched(struct request_queue *q, struct elevator_queue *eq)
 	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
 
 	q->elevator = eq;
+	q->async_depth = q->nr_requests;
 	dd_depth_updated(q);
 	return 0;
 }
@@ -758,7 +736,6 @@ SHOW_JIFFIES(deadline_write_expire_show, dd->fifo_expire[DD_WRITE]);
 SHOW_JIFFIES(deadline_prio_aging_expire_show, dd->prio_aging_expire);
 SHOW_INT(deadline_writes_starved_show, dd->writes_starved);
 SHOW_INT(deadline_front_merges_show, dd->front_merges);
-SHOW_INT(deadline_async_depth_show, dd->async_depth);
 SHOW_INT(deadline_fifo_batch_show, dd->fifo_batch);
 #undef SHOW_INT
 #undef SHOW_JIFFIES
@@ -788,7 +765,6 @@ STORE_JIFFIES(deadline_write_expire_store, &dd->fifo_expire[DD_WRITE], 0, INT_MA
 STORE_JIFFIES(deadline_prio_aging_expire_store, &dd->prio_aging_expire, 0, INT_MAX);
 STORE_INT(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX);
 STORE_INT(deadline_front_merges_store, &dd->front_merges, 0, 1);
-STORE_INT(deadline_async_depth_store, &dd->async_depth, 1, INT_MAX);
 STORE_INT(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX);
 #undef STORE_FUNCTION
 #undef STORE_INT
@@ -802,7 +778,6 @@ static const struct elv_fs_entry deadline_attrs[] = {
 	DD_ATTR(write_expire),
 	DD_ATTR(writes_starved),
 	DD_ATTR(front_merges),
-	DD_ATTR(async_depth),
 	DD_ATTR(fifo_batch),
 	DD_ATTR(prio_aging_expire),
 	__ATTR_NULL
@@ -889,15 +864,6 @@ static int deadline_starved_show(void *data, struct seq_file *m)
 	return 0;
 }
 
-static int dd_async_depth_show(void *data, struct seq_file *m)
-{
-	struct request_queue *q = data;
-	struct deadline_data *dd = q->elevator->elevator_data;
-
-	seq_printf(m, "%u\n", dd->async_depth);
-	return 0;
-}
-
 static int dd_queued_show(void *data, struct seq_file *m)
 {
 	struct request_queue *q = data;
@@ -1007,7 +973,6 @@ static const struct blk_mq_debugfs_attr deadline_queue_debugfs_attrs[] = {
 	DEADLINE_NEXT_RQ_ATTR(write2),
 	{"batching", 0400, deadline_batching_show},
 	{"starved", 0400, deadline_starved_show},
-	{"async_depth", 0400, dd_async_depth_show},
 	{"dispatch0", 0400, .seq_ops = &deadline_dispatch0_seq_ops},
 	{"dispatch1", 0400, .seq_ops = &deadline_dispatch1_seq_ops},
 	{"dispatch2", 0400, .seq_ops = &deadline_dispatch2_seq_ops},
@@ -1021,7 +986,6 @@ static const struct blk_mq_debugfs_attr deadline_queue_debugfs_attrs[] = {
 static struct elevator_type mq_deadline = {
 	.ops = {
 		.depth_updated		= dd_depth_updated,
-		.limit_depth		= dd_limit_depth,
 		.insert_requests	= dd_insert_requests,
 		.dispatch_request	= dd_dispatch_request,
 		.prepare_request	= dd_prepare_request,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/7] block, bfq: convert to use request_queue->async_depth
  2025-09-30  7:11 [PATCH 0/7] blk-mq: introduce new queue attribute async_depth Yu Kuai
                   ` (4 preceding siblings ...)
  2025-09-30  7:11 ` [PATCH 5/7] mq-deadline: " Yu Kuai
@ 2025-09-30  7:11 ` Yu Kuai
  2025-09-30  7:11 ` [PATCH 7/7] blk-mq: add documentation for new queue attribute async_depth Yu Kuai
  6 siblings, 0 replies; 16+ messages in thread
From: Yu Kuai @ 2025-09-30  7:11 UTC (permalink / raw)
  To: axboe, bvanassche, ming.lei, nilay
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

The default limits are unchanged, and the user can now configure async_depth.
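
For a concrete check (editorial example with nr_requests = 256, so the
default async_depth set below is (256 * 3) >> 2 = 192), the new formulas
reproduce the old defaults:

	/*
	 * new (async_depth = 192)                 old (nr_requests = 256)
	 * async_depths[0][1] = 192                (256 * 3) >> 2 = 192  (75%)
	 * async_depths[0][0] = 192 * 2 / 3 = 128  256 >> 1       = 128  (50%)
	 * async_depths[1][1] = 192 >> 1     =  96 (256 * 6) >> 4 =  96  (~37%)
	 * async_depths[1][0] = 192 >> 2     =  48 (256 * 3) >> 4 =  48  (~18%)
	 */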

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bfq-iosched.c | 43 +++++++++++++++++--------------------------
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 704900675949..3350c9b22eb4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -7112,39 +7112,29 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
 static void bfq_depth_updated(struct request_queue *q)
 {
 	struct bfq_data *bfqd = q->elevator->elevator_data;
-	unsigned int nr_requests = q->nr_requests;
+	unsigned int async_depth = q->async_depth;
 
 	/*
-	 * In-word depths if no bfq_queue is being weight-raised:
-	 * leaving 25% of tags only for sync reads.
+	 * By default:
+	 *  - sync reads are not limited
+	 * If bfqq is not being weight-raised:
+	 *  - sync writes are limited to 75% (the async_depth default value)
+	 *  - async IO is limited to 50%
+	 * If bfqq is being weight-raised:
+	 *  - sync writes are limited to ~37%
+	 *  - async IO is limited to ~18%
 	 *
-	 * In next formulas, right-shift the value
-	 * (1U<<bt->sb.shift), instead of computing directly
-	 * (1U<<(bt->sb.shift - something)), to be robust against
-	 * any possible value of bt->sb.shift, without having to
-	 * limit 'something'.
+	 * If request_queue->async_depth is updated by the user, all limits
+	 * are updated proportionally.
 	 */
-	/* no more than 50% of tags for async I/O */
-	bfqd->async_depths[0][0] = max(nr_requests >> 1, 1U);
-	/*
-	 * no more than 75% of tags for sync writes (25% extra tags
-	 * w.r.t. async I/O, to prevent async I/O from starving sync
-	 * writes)
-	 */
-	bfqd->async_depths[0][1] = max((nr_requests * 3) >> 2, 1U);
+	bfqd->async_depths[0][1] = async_depth;
+	bfqd->async_depths[0][0] = max(async_depth * 2 / 3, 1U);
+	bfqd->async_depths[1][1] = max(async_depth >> 1, 1U);
+	bfqd->async_depths[1][0] = max(async_depth >> 2, 1U);
 
 	/*
-	 * In-word depths in case some bfq_queue is being weight-
-	 * raised: leaving ~63% of tags for sync reads. This is the
-	 * highest percentage for which, in our tests, application
-	 * start-up times didn't suffer from any regression due to tag
-	 * shortage.
+	 * Due to cgroup QoS, the allowed depth for a bfqq might be as low as 1.
 	 */
-	/* no more than ~18% of tags for async I/O */
-	bfqd->async_depths[1][0] = max((nr_requests * 3) >> 4, 1U);
-	/* no more than ~37% of tags for sync writes (~20% extra tags) */
-	bfqd->async_depths[1][1] = max((nr_requests * 6) >> 4, 1U);
-
 	blk_mq_set_min_shallow_depth(q, 1);
 }
 
@@ -7360,6 +7350,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_queue *eq)
 	blk_queue_flag_set(QUEUE_FLAG_DISABLE_WBT_DEF, q);
 	wbt_disable_default(q->disk);
 	blk_stat_enable_accounting(q);
+	q->async_depth = (q->nr_requests * 3) >> 2;
 
 	return 0;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/7] blk-mq: add documentation for new queue attribute async_depth
  2025-09-30  7:11 [PATCH 0/7] blk-mq: introduce new queue attribute async_depth Yu Kuai
                   ` (5 preceding siblings ...)
  2025-09-30  7:11 ` [PATCH 6/7] block, bfq: convert " Yu Kuai
@ 2025-09-30  7:11 ` Yu Kuai
  2025-10-02 15:12   ` Nilay Shroff
  6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2025-09-30  7:11 UTC (permalink / raw)
  To: axboe, bvanassche, ming.lei, nilay
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Explain the attribute and its default value in the different cases.
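
As a usage illustration (editorial sketch; the disk name and value are
arbitrary, and the error behaviour mirrors the store handler added in
patch 3), the attribute can be driven from userspace like any other queue
sysfs file:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		const char *path = "/sys/block/sda/queue/async_depth";
		const char *val = "64\n";	/* clamped to nr_requests */
		int fd = open(path, O_WRONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/* fails with EINVAL if no elevator is active or value is 0 */
		if (write(fd, val, strlen(val)) < 0)
			perror("write");
		close(fd);
		return 0;
	}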

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 Documentation/ABI/stable/sysfs-block | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 0ed10aeff86b..09b9b3db9a1f 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -609,6 +609,16 @@ Description:
 		enabled, and whether tags are shared.
 
 
+What:		/sys/block/<disk>/queue/async_depth
+Date:		August 2025
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] This controls how many async requests may be allocated in the
+		block layer. If the elevator is none, then this value is nr_requests.
+		By default, this value is 75% of nr_requests for bfq and kyber,
+		and nr_requests for mq-deadline.
+
+
 What:		/sys/block/<disk>/queue/nr_zones
 Date:		November 2018
 Contact:	Damien Le Moal <damien.lemoal@wdc.com>
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/7] blk-mq: add a new queue sysfs attribute async_depth
  2025-09-30  7:11 ` [PATCH 3/7] blk-mq: add a new queue sysfs attribute async_depth Yu Kuai
@ 2025-10-02 15:10   ` Nilay Shroff
  2025-10-06  1:57     ` Yu Kuai
  0 siblings, 1 reply; 16+ messages in thread
From: Nilay Shroff @ 2025-10-02 15:10 UTC (permalink / raw)
  To: Yu Kuai, axboe, bvanassche, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi



On 9/30/25 12:41 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Add a new field async_depth to request_queue and related APIs, this is
> currently not used, following patches will convert elevators to use
> this instead of internal async_depth.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  block/blk-core.c       |  1 +
>  block/blk-mq.c         |  4 ++++
>  block/blk-sysfs.c      | 47 ++++++++++++++++++++++++++++++++++++++++++
>  block/elevator.c       |  1 +
>  include/linux/blkdev.h |  1 +
>  5 files changed, 54 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index dd39ff651095..76df70cfc103 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -463,6 +463,7 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
>  	fs_reclaim_release(GFP_KERNEL);
>  
>  	q->nr_requests = BLKDEV_DEFAULT_RQ;
> +	q->async_depth = BLKDEV_DEFAULT_RQ;
>  
>  	return q;
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 09f579414161..260e54fa48f0 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -529,6 +529,8 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>  			data->rq_flags |= RQF_USE_SCHED;
>  			if (ops->limit_depth)
>  				ops->limit_depth(data->cmd_flags, data);
> +			else if (!blk_mq_sched_sync_request(data->cmd_flags))
> +				data->shallow_depth = q->async_depth;
>  		}

In the subsequent patches, I saw that ->limit_depth is still used for the
BFQ scheduler. Given that, it seems more consistent to also retain ->limit_depth
for the mq-deadline and Kyber schedulers, and set data->shallow_depth within their
respective ->limit_depth methods. If we take this approach, the additional 
blk_mq_sched_sync_request() check above becomes unnecessary.

So IMO:
- Keep ->limit_depth for all schedulers (bfq, mq-deadline, kyber).
- Remove the extra blk_mq_sched_sync_request() check from the core code.
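
A minimal sketch of that suggestion for mq-deadline (editorial illustration
only, reusing the dd_limit_depth() removed in patch 5 but reading the
queue-wide value):

	static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
	{
		/* Do not throttle synchronous reads. */
		if (blk_mq_sched_sync_request(opf))
			return;

		data->shallow_depth = data->q->async_depth;
	}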

Thanks,
--Nilay

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 7/7] blk-mq: add documentation for new queue attribute async_depth
  2025-09-30  7:11 ` [PATCH 7/7] blk-mq: add documentation for new queue attribute async_depth Yu Kuai
@ 2025-10-02 15:12   ` Nilay Shroff
  2025-10-06  2:00     ` Yu Kuai
  0 siblings, 1 reply; 16+ messages in thread
From: Nilay Shroff @ 2025-10-02 15:12 UTC (permalink / raw)
  To: Yu Kuai, axboe, bvanassche, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi



On 9/30/25 12:41 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Explain the attribute and the default value in different case.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  Documentation/ABI/stable/sysfs-block | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index 0ed10aeff86b..09b9b3db9a1f 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -609,6 +609,16 @@ Description:
>  		enabled, and whether tags are shared.
>  
>  
> +What:		/sys/block/<disk>/queue/async_depth
> +Date:		August 2025
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] This controls how many async requests may be allocated in the
> +		block layer. If elevator is none, then this value is nr_requests.
> +		By default, this value is 75% of nr_requests for bfq and kyber,
> +		abd nr_requests for mq-deadline.
> +
Hmm, it seems we need to further elaborate on the above documentation, seeing
the way this new sysfs interface is playing out now for different I/O
schedulers. I'd suggest rewriting it as follows (you may further
modify/simplify it based on your taste, if needed):

Description:
[RW] Controls how many asynchronous requests may be allocated in the
block layer. The value is always capped at nr_requests.

  When no elevator is active (none):
  - async_depth is always equal to nr_requests.

  For bfq scheduler:
  - By default, async_depth is set to 75% of nr_requests. 
    Internal limits are then derived from this value:
    * Sync writes: limited to async_depth (≈75% of nr_requests).
    * Async I/O: limited to ~2/3 of async_depth (≈50% of nr_requests).

    If a bfq_queue is weight-raised:
    * Sync writes: limited to ~1/2 of async_depth (≈37% of nr_requests).
    * Async I/O: limited to ~1/4 of async_depth (≈18% of nr_requests).

  - If the user writes a custom value to async_depth, BFQ will recompute
    these limits proportionally based on the new value.

  For Kyber:
  - By default async_depth is set to 75% of nr_requests.
  - If the user writes a custom value to async_depth, then it overrides the
    default and directly controls the limit for writes and async I/O.

  For mq-deadline:
  - By default async_depth is set to nr_requests.
  - If the user writes a custom value to async_depth, then it overrides the
    default and directly controls the limit for writes and async I/O.

Thanks,
--Nilay


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] block: convert nr_requests to unsigned int
  2025-09-30  7:11 ` [PATCH 1/7] block: convert nr_requests to unsigned int Yu Kuai
@ 2025-10-02 15:13   ` Nilay Shroff
  0 siblings, 0 replies; 16+ messages in thread
From: Nilay Shroff @ 2025-10-02 15:13 UTC (permalink / raw)
  To: Yu Kuai, axboe, bvanassche, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi



On 9/30/25 12:41 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> This value represents the number of requests for elevator tags, or drivers
> tags if elevator is none. The max value for elevator tags is 2048, and
> in drivers at most 16 bits is used for tag.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] mq-deadline: convert to use request_queue->async_depth
  2025-09-30  7:11 ` [PATCH 5/7] mq-deadline: " Yu Kuai
@ 2025-10-02 18:56   ` Jeff Moyer
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Moyer @ 2025-10-02 18:56 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, bvanassche, ming.lei, nilay, linux-block, linux-kernel,
	yukuai3, yi.zhang, yangerkun, johnny.chenyi

Yu Kuai <yukuai1@huaweicloud.com> writes:

> Fix this problem by converting to request_queue->async_depth, where
> min_shallow_depth is set each time async_depth is updated.

Removing the iosched/async_depth attribute may cause problems for
existing scripts or udev rules.  I'm not sure if we care, but the
changelog does not seem to address this.

-Jeff


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/7] blk-mq: add a new queue sysfs attribute async_depth
  2025-10-02 15:10   ` Nilay Shroff
@ 2025-10-06  1:57     ` Yu Kuai
  2025-10-09  0:48       ` Yu Kuai
  0 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2025-10-06  1:57 UTC (permalink / raw)
  To: Nilay Shroff, Yu Kuai, axboe, bvanassche, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

Hi,

On 2025/10/2 23:10, Nilay Shroff wrote:
>
> On 9/30/25 12:41 PM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Add a new field async_depth to request_queue and related APIs, this is
>> currently not used, following patches will convert elevators to use
>> this instead of internal async_depth.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   block/blk-core.c       |  1 +
>>   block/blk-mq.c         |  4 ++++
>>   block/blk-sysfs.c      | 47 ++++++++++++++++++++++++++++++++++++++++++
>>   block/elevator.c       |  1 +
>>   include/linux/blkdev.h |  1 +
>>   5 files changed, 54 insertions(+)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index dd39ff651095..76df70cfc103 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -463,6 +463,7 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
>>   	fs_reclaim_release(GFP_KERNEL);
>>   
>>   	q->nr_requests = BLKDEV_DEFAULT_RQ;
>> +	q->async_depth = BLKDEV_DEFAULT_RQ;
>>   
>>   	return q;
>>   
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 09f579414161..260e54fa48f0 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -529,6 +529,8 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>>   			data->rq_flags |= RQF_USE_SCHED;
>>   			if (ops->limit_depth)
>>   				ops->limit_depth(data->cmd_flags, data);
>> +			else if (!blk_mq_sched_sync_request(data->cmd_flags))
>> +				data->shallow_depth = q->async_depth;
>>   		}
> In the subsequent patches, I saw that ->limit_depth is still used for the
> BFQ scheduler. Given that, it seems more consistent to also retain ->limit_depth
> for the mq-deadline and Kyber schedulers, and set data->shallow_depth within their
> respective ->limit_depth methods. If we take this approach, the additional
> blk_mq_sched_sync_request() check above becomes unnecessary.
>
> So IMO:
> - Keep ->limit_depth for all schedulers (bfq, mq-deadline, kyber).
> - Remove the extra blk_mq_sched_sync_request() check from the core code.

I was thinking of saving a function call for deadline and kyber; however, I
don't have a preference here and can do this in the next version.

Thanks,
Kuai

> Thanks,
> --Nilay
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 7/7] blk-mq: add documentation for new queue attribute async_depth
  2025-10-02 15:12   ` Nilay Shroff
@ 2025-10-06  2:00     ` Yu Kuai
  0 siblings, 0 replies; 16+ messages in thread
From: Yu Kuai @ 2025-10-06  2:00 UTC (permalink / raw)
  To: Nilay Shroff, Yu Kuai, axboe, bvanassche, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

Hi,

On 2025/10/2 23:12, Nilay Shroff wrote:
>
> On 9/30/25 12:41 PM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Explain the attribute and the default value in different case.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   Documentation/ABI/stable/sysfs-block | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
>> index 0ed10aeff86b..09b9b3db9a1f 100644
>> --- a/Documentation/ABI/stable/sysfs-block
>> +++ b/Documentation/ABI/stable/sysfs-block
>> @@ -609,6 +609,16 @@ Description:
>>   		enabled, and whether tags are shared.
>>   
>>   
>> +What:		/sys/block/<disk>/queue/async_depth
>> +Date:		August 2025
>> +Contact:	linux-block@vger.kernel.org
>> +Description:
>> +		[RW] This controls how many async requests may be allocated in the
>> +		block layer. If elevator is none, then this value is nr_requests.
>> +		By default, this value is 75% of nr_requests for bfq and kyber,
>> +		abd nr_requests for mq-deadline.
>> +
> Hmm, it seems we need to further elaborate above documentation, seeing the
> way this new sysfs interface is playing out now for different I/O schedulers.
> I'd suggest rewriting this as follow (you may further modify/simplify it based
> on your taste, if needed):
>
> Description:
> [RW] Controls how many asynchronous requests may be allocated in the
> block layer. The value is always capped at nr_requests.
>
>    When no elevator is active (none):
>    - async_depth is always equal to nr_requests.
>
>    For bfq scheduler:
>    - By default, async_depth is set to 75% of nr_requests.
>      Internal limits are then derived from this value:
>      * Sync writes: limited to async_depth (≈75% of nr_requests).
>      * Async I/O: limited to ~2/3 of async_depth (≈50% of nr_requests).
>
>      If a bfq_queue is weight-raised:
>      * Sync writes: limited to ~1/2 of async_depth (≈37% of nr_requests).
>      * Async I/O: limited to ~1/4 of async_depth (≈18% of nr_requests).
>
>    - If the user writes a custom value to async_depth, BFQ will recompute
>      these limits proportionally based on the new value.
>
>    For Kyber:
>    - By default async_depth is set to 75% of nr_requests.
>    - If the user writes a custom value to async_depth, then it override the
>      default and directly control the limit for writes and async I/O.
>
>    For mq-deadline:
>    - By default async_depth is set to nr_requests.
>    - If the user writes a custom value to async_depth, then it override the
>      default and directly control the limit for writes and async I/O.

This is great! I will use this in the next version.

Thanks
Kuai

> Thanks,
> --Nilay
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/7] blk-mq: add a new queue sysfs attribute async_depth
  2025-10-06  1:57     ` Yu Kuai
@ 2025-10-09  0:48       ` Yu Kuai
  2025-10-09  6:42         ` Yu Kuai
  0 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2025-10-09  0:48 UTC (permalink / raw)
  To: Yu Kuai, Nilay Shroff, Yu Kuai, axboe, bvanassche, ming.lei
  Cc: linux-block, linux-kernel, yi.zhang, yangerkun, johnny.chenyi,
	yukuai (C)

Hi,

On 2025/10/06 9:57, Yu Kuai wrote:
> Hi,
> 
> On 2025/10/2 23:10, Nilay Shroff wrote:
>>
>> On 9/30/25 12:41 PM, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Add a new field async_depth to request_queue and related APIs, this is
>>> currently not used, following patches will convert elevators to use
>>> this instead of internal async_depth.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>>   block/blk-core.c       |  1 +
>>>   block/blk-mq.c         |  4 ++++
>>>   block/blk-sysfs.c      | 47 ++++++++++++++++++++++++++++++++++++++++++
>>>   block/elevator.c       |  1 +
>>>   include/linux/blkdev.h |  1 +
>>>   5 files changed, 54 insertions(+)
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index dd39ff651095..76df70cfc103 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -463,6 +463,7 @@ struct request_queue *blk_alloc_queue(struct 
>>> queue_limits *lim, int node_id)
>>>       fs_reclaim_release(GFP_KERNEL);
>>>       q->nr_requests = BLKDEV_DEFAULT_RQ;
>>> +    q->async_depth = BLKDEV_DEFAULT_RQ;
>>>       return q;
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 09f579414161..260e54fa48f0 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -529,6 +529,8 @@ static struct request 
>>> *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>>>               data->rq_flags |= RQF_USE_SCHED;
>>>               if (ops->limit_depth)
>>>                   ops->limit_depth(data->cmd_flags, data);
>>> +            else if (!blk_mq_sched_sync_request(data->cmd_flags))
>>> +                data->shallow_depth = q->async_depth;
>>>           }
>> In the subsequent patches, I saw that ->limit_depth is still used for the
>> BFQ scheduler. Given that, it seems more consistent to also retain 
>> ->limit_depth
>> for the mq-deadline and Kyber schedulers, and set data->shallow_depth 
>> within their
>> respective ->limit_depth methods. If we take this approach, the 
>> additional
>> blk_mq_sched_sync_request() check above becomes unnecessary.
>>
>> So IMO:
>> - Keep ->limit_depth for all schedulers (bfq, mq-deadline, kyber).
>> - Remove the extra blk_mq_sched_sync_request() check from the core code.
> 
> I was thinking to save a function call for deadline and kyber, however, 
> I don't
> have preference here and I can do this in the next version.

How about the following? While cooking the new version, I feel this is
better, considering that only bfq has special handling for async requests.

static void blk_mq_sched_limit_async_depth(struct blk_mq_alloc_data *data)
{
	struct request_queue *q = data->q;
	const struct elevator_mq_ops *ops = &q->elevator->type->ops;

	if (blk_mq_sched_sync_request(data->cmd_flags))
		return;

	data->shallow_depth = q->async_depth;
	if (ops->limit_async_depth)
		ops->limit_async_depth(data);
}

Thanks,
Kuai

> 
> Thanks,
> Kuai
> 
>> Thanks,
>> --Nilay
>>
> .
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/7] blk-mq: add a new queue sysfs attribute async_depth
  2025-10-09  0:48       ` Yu Kuai
@ 2025-10-09  6:42         ` Yu Kuai
  0 siblings, 0 replies; 16+ messages in thread
From: Yu Kuai @ 2025-10-09  6:42 UTC (permalink / raw)
  To: Yu Kuai, Yu Kuai, Nilay Shroff, axboe, bvanassche, ming.lei
  Cc: linux-block, linux-kernel, yi.zhang, yangerkun, johnny.chenyi,
	yukuai (C)

Hi,

On 2025/10/09 8:48, Yu Kuai wrote:
> Hi,
> 
> On 2025/10/06 9:57, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/10/2 23:10, Nilay Shroff wrote:
>>>
>>> On 9/30/25 12:41 PM, Yu Kuai wrote:
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> Add a new field async_depth to request_queue and related APIs, this is
>>>> currently not used, following patches will convert elevators to use
>>>> this instead of internal async_depth.
>>>>
>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>> ---
>>>>   block/blk-core.c       |  1 +
>>>>   block/blk-mq.c         |  4 ++++
>>>>   block/blk-sysfs.c      | 47 
>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>   block/elevator.c       |  1 +
>>>>   include/linux/blkdev.h |  1 +
>>>>   5 files changed, 54 insertions(+)
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index dd39ff651095..76df70cfc103 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -463,6 +463,7 @@ struct request_queue *blk_alloc_queue(struct 
>>>> queue_limits *lim, int node_id)
>>>>       fs_reclaim_release(GFP_KERNEL);
>>>>       q->nr_requests = BLKDEV_DEFAULT_RQ;
>>>> +    q->async_depth = BLKDEV_DEFAULT_RQ;
>>>>       return q;
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 09f579414161..260e54fa48f0 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -529,6 +529,8 @@ static struct request 
>>>> *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>>>>               data->rq_flags |= RQF_USE_SCHED;
>>>>               if (ops->limit_depth)
>>>>                   ops->limit_depth(data->cmd_flags, data);
>>>> +            else if (!blk_mq_sched_sync_request(data->cmd_flags))
>>>> +                data->shallow_depth = q->async_depth;
>>>>           }
>>> In the subsequent patches, I saw that ->limit_depth is still used for 
>>> the
>>> BFQ scheduler. Given that, it seems more consistent to also retain 
>>> ->limit_depth
>>> for the mq-deadline and Kyber schedulers, and set data->shallow_depth 
>>> within their
>>> respective ->limit_depth methods. If we take this approach, the 
>>> additional
>>> blk_mq_sched_sync_request() check above becomes unnecessary.
>>>
>>> So IMO:
>>> - Keep ->limit_depth for all schedulers (bfq, mq-deadline, kyber).
>>> - Remove the extra blk_mq_sched_sync_request() check from the core code.
>>
>> I was thinking to save a function call for deadline and kyber, 
>> however, I don't
>> have preference here and I can do this in the next version.
> 
> How abount following, I feel this is better while cooking the new
> version. Consider only bfq have specail handling for async request.
> 
> static void blk_mq_sched_limit_async_depth(struct blk_mq_alloc_data *data)
> {
>      if (blk_mq_sched_sync_request(data->cmd_flags))
>          return;
> 
>      data->shallow_depth = q->async_depth;
>      if (ops->limit_async_depth)
>          ops->limit_async_depth(data);
> }
> 

Just realized I forgot that bfq can limit sync requests as well due to
the bfq cgroup policy, so this is not good.

Please ignore this :)

Thanks,
Kuai

> Thanks,
> Kuai
> 
>>
>> Thanks,
>> Kuai
>>
>>> Thanks,
>>> --Nilay
>>>
>> .
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-10-09  6:42 UTC | newest]

Thread overview: 16+ messages
2025-09-30  7:11 [PATCH 0/7] blk-mq: introduce new queue attribute async_depth Yu Kuai
2025-09-30  7:11 ` [PATCH 1/7] block: convert nr_requests to unsigned int Yu Kuai
2025-10-02 15:13   ` Nilay Shroff
2025-09-30  7:11 ` [PATCH 2/7] blk-mq-sched: unify elevators checking for async requests Yu Kuai
2025-09-30  7:11 ` [PATCH 3/7] blk-mq: add a new queue sysfs attribute async_depth Yu Kuai
2025-10-02 15:10   ` Nilay Shroff
2025-10-06  1:57     ` Yu Kuai
2025-10-09  0:48       ` Yu Kuai
2025-10-09  6:42         ` Yu Kuai
2025-09-30  7:11 ` [PATCH 4/7] kyber: convert to use request_queue->async_depth Yu Kuai
2025-09-30  7:11 ` [PATCH 5/7] mq-deadline: " Yu Kuai
2025-10-02 18:56   ` Jeff Moyer
2025-09-30  7:11 ` [PATCH 6/7] block, bfq: convert " Yu Kuai
2025-09-30  7:11 ` [PATCH 7/7] blk-mq: add documentation for new queue attribute async_depth Yu Kuai
2025-10-02 15:12   ` Nilay Shroff
2025-10-06  2:00     ` Yu Kuai
