linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator
@ 2025-07-22  7:24 Yu Kuai
  2025-07-22  7:24 ` [PATCH 1/6] mq-deadline: switch to use high layer elevator lock Yu Kuai
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Yu Kuai @ 2025-07-22  7:24 UTC (permalink / raw)
  To: dlemoal, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Currently, both mq-deadline and bfq have a global spin lock that is
grabbed inside elevator methods like dispatch_request, insert_requests,
and bio_merge. This global lock is the main reason mq-deadline and
bfq can't scale very well.

For the dispatch_request method, the current behavior is to dispatch one
request at a time. In the case of multiple dispatching contexts, this
behavior, on the one hand, introduces intense lock contention:

t1:                     t2:                     t3:
lock                    lock                    lock
// grab lock
ops.dispatch_request
unlock
                        // grab lock
                        ops.dispatch_request
                        unlock
                                                // grab lock
                                                ops.dispatch_request
                                                unlock

and, on the other hand, messes up the request dispatching order:
t1:

lock
rq1 = ops.dispatch_request
unlock
                        t2:
                        lock
                        rq2 = ops.dispatch_request
                        unlock

lock
rq3 = ops.dispatch_request
unlock

                        lock
                        rq4 = ops.dispatch_request
                        unlock

//rq1,rq3 issue to disk
                        // rq2, rq4 issue to disk

In this case, the elevator dispatch order is rq 1-2-3-4; however, the
order seen by the disk is rq 1-3-2-4: the order of rq2 and rq3 is reversed.

While dispatching a request, blk_mq_get_dispatch_budget() and
blk_mq_get_driver_tag() must be called, and they are not ready to be
called inside elevator methods, hence introducing a new method like
dispatch_requests is not possible.

In conclusion, this set factors the global lock out of the dispatch_request
method, and supports request batch dispatching by calling the method
multiple times while holding the lock.
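
As a rough sketch (conceptual only; the real changes are in patches 4-6, and
the variable names merely follow them), the difference is whether the elevator
lock is taken per request or once per batch:

  /* before: each ->dispatch_request call takes the scheduler's internal lock */
  while (count < max_dispatch) {
          rq = e->type->ops.dispatch_request(hctx);  /* locks/unlocks inside */
          if (!rq)
                  break;
          list_add_tail(&rq->queuelist, &rq_list);
          count++;
  }

  /* after: for sq elevators, one context drains a batch under one lock hold */
  spin_lock_irq(&e->lock);
  while (count < max_dispatch) {
          rq = e->type->ops.dispatch_request(hctx);
          if (!rq)
                  break;
          list_add_tail(&rq->queuelist, &rq_list);
          count++;
  }
  spin_unlock_irq(&e->lock);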

nullblk setup:
modprobe null_blk nr_devices=0 &&
    udevadm settle &&
    cd /sys/kernel/config/nullb &&
    mkdir nullb0 &&
    cd nullb0 &&
    echo 0 > completion_nsec &&
    echo 512 > blocksize &&
    echo 0 > home_node &&
    echo 0 > irqmode &&
    echo 128 > submit_queues &&
    echo 1024 > hw_queue_depth &&
    echo 1024 > size &&
    echo 0 > memory_backed &&
    echo 2 > queue_mode &&
    echo 1 > power ||
    exit $?

Test script:
fio -filename=/dev/$disk -name=test -rw=randwrite -bs=4k -iodepth=32 \
  -numjobs=16 --iodepth_batch_submit=8 --iodepth_batch_complete=8 \
  -direct=1 -ioengine=io_uring -group_reporting -time_based -runtime=30

Test result: iops

|                 | deadline | bfq      |
| --------------- | -------- | -------- |
| before this set | 263k     | 124k     |
| after this set  | 475k     | 292k     |

Yu Kuai (6):
  mq-deadline: switch to use high layer elevator lock
  block, bfq: don't grab queue_lock from io path
  block, bfq: switch to use elevator lock
  elevator: factor elevator lock out of dispatch_request method
  blk-mq-sched: refactor __blk_mq_do_dispatch_sched()
  blk-mq-sched: support request batch dispatching for sq elevator

 block/bfq-cgroup.c   |   4 +-
 block/bfq-iosched.c  |  73 ++++++-------
 block/bfq-iosched.h  |   2 +-
 block/blk-ioc.c      |  43 +++++++-
 block/blk-mq-sched.c | 240 ++++++++++++++++++++++++++++++-------------
 block/blk-mq.h       |  21 ++++
 block/blk.h          |   2 +-
 block/elevator.c     |   1 +
 block/elevator.h     |   4 +-
 block/mq-deadline.c  |  58 +++++------
 10 files changed, 293 insertions(+), 155 deletions(-)

-- 
2.39.2



* [PATCH 1/6] mq-deadline: switch to use high layer elevator lock
  2025-07-22  7:24 [PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
@ 2025-07-22  7:24 ` Yu Kuai
  2025-07-23  1:46   ` Damien Le Moal
  2025-07-22  7:24 ` [PATCH 2/6] block, bfq: don't grab queue_lock from io path Yu Kuai
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-07-22  7:24 UTC (permalink / raw)
  To: dlemoal, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Introduce a new spinlock in elevator_queue, and switch dd->lock to
use the new lock. There are no functional changes.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/elevator.c    |  1 +
 block/elevator.h    |  4 ++--
 block/mq-deadline.c | 57 ++++++++++++++++++++++-----------------------
 3 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index ab22542e6cf0..91df270d9d91 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -144,6 +144,7 @@ struct elevator_queue *elevator_alloc(struct request_queue *q,
 	eq->type = e;
 	kobject_init(&eq->kobj, &elv_ktype);
 	mutex_init(&eq->sysfs_lock);
+	spin_lock_init(&eq->lock);
 	hash_init(eq->hash);
 
 	return eq;
diff --git a/block/elevator.h b/block/elevator.h
index a07ce773a38f..cbbac4f7825c 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -110,12 +110,12 @@ struct request *elv_rqhash_find(struct request_queue *q, sector_t offset);
 /*
  * each queue has an elevator_queue associated with it
  */
-struct elevator_queue
-{
+struct elevator_queue {
 	struct elevator_type *type;
 	void *elevator_data;
 	struct kobject kobj;
 	struct mutex sysfs_lock;
+	spinlock_t lock;
 	unsigned long flags;
 	DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
 };
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 2edf1cac06d5..e31da6de7764 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -101,7 +101,7 @@ struct deadline_data {
 	u32 async_depth;
 	int prio_aging_expire;
 
-	spinlock_t lock;
+	spinlock_t *lock;
 };
 
 /* Maps an I/O priority class to a deadline scheduler priority. */
@@ -213,7 +213,7 @@ static void dd_merged_requests(struct request_queue *q, struct request *req,
 	const u8 ioprio_class = dd_rq_ioclass(next);
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	dd->per_prio[prio].stats.merged++;
 
@@ -253,7 +253,7 @@ static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
 {
 	const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	return stats->inserted - atomic_read(&stats->completed);
 }
@@ -323,7 +323,7 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	enum dd_prio prio;
 	u8 ioprio_class;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	if (!list_empty(&per_prio->dispatch)) {
 		rq = list_first_entry(&per_prio->dispatch, struct request,
@@ -434,7 +434,7 @@ static struct request *dd_dispatch_prio_aged_requests(struct deadline_data *dd,
 	enum dd_prio prio;
 	int prio_cnt;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	prio_cnt = !!dd_queued(dd, DD_RT_PRIO) + !!dd_queued(dd, DD_BE_PRIO) +
 		   !!dd_queued(dd, DD_IDLE_PRIO);
@@ -466,7 +466,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct request *rq;
 	enum dd_prio prio;
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	rq = dd_dispatch_prio_aged_requests(dd, now);
 	if (rq)
 		goto unlock;
@@ -482,8 +482,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	}
 
 unlock:
-	spin_unlock(&dd->lock);
-
+	spin_unlock(dd->lock);
 	return rq;
 }
 
@@ -552,9 +551,9 @@ static void dd_exit_sched(struct elevator_queue *e)
 		WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_READ]));
 		WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_WRITE]));
 
-		spin_lock(&dd->lock);
+		spin_lock(dd->lock);
 		queued = dd_queued(dd, prio);
-		spin_unlock(&dd->lock);
+		spin_unlock(dd->lock);
 
 		WARN_ONCE(queued != 0,
 			  "statistics for priority %d: i %u m %u d %u c %u\n",
@@ -601,7 +600,7 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
 	dd->last_dir = DD_WRITE;
 	dd->fifo_batch = fifo_batch;
 	dd->prio_aging_expire = prio_aging_expire;
-	spin_lock_init(&dd->lock);
+	dd->lock = &eq->lock;
 
 	/* We dispatch from request queue wide instead of hw queue */
 	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
@@ -657,9 +656,9 @@ static bool dd_bio_merge(struct request_queue *q, struct bio *bio,
 	struct request *free = NULL;
 	bool ret;
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
-	spin_unlock(&dd->lock);
+	spin_unlock(dd->lock);
 
 	if (free)
 		blk_mq_free_request(free);
@@ -681,7 +680,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	prio = ioprio_class_to_prio[ioprio_class];
 	per_prio = &dd->per_prio[prio];
@@ -725,7 +724,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 	struct deadline_data *dd = q->elevator->elevator_data;
 	LIST_HEAD(free);
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	while (!list_empty(list)) {
 		struct request *rq;
 
@@ -733,7 +732,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 		list_del_init(&rq->queuelist);
 		dd_insert_request(hctx, rq, flags, &free);
 	}
-	spin_unlock(&dd->lock);
+	spin_unlock(dd->lock);
 
 	blk_mq_free_requests(&free);
 }
@@ -849,13 +848,13 @@ static const struct elv_fs_entry deadline_attrs[] = {
 #define DEADLINE_DEBUGFS_DDIR_ATTRS(prio, data_dir, name)		\
 static void *deadline_##name##_fifo_start(struct seq_file *m,		\
 					  loff_t *pos)			\
-	__acquires(&dd->lock)						\
+	__acquires(dd->lock)						\
 {									\
 	struct request_queue *q = m->private;				\
 	struct deadline_data *dd = q->elevator->elevator_data;		\
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];		\
 									\
-	spin_lock(&dd->lock);						\
+	spin_lock(dd->lock);						\
 	return seq_list_start(&per_prio->fifo_list[data_dir], *pos);	\
 }									\
 									\
@@ -870,12 +869,12 @@ static void *deadline_##name##_fifo_next(struct seq_file *m, void *v,	\
 }									\
 									\
 static void deadline_##name##_fifo_stop(struct seq_file *m, void *v)	\
-	__releases(&dd->lock)						\
+	__releases(dd->lock)						\
 {									\
 	struct request_queue *q = m->private;				\
 	struct deadline_data *dd = q->elevator->elevator_data;		\
 									\
-	spin_unlock(&dd->lock);						\
+	spin_unlock(dd->lock);						\
 }									\
 									\
 static const struct seq_operations deadline_##name##_fifo_seq_ops = {	\
@@ -941,11 +940,11 @@ static int dd_queued_show(void *data, struct seq_file *m)
 	struct deadline_data *dd = q->elevator->elevator_data;
 	u32 rt, be, idle;
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	rt = dd_queued(dd, DD_RT_PRIO);
 	be = dd_queued(dd, DD_BE_PRIO);
 	idle = dd_queued(dd, DD_IDLE_PRIO);
-	spin_unlock(&dd->lock);
+	spin_unlock(dd->lock);
 
 	seq_printf(m, "%u %u %u\n", rt, be, idle);
 
@@ -957,7 +956,7 @@ static u32 dd_owned_by_driver(struct deadline_data *dd, enum dd_prio prio)
 {
 	const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	return stats->dispatched + stats->merged -
 		atomic_read(&stats->completed);
@@ -969,11 +968,11 @@ static int dd_owned_by_driver_show(void *data, struct seq_file *m)
 	struct deadline_data *dd = q->elevator->elevator_data;
 	u32 rt, be, idle;
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	rt = dd_owned_by_driver(dd, DD_RT_PRIO);
 	be = dd_owned_by_driver(dd, DD_BE_PRIO);
 	idle = dd_owned_by_driver(dd, DD_IDLE_PRIO);
-	spin_unlock(&dd->lock);
+	spin_unlock(dd->lock);
 
 	seq_printf(m, "%u %u %u\n", rt, be, idle);
 
@@ -983,13 +982,13 @@ static int dd_owned_by_driver_show(void *data, struct seq_file *m)
 #define DEADLINE_DISPATCH_ATTR(prio)					\
 static void *deadline_dispatch##prio##_start(struct seq_file *m,	\
 					     loff_t *pos)		\
-	__acquires(&dd->lock)						\
+	__acquires(dd->lock)						\
 {									\
 	struct request_queue *q = m->private;				\
 	struct deadline_data *dd = q->elevator->elevator_data;		\
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];		\
 									\
-	spin_lock(&dd->lock);						\
+	spin_lock(dd->lock);						\
 	return seq_list_start(&per_prio->dispatch, *pos);		\
 }									\
 									\
@@ -1004,12 +1003,12 @@ static void *deadline_dispatch##prio##_next(struct seq_file *m,		\
 }									\
 									\
 static void deadline_dispatch##prio##_stop(struct seq_file *m, void *v)	\
-	__releases(&dd->lock)						\
+	__releases(dd->lock)						\
 {									\
 	struct request_queue *q = m->private;				\
 	struct deadline_data *dd = q->elevator->elevator_data;		\
 									\
-	spin_unlock(&dd->lock);						\
+	spin_unlock(dd->lock);						\
 }									\
 									\
 static const struct seq_operations deadline_dispatch##prio##_seq_ops = { \
-- 
2.39.2



* [PATCH 2/6] block, bfq: don't grab queue_lock from io path
  2025-07-22  7:24 [PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
  2025-07-22  7:24 ` [PATCH 1/6] mq-deadline: switch to use high layer elevator lock Yu Kuai
@ 2025-07-22  7:24 ` Yu Kuai
  2025-07-23  1:52   ` Damien Le Moal
  2025-07-22  7:24 ` [PATCH 3/6] block, bfq: switch to use elevator lock Yu Kuai
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-07-22  7:24 UTC (permalink / raw)
  To: dlemoal, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Currently, issuing io can grab queue_lock three times, from bfq_bio_merge(),
bfq_limit_depth() and bfq_prepare_request(). The queue_lock is not
necessary if the icq is already created:

- q_usage_counter is already grabbed and the queue won't exit;
- the current thread won't exit;
- if another thread is allocating and inserting a new icq into ioc->icq_tree,
  rcu can be used to protect looking up the icq from the radix tree; it's
  safe to use the extracted icq until the queue or the current thread exits.

If the ioc or icq is not created, then bfq_prepare_request() will create it,
which means the task is issuing io to the queue for the first time. This can
be considered a slow path, and queue_lock will still be held to protect
inserting the allocated icq into ioc->icq_tree.
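
As a condensed sketch of the resulting lookup path (simplified from the diff
below; reference counting and error handling omitted):

  /* io path: lockless fast path, icq stays valid while current and q live */
  rcu_read_lock();
  icq = ioc_lookup_icq_rcu(q);
  rcu_read_unlock();

  if (!icq) {
          /*
           * First io from this task to this queue: slow path,
           * ioc_create_icq() still takes queue_lock to insert the new icq
           * into ioc->icq_tree.
           */
          icq = ioc_create_icq(q);
  }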

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bfq-iosched.c | 24 +++++++-----------------
 block/blk-ioc.c     | 43 ++++++++++++++++++++++++++++++++++++++-----
 block/blk.h         |  2 +-
 3 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0cb1e9873aab..58d57c482acd 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -454,17 +454,13 @@ static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
  */
 static struct bfq_io_cq *bfq_bic_lookup(struct request_queue *q)
 {
-	struct bfq_io_cq *icq;
-	unsigned long flags;
-
-	if (!current->io_context)
-		return NULL;
+	struct io_cq *icq;
 
-	spin_lock_irqsave(&q->queue_lock, flags);
-	icq = icq_to_bic(ioc_lookup_icq(q));
-	spin_unlock_irqrestore(&q->queue_lock, flags);
+	rcu_read_lock();
+	icq = ioc_lookup_icq_rcu(q);
+	rcu_read_unlock();
 
-	return icq;
+	return icq_to_bic(icq);
 }
 
 /*
@@ -2456,16 +2452,10 @@ static void bfq_remove_request(struct request_queue *q,
 static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs)
 {
+	/* bic will not be freed until current or elevator exit */
+	struct bfq_io_cq *bic = bfq_bic_lookup(q);
 	struct bfq_data *bfqd = q->elevator->elevator_data;
 	struct request *free = NULL;
-	/*
-	 * bfq_bic_lookup grabs the queue_lock: invoke it now and
-	 * store its return value for later use, to avoid nesting
-	 * queue_lock inside the bfqd->lock. We assume that the bic
-	 * returned by bfq_bic_lookup does not go away before
-	 * bfqd->lock is taken.
-	 */
-	struct bfq_io_cq *bic = bfq_bic_lookup(q);
 	bool ret;
 
 	spin_lock_irq(&bfqd->lock);
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index ce82770c72ab..0be097a37e22 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -314,7 +314,7 @@ int __copy_io(unsigned long clone_flags, struct task_struct *tsk)
  * Look up io_cq associated with @ioc - @q pair from @ioc.  Must be called
  * with @q->queue_lock held.
  */
-struct io_cq *ioc_lookup_icq(struct request_queue *q)
+static struct io_cq *ioc_lookup_icq(struct request_queue *q)
 {
 	struct io_context *ioc = current->io_context;
 	struct io_cq *icq;
@@ -341,7 +341,40 @@ struct io_cq *ioc_lookup_icq(struct request_queue *q)
 	rcu_read_unlock();
 	return icq;
 }
-EXPORT_SYMBOL(ioc_lookup_icq);
+
+/**
+ * ioc_lookup_icq_rcu - lookup io_cq from ioc in io path
+ * @q: the associated request_queue
+ *
+ * Look up io_cq associated with @ioc - @q pair from @ioc.  Must be called
+ * from io path, either return NULL if current issue io to @q for the first
+ * time, or return a valid icq.
+ */
+struct io_cq *ioc_lookup_icq_rcu(struct request_queue *q)
+{
+	struct io_context *ioc = current->io_context;
+	struct io_cq *icq;
+
+	WARN_ON_ONCE(percpu_ref_is_zero(&q->q_usage_counter));
+
+	if (!ioc)
+		return NULL;
+
+	icq = rcu_dereference(ioc->icq_hint);
+	if (icq && icq->q == q)
+		return icq;
+
+	icq = radix_tree_lookup(&ioc->icq_tree, q->id);
+	if (!icq)
+		return NULL;
+
+	if (WARN_ON_ONCE(icq->q != q))
+		return NULL;
+
+	rcu_assign_pointer(ioc->icq_hint, icq);
+	return icq;
+}
+EXPORT_SYMBOL(ioc_lookup_icq_rcu);
 
 /**
  * ioc_create_icq - create and link io_cq
@@ -420,9 +453,9 @@ struct io_cq *ioc_find_get_icq(struct request_queue *q)
 	} else {
 		get_io_context(ioc);
 
-		spin_lock_irq(&q->queue_lock);
-		icq = ioc_lookup_icq(q);
-		spin_unlock_irq(&q->queue_lock);
+		rcu_read_lock();
+		icq = ioc_lookup_icq_rcu(q);
+		rcu_read_unlock();
 	}
 
 	if (!icq) {
diff --git a/block/blk.h b/block/blk.h
index 468aa83c5a22..3c078e517d59 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -460,7 +460,7 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req)
  * Internal io_context interface
  */
 struct io_cq *ioc_find_get_icq(struct request_queue *q);
-struct io_cq *ioc_lookup_icq(struct request_queue *q);
+struct io_cq *ioc_lookup_icq_rcu(struct request_queue *q);
 #ifdef CONFIG_BLK_ICQ
 void ioc_clear_queue(struct request_queue *q);
 #else
-- 
2.39.2



* [PATCH 3/6] block, bfq: switch to use elevator lock
  2025-07-22  7:24 [PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
  2025-07-22  7:24 ` [PATCH 1/6] mq-deadline: switch to use high layer elevator lock Yu Kuai
  2025-07-22  7:24 ` [PATCH 2/6] block, bfq: don't grab queue_lock from io path Yu Kuai
@ 2025-07-22  7:24 ` Yu Kuai
  2025-07-23  1:53   ` Damien Le Moal
  2025-07-22  7:24 ` [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method Yu Kuai
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-07-22  7:24 UTC (permalink / raw)
  To: dlemoal, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Replace the internal spinlock bfqd->lock with the new spinlock in
elevator_queue. There are no functional changes.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bfq-cgroup.c  |  4 ++--
 block/bfq-iosched.c | 50 ++++++++++++++++++++++-----------------------
 block/bfq-iosched.h |  2 +-
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 9fb9f3533150..1717bac7eccc 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -878,7 +878,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&bfqd->lock, flags);
+	spin_lock_irqsave(bfqd->lock, flags);
 
 	if (!entity) /* root group */
 		goto put_async_queues;
@@ -923,7 +923,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 put_async_queues:
 	bfq_put_async_queues(bfqd, bfqg);
 
-	spin_unlock_irqrestore(&bfqd->lock, flags);
+	spin_unlock_irqrestore(bfqd->lock, flags);
 	/*
 	 * @blkg is going offline and will be ignored by
 	 * blkg_[rw]stat_recursive_sum().  Transfer stats to the parent so
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 58d57c482acd..11b81b11242c 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -469,7 +469,7 @@ static struct bfq_io_cq *bfq_bic_lookup(struct request_queue *q)
  */
 void bfq_schedule_dispatch(struct bfq_data *bfqd)
 {
-	lockdep_assert_held(&bfqd->lock);
+	lockdep_assert_held(bfqd->lock);
 
 	if (bfqd->queued != 0) {
 		bfq_log(bfqd, "schedule dispatch");
@@ -594,7 +594,7 @@ static bool bfqq_request_over_limit(struct bfq_data *bfqd,
 	int level;
 
 retry:
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 	bfqq = bic_to_bfqq(bic, op_is_sync(opf), act_idx);
 	if (!bfqq)
 		goto out;
@@ -606,7 +606,7 @@ static bool bfqq_request_over_limit(struct bfq_data *bfqd,
 	/* +1 for bfqq entity, root cgroup not included */
 	depth = bfqg_to_blkg(bfqq_group(bfqq))->blkcg->css.cgroup->level + 1;
 	if (depth > alloc_depth) {
-		spin_unlock_irq(&bfqd->lock);
+		spin_unlock_irq(bfqd->lock);
 		if (entities != inline_entities)
 			kfree(entities);
 		entities = kmalloc_array(depth, sizeof(*entities), GFP_NOIO);
@@ -664,7 +664,7 @@ static bool bfqq_request_over_limit(struct bfq_data *bfqd,
 		}
 	}
 out:
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 	if (entities != inline_entities)
 		kfree(entities);
 	return ret;
@@ -2458,7 +2458,7 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
 	struct request *free = NULL;
 	bool ret;
 
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 
 	if (bic) {
 		/*
@@ -2476,7 +2476,7 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
 
 	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
 
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 	if (free)
 		blk_mq_free_request(free);
 
@@ -2651,7 +2651,7 @@ static void bfq_end_wr(struct bfq_data *bfqd)
 	struct bfq_queue *bfqq;
 	int i;
 
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 
 	for (i = 0; i < bfqd->num_actuators; i++) {
 		list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
@@ -2661,7 +2661,7 @@ static void bfq_end_wr(struct bfq_data *bfqd)
 		bfq_bfqq_end_wr(bfqq);
 	bfq_end_wr_async(bfqd);
 
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 }
 
 static sector_t bfq_io_struct_pos(void *io_struct, bool request)
@@ -5307,7 +5307,7 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct bfq_queue *in_serv_queue;
 	bool waiting_rq, idle_timer_disabled = false;
 
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 
 	in_serv_queue = bfqd->in_service_queue;
 	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
@@ -5318,7 +5318,7 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 			waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
 	}
 
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 	bfq_update_dispatch_stats(hctx->queue, rq,
 			idle_timer_disabled ? in_serv_queue : NULL,
 				idle_timer_disabled);
@@ -5496,9 +5496,9 @@ static void bfq_exit_icq(struct io_cq *icq)
 	 * this is the last time these queues are accessed.
 	 */
 	if (bfqd) {
-		spin_lock_irqsave(&bfqd->lock, flags);
+		spin_lock_irqsave(bfqd->lock, flags);
 		_bfq_exit_icq(bic, bfqd->num_actuators);
-		spin_unlock_irqrestore(&bfqd->lock, flags);
+		spin_unlock_irqrestore(bfqd->lock, flags);
 	} else {
 		_bfq_exit_icq(bic, BFQ_MAX_ACTUATORS);
 	}
@@ -6254,10 +6254,10 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	if (!cgroup_subsys_on_dfl(io_cgrp_subsys) && rq->bio)
 		bfqg_stats_update_legacy_io(q, rq);
 #endif
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 	bfqq = bfq_init_rq(rq);
 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
-		spin_unlock_irq(&bfqd->lock);
+		spin_unlock_irq(bfqd->lock);
 		blk_mq_free_requests(&free);
 		return;
 	}
@@ -6290,7 +6290,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	 * merge).
 	 */
 	cmd_flags = rq->cmd_flags;
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 
 	bfq_update_insert_stats(q, bfqq, idle_timer_disabled,
 				cmd_flags);
@@ -6671,7 +6671,7 @@ static void bfq_finish_requeue_request(struct request *rq)
 					     rq->io_start_time_ns,
 					     rq->cmd_flags);
 
-	spin_lock_irqsave(&bfqd->lock, flags);
+	spin_lock_irqsave(bfqd->lock, flags);
 	if (likely(rq->rq_flags & RQF_STARTED)) {
 		if (rq == bfqd->waited_rq)
 			bfq_update_inject_limit(bfqd, bfqq);
@@ -6681,7 +6681,7 @@ static void bfq_finish_requeue_request(struct request *rq)
 	bfqq_request_freed(bfqq);
 	bfq_put_queue(bfqq);
 	RQ_BIC(rq)->requests--;
-	spin_unlock_irqrestore(&bfqd->lock, flags);
+	spin_unlock_irqrestore(bfqd->lock, flags);
 
 	/*
 	 * Reset private fields. In case of a requeue, this allows
@@ -7012,7 +7012,7 @@ bfq_idle_slice_timer_body(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 	enum bfqq_expiration reason;
 	unsigned long flags;
 
-	spin_lock_irqsave(&bfqd->lock, flags);
+	spin_lock_irqsave(bfqd->lock, flags);
 
 	/*
 	 * Considering that bfqq may be in race, we should firstly check
@@ -7022,7 +7022,7 @@ bfq_idle_slice_timer_body(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 	 * been cleared in __bfq_bfqd_reset_in_service func.
 	 */
 	if (bfqq != bfqd->in_service_queue) {
-		spin_unlock_irqrestore(&bfqd->lock, flags);
+		spin_unlock_irqrestore(bfqd->lock, flags);
 		return;
 	}
 
@@ -7050,7 +7050,7 @@ bfq_idle_slice_timer_body(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 
 schedule_dispatch:
 	bfq_schedule_dispatch(bfqd);
-	spin_unlock_irqrestore(&bfqd->lock, flags);
+	spin_unlock_irqrestore(bfqd->lock, flags);
 }
 
 /*
@@ -7176,10 +7176,10 @@ static void bfq_exit_queue(struct elevator_queue *e)
 
 	hrtimer_cancel(&bfqd->idle_slice_timer);
 
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
 		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 
 	for (actuator = 0; actuator < bfqd->num_actuators; actuator++)
 		WARN_ON_ONCE(bfqd->rq_in_driver[actuator]);
@@ -7193,10 +7193,10 @@ static void bfq_exit_queue(struct elevator_queue *e)
 #ifdef CONFIG_BFQ_GROUP_IOSCHED
 	blkcg_deactivate_policy(bfqd->queue->disk, &blkcg_policy_bfq);
 #else
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 	bfq_put_async_queues(bfqd, bfqd->root_group);
 	kfree(bfqd->root_group);
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 #endif
 
 	blk_stat_disable_accounting(bfqd->queue);
@@ -7361,7 +7361,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	/* see comments on the definition of next field inside bfq_data */
 	bfqd->actuator_load_threshold = 4;
 
-	spin_lock_init(&bfqd->lock);
+	bfqd->lock = &eq->lock;
 
 	/*
 	 * The invocation of the next bfq_create_group_hierarchy
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 687a3a7ba784..d70eb6529dab 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -795,7 +795,7 @@ struct bfq_data {
 	/* fallback dummy bfqq for extreme OOM conditions */
 	struct bfq_queue oom_bfqq;
 
-	spinlock_t lock;
+	spinlock_t *lock;
 
 	/*
 	 * bic associated with the task issuing current bio for
-- 
2.39.2



* [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method
  2025-07-22  7:24 [PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
                   ` (2 preceding siblings ...)
  2025-07-22  7:24 ` [PATCH 3/6] block, bfq: switch to use elevator lock Yu Kuai
@ 2025-07-22  7:24 ` Yu Kuai
  2025-07-23  1:59   ` Damien Le Moal
  2025-07-22  7:24 ` [PATCH 5/6] blk-mq-sched: refactor __blk_mq_do_dispatch_sched() Yu Kuai
  2025-07-22  7:24 ` [PATCH 6/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
  5 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-07-22  7:24 UTC (permalink / raw)
  To: dlemoal, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Currently, both mq-deadline and bfq have a global spin lock that is
grabbed inside elevator methods like dispatch_request, insert_requests,
and bio_merge. This global lock is the main reason mq-deadline and
bfq can't scale very well.

For the dispatch_request method, the current behavior is to dispatch one
request at a time. In the case of multiple dispatching contexts, this
behavior causes huge lock contention and messes up the request dispatching
order. Following patches will support request batch dispatching to fix
those problems.

While dispatching a request, blk_mq_get_dispatch_budget() and
blk_mq_get_driver_tag() must be called, and they are not ready to be
called inside elevator methods, hence introducing a new method like
dispatch_requests is not possible.

In conclusion, this patch factors the global lock out of the dispatch_request
method, and following patches will support request batch dispatching by
calling the method multiple times while holding the lock.
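
In short, the lock moves from the scheduler's dispatch_request implementation
into the dispatch loop in blk-mq-sched.c; condensed from the hunk below:

  bool sq_sched = blk_queue_sq_sched(q);

  if (sq_sched)
          spin_lock_irq(&e->lock);
  rq = e->type->ops.dispatch_request(hctx);
  if (sq_sched)
          spin_unlock_irq(&e->lock);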

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bfq-iosched.c  | 3 ---
 block/blk-mq-sched.c | 6 ++++++
 block/mq-deadline.c  | 5 +----
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 11b81b11242c..9f8a256e43f2 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct bfq_queue *in_serv_queue;
 	bool waiting_rq, idle_timer_disabled = false;
 
-	spin_lock_irq(bfqd->lock);
-
 	in_serv_queue = bfqd->in_service_queue;
 	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
 
@@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 			waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
 	}
 
-	spin_unlock_irq(bfqd->lock);
 	bfq_update_dispatch_stats(hctx->queue, rq,
 			idle_timer_disabled ? in_serv_queue : NULL,
 				idle_timer_disabled);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 55a0fd105147..82c4f4eef9ed 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		max_dispatch = hctx->queue->nr_requests;
 
 	do {
+		bool sq_sched = blk_queue_sq_sched(q);
 		struct request *rq;
 		int budget_token;
 
@@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		if (budget_token < 0)
 			break;
 
+		if (sq_sched)
+			spin_lock_irq(&e->lock);
 		rq = e->type->ops.dispatch_request(hctx);
+		if (sq_sched)
+			spin_unlock_irq(&e->lock);
+
 		if (!rq) {
 			blk_mq_put_dispatch_budget(q, budget_token);
 			/*
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index e31da6de7764..a008e41bc861 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct request *rq;
 	enum dd_prio prio;
 
-	spin_lock(dd->lock);
 	rq = dd_dispatch_prio_aged_requests(dd, now);
 	if (rq)
-		goto unlock;
+		return rq;
 
 	/*
 	 * Next, dispatch requests in priority order. Ignore lower priority
@@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 			break;
 	}
 
-unlock:
-	spin_unlock(dd->lock);
 	return rq;
 }
 
-- 
2.39.2



* [PATCH 5/6] blk-mq-sched: refactor __blk_mq_do_dispatch_sched()
  2025-07-22  7:24 [PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
                   ` (3 preceding siblings ...)
  2025-07-22  7:24 ` [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method Yu Kuai
@ 2025-07-22  7:24 ` Yu Kuai
  2025-07-22  7:24 ` [PATCH 6/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
  5 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-07-22  7:24 UTC (permalink / raw)
  To: dlemoal, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Introduce struct sched_dispatch_ctx, and split the helper into
elevator_dispatch_one_request() and elevator_finish_dispatch(). Also add
comments about the non-error return values.

This makes the code cleaner, and makes it easier to add a new branch that
dispatches a batch of requests at a time in the next patch.
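
With the refactor, __blk_mq_do_dispatch_sched() reduces to the following shape
(condensed from the diff below):

  struct sched_dispatch_ctx ctx = {
          .hctx = hctx,
          .q    = hctx->queue,
          .e    = hctx->queue->elevator,
  };

  INIT_LIST_HEAD(&ctx.rq_list);

  do {
          if (!elevator_dispatch_one_request(&ctx))
                  break;
  } while (ctx.count < max_dispatch);

  return elevator_finish_dispatch(&ctx);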

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-mq-sched.c | 196 ++++++++++++++++++++++++++-----------------
 1 file changed, 119 insertions(+), 77 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 82c4f4eef9ed..f18aecf710ad 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -74,91 +74,100 @@ static bool blk_mq_dispatch_hctx_list(struct list_head *rq_list)
 
 #define BLK_MQ_BUDGET_DELAY	3		/* ms units */
 
-/*
- * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
- * its queue by itself in its completion handler, so we don't need to
- * restart queue if .get_budget() fails to get the budget.
- *
- * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
- * be run again.  This is necessary to avoid starving flushes.
- */
-static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
-{
-	struct request_queue *q = hctx->queue;
-	struct elevator_queue *e = q->elevator;
-	bool multi_hctxs = false, run_queue = false;
-	bool dispatched = false, busy = false;
-	unsigned int max_dispatch;
-	LIST_HEAD(rq_list);
-	int count = 0;
+struct sched_dispatch_ctx {
+	struct blk_mq_hw_ctx *hctx;
+	struct elevator_queue *e;
+	struct request_queue *q;
 
-	if (hctx->dispatch_busy)
-		max_dispatch = 1;
-	else
-		max_dispatch = hctx->queue->nr_requests;
+	struct list_head rq_list;
+	int count;
 
-	do {
-		bool sq_sched = blk_queue_sq_sched(q);
-		struct request *rq;
-		int budget_token;
+	bool multi_hctxs;
+	bool run_queue;
+	bool busy;
+};
 
-		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
-			break;
+static bool elevator_can_dispatch(struct sched_dispatch_ctx *ctx)
+{
+	if (ctx->e->type->ops.has_work &&
+	    !ctx->e->type->ops.has_work(ctx->hctx))
+		return false;
 
-		if (!list_empty_careful(&hctx->dispatch)) {
-			busy = true;
-			break;
-		}
+	if (!list_empty_careful(&ctx->hctx->dispatch)) {
+		ctx->busy = true;
+		return false;
+	}
 
-		budget_token = blk_mq_get_dispatch_budget(q);
-		if (budget_token < 0)
-			break;
+	return true;
+}
 
-		if (sq_sched)
-			spin_lock_irq(&e->lock);
-		rq = e->type->ops.dispatch_request(hctx);
-		if (sq_sched)
-			spin_unlock_irq(&e->lock);
+static bool elevator_dispatch_one_request(struct sched_dispatch_ctx *ctx)
+{
+	bool sq_sched = blk_queue_sq_sched(ctx->q);
+	struct request *rq;
+	int budget_token;
 
-		if (!rq) {
-			blk_mq_put_dispatch_budget(q, budget_token);
-			/*
-			 * We're releasing without dispatching. Holding the
-			 * budget could have blocked any "hctx"s with the
-			 * same queue and if we didn't dispatch then there's
-			 * no guarantee anyone will kick the queue.  Kick it
-			 * ourselves.
-			 */
-			run_queue = true;
-			break;
-		}
+	if (!elevator_can_dispatch(ctx))
+		return false;
 
-		blk_mq_set_rq_budget_token(rq, budget_token);
+	budget_token = blk_mq_get_dispatch_budget(ctx->q);
+	if (budget_token < 0)
+		return false;
 
-		/*
-		 * Now this rq owns the budget which has to be released
-		 * if this rq won't be queued to driver via .queue_rq()
-		 * in blk_mq_dispatch_rq_list().
-		 */
-		list_add_tail(&rq->queuelist, &rq_list);
-		count++;
-		if (rq->mq_hctx != hctx)
-			multi_hctxs = true;
+	if (sq_sched)
+		spin_lock_irq(&ctx->e->lock);
+	rq = ctx->e->type->ops.dispatch_request(ctx->hctx);
+	if (sq_sched)
+		spin_unlock_irq(&ctx->e->lock);
 
+	if (!rq) {
+		blk_mq_put_dispatch_budget(ctx->q, budget_token);
 		/*
-		 * If we cannot get tag for the request, stop dequeueing
-		 * requests from the IO scheduler. We are unlikely to be able
-		 * to submit them anyway and it creates false impression for
-		 * scheduling heuristics that the device can take more IO.
+		 * We're releasing without dispatching. Holding the
+		 * budget could have blocked any "hctx"s with the
+		 * same queue and if we didn't dispatch then there's
+		 * no guarantee anyone will kick the queue.  Kick it
+		 * ourselves.
 		 */
-		if (!blk_mq_get_driver_tag(rq))
-			break;
-	} while (count < max_dispatch);
+		ctx->run_queue = true;
+		return false;
+	}
 
-	if (!count) {
-		if (run_queue)
-			blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
-	} else if (multi_hctxs) {
+	blk_mq_set_rq_budget_token(rq, budget_token);
+
+	/*
+	 * Now this rq owns the budget which has to be released
+	 * if this rq won't be queued to driver via .queue_rq()
+	 * in blk_mq_dispatch_rq_list().
+	 */
+	list_add_tail(&rq->queuelist, &ctx->rq_list);
+	ctx->count++;
+	if (rq->mq_hctx != ctx->hctx)
+		ctx->multi_hctxs = true;
+
+	/*
+	 * If we cannot get tag for the request, stop dequeueing
+	 * requests from the IO scheduler. We are unlikely to be able
+	 * to submit them anyway and it creates false impression for
+	 * scheduling heuristics that the device can take more IO.
+	 */
+	return blk_mq_get_driver_tag(rq);
+}
+
+/*
+ * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
+ * be run again. This is necessary to avoid starving flushes.
+ * Return 0 if no request is dispatched.
+ * Return 1 if at least one request is dispatched.
+ */
+static int elevator_finish_dispatch(struct sched_dispatch_ctx *ctx)
+{
+	bool dispatched = false;
+
+	if (!ctx->count) {
+		if (ctx->run_queue)
+			blk_mq_delay_run_hw_queues(ctx->q, BLK_MQ_BUDGET_DELAY);
+	} else if (ctx->multi_hctxs) {
 		/*
 		 * Requests from different hctx may be dequeued from some
 		 * schedulers, such as bfq and deadline.
@@ -166,19 +175,52 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * Sort the requests in the list according to their hctx,
 		 * dispatch batching requests from same hctx at a time.
 		 */
-		list_sort(NULL, &rq_list, sched_rq_cmp);
+		list_sort(NULL, &ctx->rq_list, sched_rq_cmp);
 		do {
-			dispatched |= blk_mq_dispatch_hctx_list(&rq_list);
-		} while (!list_empty(&rq_list));
+			dispatched |= blk_mq_dispatch_hctx_list(&ctx->rq_list);
+		} while (!list_empty(&ctx->rq_list));
 	} else {
-		dispatched = blk_mq_dispatch_rq_list(hctx, &rq_list, false);
+		dispatched = blk_mq_dispatch_rq_list(ctx->hctx, &ctx->rq_list,
+						     false);
 	}
 
-	if (busy)
+	if (ctx->busy)
 		return -EAGAIN;
+
 	return !!dispatched;
 }
 
+/*
+ * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
+ * its queue by itself in its completion handler, so we don't need to
+ * restart queue if .get_budget() fails to get the budget.
+ *
+ * See elevator_finish_dispatch() for return values.
+ */
+static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+{
+	unsigned int max_dispatch;
+	struct sched_dispatch_ctx ctx = {
+		.hctx	= hctx,
+		.q	= hctx->queue,
+		.e	= hctx->queue->elevator,
+	};
+
+	INIT_LIST_HEAD(&ctx.rq_list);
+
+	if (hctx->dispatch_busy)
+		max_dispatch = 1;
+	else
+		max_dispatch = hctx->queue->nr_requests;
+
+	do {
+		if (!elevator_dispatch_one_request(&ctx))
+			break;
+	} while (ctx.count < max_dispatch);
+
+	return elevator_finish_dispatch(&ctx);
+}
+
 static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	unsigned long end = jiffies + HZ;
-- 
2.39.2



* [PATCH 6/6] blk-mq-sched: support request batch dispatching for sq elevator
  2025-07-22  7:24 [PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
                   ` (4 preceding siblings ...)
  2025-07-22  7:24 ` [PATCH 5/6] blk-mq-sched: refactor __blk_mq_do_dispatch_sched() Yu Kuai
@ 2025-07-22  7:24 ` Yu Kuai
  5 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-07-22  7:24 UTC (permalink / raw)
  To: dlemoal, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

For the dispatch_request method, the current behavior is to dispatch one
request at a time. In the case of multiple dispatching contexts, this
behavior, on the one hand, introduces intense lock contention:

t1:                     t2:                     t3:
lock                    lock                    lock
// grab lock
ops.dispatch_request
unlock
                        // grab lock
                        ops.dispatch_request
                        unlock
                                                // grab lock
                                                ops.dispatch_request
                                                unlock

and, on the other hand, messes up the request dispatching order:
t1:

lock
rq1 = ops.dispatch_request
unlock
                        t2:
                        lock
                        rq2 = ops.dispatch_request
                        unlock

lock
rq3 = ops.dispatch_request
unlock

                        lock
                        rq4 = ops.dispatch_request
                        unlock

//rq1,rq3 issue to disk
                        // rq2, rq4 issue to disk

In this case, the elevator dispatch order is rq 1-2-3-4; however, the
order seen by the disk is rq 1-3-2-4: the order of rq2 and rq3 is reversed.

Fix those problems by introducing elevator_dispatch_requests(); this helper
grabs the lock and dispatches a batch of requests while holding it.
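
Condensed from the diff below, the batch path holds the elevator lock across
several dispatch_request calls and only drops it before handing the collected
list to the driver (budget handling omitted here):

  spin_lock_irq(&ctx->e->lock);
  for (i = 0; i < count; ++i) {
          rq = ctx->e->type->ops.dispatch_request(ctx->hctx);
          if (!rq)
                  break;
          list_add_tail(&rq->queuelist, &ctx->rq_list);
          ctx->count++;
  }
  spin_unlock_irq(&ctx->e->lock);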

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-mq-sched.c | 60 +++++++++++++++++++++++++++++++++++++++++---
 block/blk-mq.h       | 21 ++++++++++++++++
 2 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index f18aecf710ad..c4450b73ab25 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -101,6 +101,54 @@ static bool elevator_can_dispatch(struct sched_dispatch_ctx *ctx)
 	return true;
 }
 
+static void elevator_dispatch_requests(struct sched_dispatch_ctx *ctx)
+{
+	struct request *rq;
+	bool has_get_budget = ctx->q->mq_ops->get_budget != NULL;
+	int budget_token[BUDGET_TOKEN_BATCH];
+	int count = ctx->q->nr_requests;
+	int i;
+
+	while (true) {
+		if (!elevator_can_dispatch(ctx))
+			return;
+
+		if (has_get_budget) {
+			count = blk_mq_get_dispatch_budgets(ctx->q, budget_token);
+			if (count <= 0)
+				return;
+		}
+
+		spin_lock_irq(&ctx->e->lock);
+		for (i = 0; i < count; ++i) {
+			rq = ctx->e->type->ops.dispatch_request(ctx->hctx);
+			if (!rq) {
+				ctx->run_queue = true;
+				goto err_free_budgets;
+			}
+
+			if (has_get_budget)
+				blk_mq_set_rq_budget_token(rq, budget_token[i]);
+			list_add_tail(&rq->queuelist, &ctx->rq_list);
+			ctx->count++;
+			if (rq->mq_hctx != ctx->hctx)
+				ctx->multi_hctxs = true;
+
+			if (!blk_mq_get_driver_tag(rq)) {
+				i++;
+				goto err_free_budgets;
+			}
+		}
+		spin_unlock_irq(&ctx->e->lock);
+	}
+
+err_free_budgets:
+	spin_unlock_irq(&ctx->e->lock);
+	if (has_get_budget)
+		for (; i < count; ++i)
+			blk_mq_put_dispatch_budget(ctx->q, budget_token[i]);
+}
+
 static bool elevator_dispatch_one_request(struct sched_dispatch_ctx *ctx)
 {
 	bool sq_sched = blk_queue_sq_sched(ctx->q);
@@ -213,10 +261,14 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 	else
 		max_dispatch = hctx->queue->nr_requests;
 
-	do {
-		if (!elevator_dispatch_one_request(&ctx))
-			break;
-	} while (ctx.count < max_dispatch);
+	if (!hctx->dispatch_busy && blk_queue_sq_sched(ctx.q))
+		elevator_dispatch_requests(&ctx);
+	else {
+		do {
+			if (!elevator_dispatch_one_request(&ctx))
+				break;
+		} while (ctx.count < max_dispatch);
+	}
 
 	return elevator_finish_dispatch(&ctx);
 }
diff --git a/block/blk-mq.h b/block/blk-mq.h
index affb2e14b56e..450c16a07841 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -37,6 +37,7 @@ enum {
 };
 
 #define BLK_MQ_CPU_WORK_BATCH	(8)
+#define BUDGET_TOKEN_BATCH	(8)
 
 typedef unsigned int __bitwise blk_insert_t;
 #define BLK_MQ_INSERT_AT_HEAD		((__force blk_insert_t)0x01)
@@ -262,6 +263,26 @@ static inline int blk_mq_get_dispatch_budget(struct request_queue *q)
 	return 0;
 }
 
+static inline int blk_mq_get_dispatch_budgets(struct request_queue *q,
+					      int *budget_token)
+{
+	int count = 0;
+
+	while (count < BUDGET_TOKEN_BATCH) {
+		int token = 0;
+
+		if (q->mq_ops->get_budget)
+			token = q->mq_ops->get_budget(q);
+
+		if (token < 0)
+			return count;
+
+		budget_token[count++] = token;
+	}
+
+	return count;
+}
+
 static inline void blk_mq_set_rq_budget_token(struct request *rq, int token)
 {
 	if (token < 0)
-- 
2.39.2



* Re: [PATCH 1/6] mq-deadline: switch to use high layer elevator lock
  2025-07-22  7:24 ` [PATCH 1/6] mq-deadline: switch to use high layer elevator lock Yu Kuai
@ 2025-07-23  1:46   ` Damien Le Moal
  2025-07-23  2:07     ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2025-07-23  1:46 UTC (permalink / raw)
  To: Yu Kuai, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi

On 7/22/25 4:24 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Introduce a new spinlock in elevator_queue, and switch dd->lock to
> use the new lock. There are no functional changes.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  block/elevator.c    |  1 +
>  block/elevator.h    |  4 ++--
>  block/mq-deadline.c | 57 ++++++++++++++++++++++-----------------------
>  3 files changed, 31 insertions(+), 31 deletions(-)
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index ab22542e6cf0..91df270d9d91 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -144,6 +144,7 @@ struct elevator_queue *elevator_alloc(struct request_queue *q,
>  	eq->type = e;
>  	kobject_init(&eq->kobj, &elv_ktype);
>  	mutex_init(&eq->sysfs_lock);
> +	spin_lock_init(&eq->lock);
>  	hash_init(eq->hash);
>  
>  	return eq;
> diff --git a/block/elevator.h b/block/elevator.h
> index a07ce773a38f..cbbac4f7825c 100644
> --- a/block/elevator.h
> +++ b/block/elevator.h
> @@ -110,12 +110,12 @@ struct request *elv_rqhash_find(struct request_queue *q, sector_t offset);
>  /*
>   * each queue has an elevator_queue associated with it
>   */
> -struct elevator_queue
> -{
> +struct elevator_queue {
>  	struct elevator_type *type;
>  	void *elevator_data;
>  	struct kobject kobj;
>  	struct mutex sysfs_lock;
> +	spinlock_t lock;
>  	unsigned long flags;
>  	DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
>  };

I wonder if the above should not be its own patch, with the remaining changes
below staying in this patch, as that matches the commit title exactly.

Other than that, this looks good to me.

> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 2edf1cac06d5..e31da6de7764 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -101,7 +101,7 @@ struct deadline_data {
>  	u32 async_depth;
>  	int prio_aging_expire;
>  
> -	spinlock_t lock;
> +	spinlock_t *lock;
>  };
>  
>  /* Maps an I/O priority class to a deadline scheduler priority. */
> @@ -213,7 +213,7 @@ static void dd_merged_requests(struct request_queue *q, struct request *req,
>  	const u8 ioprio_class = dd_rq_ioclass(next);
>  	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
>  
> -	lockdep_assert_held(&dd->lock);
> +	lockdep_assert_held(dd->lock);
>  
>  	dd->per_prio[prio].stats.merged++;
>  
> @@ -253,7 +253,7 @@ static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
>  {
>  	const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
>  
> -	lockdep_assert_held(&dd->lock);
> +	lockdep_assert_held(dd->lock);
>  
>  	return stats->inserted - atomic_read(&stats->completed);
>  }
> @@ -323,7 +323,7 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	enum dd_prio prio;
>  	u8 ioprio_class;
>  
> -	lockdep_assert_held(&dd->lock);
> +	lockdep_assert_held(dd->lock);
>  
>  	if (!list_empty(&per_prio->dispatch)) {
>  		rq = list_first_entry(&per_prio->dispatch, struct request,
> @@ -434,7 +434,7 @@ static struct request *dd_dispatch_prio_aged_requests(struct deadline_data *dd,
>  	enum dd_prio prio;
>  	int prio_cnt;
>  
> -	lockdep_assert_held(&dd->lock);
> +	lockdep_assert_held(dd->lock);
>  
>  	prio_cnt = !!dd_queued(dd, DD_RT_PRIO) + !!dd_queued(dd, DD_BE_PRIO) +
>  		   !!dd_queued(dd, DD_IDLE_PRIO);
> @@ -466,7 +466,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  	struct request *rq;
>  	enum dd_prio prio;
>  
> -	spin_lock(&dd->lock);
> +	spin_lock(dd->lock);
>  	rq = dd_dispatch_prio_aged_requests(dd, now);
>  	if (rq)
>  		goto unlock;
> @@ -482,8 +482,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  	}
>  
>  unlock:
> -	spin_unlock(&dd->lock);
> -
> +	spin_unlock(dd->lock);
>  	return rq;
>  }
>  
> @@ -552,9 +551,9 @@ static void dd_exit_sched(struct elevator_queue *e)
>  		WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_READ]));
>  		WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_WRITE]));
>  
> -		spin_lock(&dd->lock);
> +		spin_lock(dd->lock);
>  		queued = dd_queued(dd, prio);
> -		spin_unlock(&dd->lock);
> +		spin_unlock(dd->lock);
>  
>  		WARN_ONCE(queued != 0,
>  			  "statistics for priority %d: i %u m %u d %u c %u\n",
> @@ -601,7 +600,7 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
>  	dd->last_dir = DD_WRITE;
>  	dd->fifo_batch = fifo_batch;
>  	dd->prio_aging_expire = prio_aging_expire;
> -	spin_lock_init(&dd->lock);
> +	dd->lock = &eq->lock;
>  
>  	/* We dispatch from request queue wide instead of hw queue */
>  	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
> @@ -657,9 +656,9 @@ static bool dd_bio_merge(struct request_queue *q, struct bio *bio,
>  	struct request *free = NULL;
>  	bool ret;
>  
> -	spin_lock(&dd->lock);
> +	spin_lock(dd->lock);
>  	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
> -	spin_unlock(&dd->lock);
> +	spin_unlock(dd->lock);
>  
>  	if (free)
>  		blk_mq_free_request(free);
> @@ -681,7 +680,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  	struct dd_per_prio *per_prio;
>  	enum dd_prio prio;
>  
> -	lockdep_assert_held(&dd->lock);
> +	lockdep_assert_held(dd->lock);
>  
>  	prio = ioprio_class_to_prio[ioprio_class];
>  	per_prio = &dd->per_prio[prio];
> @@ -725,7 +724,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
>  	struct deadline_data *dd = q->elevator->elevator_data;
>  	LIST_HEAD(free);
>  
> -	spin_lock(&dd->lock);
> +	spin_lock(dd->lock);
>  	while (!list_empty(list)) {
>  		struct request *rq;
>  
> @@ -733,7 +732,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
>  		list_del_init(&rq->queuelist);
>  		dd_insert_request(hctx, rq, flags, &free);
>  	}
> -	spin_unlock(&dd->lock);
> +	spin_unlock(dd->lock);
>  
>  	blk_mq_free_requests(&free);
>  }
> @@ -849,13 +848,13 @@ static const struct elv_fs_entry deadline_attrs[] = {
>  #define DEADLINE_DEBUGFS_DDIR_ATTRS(prio, data_dir, name)		\
>  static void *deadline_##name##_fifo_start(struct seq_file *m,		\
>  					  loff_t *pos)			\
> -	__acquires(&dd->lock)						\
> +	__acquires(dd->lock)						\
>  {									\
>  	struct request_queue *q = m->private;				\
>  	struct deadline_data *dd = q->elevator->elevator_data;		\
>  	struct dd_per_prio *per_prio = &dd->per_prio[prio];		\
>  									\
> -	spin_lock(&dd->lock);						\
> +	spin_lock(dd->lock);						\
>  	return seq_list_start(&per_prio->fifo_list[data_dir], *pos);	\
>  }									\
>  									\
> @@ -870,12 +869,12 @@ static void *deadline_##name##_fifo_next(struct seq_file *m, void *v,	\
>  }									\
>  									\
>  static void deadline_##name##_fifo_stop(struct seq_file *m, void *v)	\
> -	__releases(&dd->lock)						\
> +	__releases(dd->lock)						\
>  {									\
>  	struct request_queue *q = m->private;				\
>  	struct deadline_data *dd = q->elevator->elevator_data;		\
>  									\
> -	spin_unlock(&dd->lock);						\
> +	spin_unlock(dd->lock);						\
>  }									\
>  									\
>  static const struct seq_operations deadline_##name##_fifo_seq_ops = {	\
> @@ -941,11 +940,11 @@ static int dd_queued_show(void *data, struct seq_file *m)
>  	struct deadline_data *dd = q->elevator->elevator_data;
>  	u32 rt, be, idle;
>  
> -	spin_lock(&dd->lock);
> +	spin_lock(dd->lock);
>  	rt = dd_queued(dd, DD_RT_PRIO);
>  	be = dd_queued(dd, DD_BE_PRIO);
>  	idle = dd_queued(dd, DD_IDLE_PRIO);
> -	spin_unlock(&dd->lock);
> +	spin_unlock(dd->lock);
>  
>  	seq_printf(m, "%u %u %u\n", rt, be, idle);
>  
> @@ -957,7 +956,7 @@ static u32 dd_owned_by_driver(struct deadline_data *dd, enum dd_prio prio)
>  {
>  	const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
>  
> -	lockdep_assert_held(&dd->lock);
> +	lockdep_assert_held(dd->lock);
>  
>  	return stats->dispatched + stats->merged -
>  		atomic_read(&stats->completed);
> @@ -969,11 +968,11 @@ static int dd_owned_by_driver_show(void *data, struct seq_file *m)
>  	struct deadline_data *dd = q->elevator->elevator_data;
>  	u32 rt, be, idle;
>  
> -	spin_lock(&dd->lock);
> +	spin_lock(dd->lock);
>  	rt = dd_owned_by_driver(dd, DD_RT_PRIO);
>  	be = dd_owned_by_driver(dd, DD_BE_PRIO);
>  	idle = dd_owned_by_driver(dd, DD_IDLE_PRIO);
> -	spin_unlock(&dd->lock);
> +	spin_unlock(dd->lock);
>  
>  	seq_printf(m, "%u %u %u\n", rt, be, idle);
>  
> @@ -983,13 +982,13 @@ static int dd_owned_by_driver_show(void *data, struct seq_file *m)
>  #define DEADLINE_DISPATCH_ATTR(prio)					\
>  static void *deadline_dispatch##prio##_start(struct seq_file *m,	\
>  					     loff_t *pos)		\
> -	__acquires(&dd->lock)						\
> +	__acquires(dd->lock)						\
>  {									\
>  	struct request_queue *q = m->private;				\
>  	struct deadline_data *dd = q->elevator->elevator_data;		\
>  	struct dd_per_prio *per_prio = &dd->per_prio[prio];		\
>  									\
> -	spin_lock(&dd->lock);						\
> +	spin_lock(dd->lock);						\
>  	return seq_list_start(&per_prio->dispatch, *pos);		\
>  }									\
>  									\
> @@ -1004,12 +1003,12 @@ static void *deadline_dispatch##prio##_next(struct seq_file *m,		\
>  }									\
>  									\
>  static void deadline_dispatch##prio##_stop(struct seq_file *m, void *v)	\
> -	__releases(&dd->lock)						\
> +	__releases(dd->lock)						\
>  {									\
>  	struct request_queue *q = m->private;				\
>  	struct deadline_data *dd = q->elevator->elevator_data;		\
>  									\
> -	spin_unlock(&dd->lock);						\
> +	spin_unlock(dd->lock);						\
>  }									\
>  									\
>  static const struct seq_operations deadline_dispatch##prio##_seq_ops = { \


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 2/6] block, bfq: don't grab queue_lock from io path
  2025-07-22  7:24 ` [PATCH 2/6] block, bfq: don't grab queue_lock from io path Yu Kuai
@ 2025-07-23  1:52   ` Damien Le Moal
  2025-07-23  2:04     ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2025-07-23  1:52 UTC (permalink / raw)
  To: Yu Kuai, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi

On 7/22/25 4:24 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Currently, issuing io can grab queue_lock three times, from bfq_bio_merge(),
> bfq_limit_depth() and bfq_prepare_request(). The queue_lock is not
> necessary if the icq is already created:
> 
> - q_usage_counter is already grabbed and the queue won't exit;
> - the current thread won't exit;
> - if another thread is allocating and inserting a new icq into ioc->icq_tree,
>   rcu can be used to protect looking up the icq from the radix tree; it's
>   safe to use the extracted icq until the queue or the current thread exits.
> 
> If the ioc or icq is not created, then bfq_prepare_request() will create it,
> which means the task is issuing io to the queue for the first time. This can
> be considered a slow path, and queue_lock will still be held to protect
> inserting the allocated icq into ioc->icq_tree.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  block/bfq-iosched.c | 24 +++++++-----------------
>  block/blk-ioc.c     | 43 ++++++++++++++++++++++++++++++++++++++-----
>  block/blk.h         |  2 +-
>  3 files changed, 46 insertions(+), 23 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 0cb1e9873aab..58d57c482acd 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -454,17 +454,13 @@ static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
>   */
>  static struct bfq_io_cq *bfq_bic_lookup(struct request_queue *q)
>  {
> -	struct bfq_io_cq *icq;
> -	unsigned long flags;
> -
> -	if (!current->io_context)
> -		return NULL;
> +	struct io_cq *icq;
>  
> -	spin_lock_irqsave(&q->queue_lock, flags);
> -	icq = icq_to_bic(ioc_lookup_icq(q));
> -	spin_unlock_irqrestore(&q->queue_lock, flags);
> +	rcu_read_lock();
> +	icq = ioc_lookup_icq_rcu(q);
> +	rcu_read_unlock();
>  
> -	return icq;
> +	return icq_to_bic(icq);

icq cannot be NULL here ? If it can, that needs checking, otherwise,
icq_to_bic() will return a bad address.

>  }
>  
>  /*
> @@ -2456,16 +2452,10 @@ static void bfq_remove_request(struct request_queue *q,
>  static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>  		unsigned int nr_segs)
>  {
> +	/* bic will not be freed until current or elevator exit */

I would drop this comment, or move it somewhere else as having a comment in the
declarations seems odd.

> +	struct bfq_io_cq *bic = bfq_bic_lookup(q);
>  	struct bfq_data *bfqd = q->elevator->elevator_data;
>  	struct request *free = NULL;
> -	/*
> -	 * bfq_bic_lookup grabs the queue_lock: invoke it now and
> -	 * store its return value for later use, to avoid nesting
> -	 * queue_lock inside the bfqd->lock. We assume that the bic
> -	 * returned by bfq_bic_lookup does not go away before
> -	 * bfqd->lock is taken.
> -	 */
> -	struct bfq_io_cq *bic = bfq_bic_lookup(q);
>  	bool ret;
>  
>  	spin_lock_irq(&bfqd->lock);
> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index ce82770c72ab..0be097a37e22 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -314,7 +314,7 @@ int __copy_io(unsigned long clone_flags, struct task_struct *tsk)
>   * Look up io_cq associated with @ioc - @q pair from @ioc.  Must be called
>   * with @q->queue_lock held.
>   */
> -struct io_cq *ioc_lookup_icq(struct request_queue *q)
> +static struct io_cq *ioc_lookup_icq(struct request_queue *q)
>  {
>  	struct io_context *ioc = current->io_context;
>  	struct io_cq *icq;
> @@ -341,7 +341,40 @@ struct io_cq *ioc_lookup_icq(struct request_queue *q)
>  	rcu_read_unlock();
>  	return icq;
>  }
> -EXPORT_SYMBOL(ioc_lookup_icq);
> +
> +/**
> + * ioc_lookup_icq_rcu - lookup io_cq from ioc in io path
> + * @q: the associated request_queue
> + *
> + * Look up io_cq associated with @ioc - @q pair from @ioc.  Must be called
> + * from io path, either return NULL if current issue io to @q for the first
> + * time, or return a valid icq.
> + */
> +struct io_cq *ioc_lookup_icq_rcu(struct request_queue *q)
> +{
> +	struct io_context *ioc = current->io_context;
> +	struct io_cq *icq;
> +
> +	WARN_ON_ONCE(percpu_ref_is_zero(&q->q_usage_counter));
> +
> +	if (!ioc)
> +		return NULL;
> +
> +	icq = rcu_dereference(ioc->icq_hint);
> +	if (icq && icq->q == q)
> +		return icq;
> +
> +	icq = radix_tree_lookup(&ioc->icq_tree, q->id);
> +	if (!icq)
> +		return NULL;
> +
> +	if (WARN_ON_ONCE(icq->q != q))
> +		return NULL;
> +
> +	rcu_assign_pointer(ioc->icq_hint, icq);
> +	return icq;
> +}
> +EXPORT_SYMBOL(ioc_lookup_icq_rcu);
>  
>  /**
>   * ioc_create_icq - create and link io_cq
> @@ -420,9 +453,9 @@ struct io_cq *ioc_find_get_icq(struct request_queue *q)
>  	} else {
>  		get_io_context(ioc);
>  
> -		spin_lock_irq(&q->queue_lock);
> -		icq = ioc_lookup_icq(q);
> -		spin_unlock_irq(&q->queue_lock);
> +		rcu_read_lock();
> +		icq = ioc_lookup_icq_rcu(q);
> +		rcu_read_unlock();
>  	}
>  
>  	if (!icq) {
> diff --git a/block/blk.h b/block/blk.h
> index 468aa83c5a22..3c078e517d59 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -460,7 +460,7 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req)
>   * Internal io_context interface
>   */
>  struct io_cq *ioc_find_get_icq(struct request_queue *q);
> -struct io_cq *ioc_lookup_icq(struct request_queue *q);
> +struct io_cq *ioc_lookup_icq_rcu(struct request_queue *q);
>  #ifdef CONFIG_BLK_ICQ
>  void ioc_clear_queue(struct request_queue *q);
>  #else

The blk-ioc changes should go into their own patch, to separate block layer
changes and bfq scheduler changes. No?


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 3/6] block, bfq: switch to use elevator lock
  2025-07-22  7:24 ` [PATCH 3/6] block, bfq: switch to use elevator lock Yu Kuai
@ 2025-07-23  1:53   ` Damien Le Moal
  0 siblings, 0 replies; 19+ messages in thread
From: Damien Le Moal @ 2025-07-23  1:53 UTC (permalink / raw)
  To: Yu Kuai, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi

On 7/22/25 4:24 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Replace the internal spinlock bfqd->lock with the new spinlock in
> elevator_queue. There are no functional changes.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method
  2025-07-22  7:24 ` [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method Yu Kuai
@ 2025-07-23  1:59   ` Damien Le Moal
  2025-07-23  2:17     ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2025-07-23  1:59 UTC (permalink / raw)
  To: Yu Kuai, hare, tj, josef, axboe, yukuai3
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi

On 7/22/25 4:24 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Currently, both mq-deadline and bfq have global spin lock that will be
> grabbed inside elevator methods like dispatch_request, insert_requests,
> and bio_merge. And the global lock is the main reason mq-deadline and
> bfq can't scale very well.
> 
> For dispatch_request method, current behavior is dispatching one request at

s/current/the current

> a time. In the case of multiple dispatching contexts, this behavior will
> cause huge lock contention and messing up the requests dispatching

s/messing up/change

> order. And folloiwng patches will support requests batch dispatching to

s/folloiwng/following

> fix thoses problems.
> 
> While dispatching request, blk_mq_get_disatpch_budget() and
> blk_mq_get_driver_tag() must be called, and they are not ready to be
> called inside elevator methods, hence introduce a new method like
> dispatch_requests is not possible.
> 
> In conclusion, this patch factor the global lock out of dispatch_request
> method, and following patches will support request batch dispatch by
> calling the methods multiple time while holding the lock.

You are creating a bisect problem here. This patch breaks the schedulers
dispatch atomicity without the changes to the calls to the elevator methods in
the block layer.

So maybe reorganize these patches to have the block layer changes first, and
move patch 1 and 3 after these to switch mq-deadline and bfq to using the
higher level lock correctly, removing the locking from bfq_dispatch_request()
and dd_dispatch_request().

> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  block/bfq-iosched.c  | 3 ---
>  block/blk-mq-sched.c | 6 ++++++
>  block/mq-deadline.c  | 5 +----
>  3 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 11b81b11242c..9f8a256e43f2 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  	struct bfq_queue *in_serv_queue;
>  	bool waiting_rq, idle_timer_disabled = false;
>  
> -	spin_lock_irq(bfqd->lock);
> -
>  	in_serv_queue = bfqd->in_service_queue;
>  	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
>  
> @@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  			waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
>  	}
>  
> -	spin_unlock_irq(bfqd->lock);
>  	bfq_update_dispatch_stats(hctx->queue, rq,
>  			idle_timer_disabled ? in_serv_queue : NULL,
>  				idle_timer_disabled);
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55a0fd105147..82c4f4eef9ed 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  		max_dispatch = hctx->queue->nr_requests;
>  
>  	do {
> +		bool sq_sched = blk_queue_sq_sched(q);
>  		struct request *rq;
>  		int budget_token;
>  
> @@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  		if (budget_token < 0)
>  			break;
>  
> +		if (sq_sched)
> +			spin_lock_irq(&e->lock);
>  		rq = e->type->ops.dispatch_request(hctx);
> +		if (sq_sched)
> +			spin_unlock_irq(&e->lock);
> +
>  		if (!rq) {
>  			blk_mq_put_dispatch_budget(q, budget_token);
>  			/*
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index e31da6de7764..a008e41bc861 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  	struct request *rq;
>  	enum dd_prio prio;
>  
> -	spin_lock(dd->lock);
>  	rq = dd_dispatch_prio_aged_requests(dd, now);
>  	if (rq)
> -		goto unlock;
> +		return rq;
>  
>  	/*
>  	 * Next, dispatch requests in priority order. Ignore lower priority
> @@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  			break;
>  	}
>  
> -unlock:
> -	spin_unlock(dd->lock);
>  	return rq;
>  }
>  


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 2/6] block, bfq: don't grab queue_lock from io path
  2025-07-23  1:52   ` Damien Le Moal
@ 2025-07-23  2:04     ` Yu Kuai
  0 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-07-23  2:04 UTC (permalink / raw)
  To: Damien Le Moal, Yu Kuai, hare, tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi, yukuai (C)

Hi,

On 2025/07/23 9:52, Damien Le Moal wrote:
> On 7/22/25 4:24 PM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Currently, issuing io can grab the queue_lock three times, from bfq_bio_merge(),
>> bfq_limit_depth() and bfq_prepare_request(). The queue_lock is not
>> necessary if the icq is already created:
>>
>> - queue_usage_counter is already grabbed and the queue won't exit;
>> - the current thread won't exit;
>> - if another thread is allocating and inserting a new icq into ioc->icq_tree,
>>    rcu can be used to protect looking up the icq from the radix tree; it's safe
>>    to use the extracted icq until the queue or the current thread exits;
>>
>> If the ioc or icq is not created, then bfq_prepare_request() will create it,
>> which means the task is issuing io to the queue for the first time; this can
>> be considered a slow path, and the queue_lock will still be held to protect
>> inserting the allocated icq into ioc->icq_tree.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   block/bfq-iosched.c | 24 +++++++-----------------
>>   block/blk-ioc.c     | 43 ++++++++++++++++++++++++++++++++++++++-----
>>   block/blk.h         |  2 +-
>>   3 files changed, 46 insertions(+), 23 deletions(-)
>>
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 0cb1e9873aab..58d57c482acd 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -454,17 +454,13 @@ static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
>>    */
>>   static struct bfq_io_cq *bfq_bic_lookup(struct request_queue *q)
>>   {
>> -	struct bfq_io_cq *icq;
>> -	unsigned long flags;
>> -
>> -	if (!current->io_context)
>> -		return NULL;
>> +	struct io_cq *icq;
>>   
>> -	spin_lock_irqsave(&q->queue_lock, flags);
>> -	icq = icq_to_bic(ioc_lookup_icq(q));
>> -	spin_unlock_irqrestore(&q->queue_lock, flags);
>> +	rcu_read_lock();
>> +	icq = ioc_lookup_icq_rcu(q);
>> +	rcu_read_unlock();
>>   
>> -	return icq;
>> +	return icq_to_bic(icq);
> 
> icq cannot be NULL here ? If it can, that needs checking, otherwise,
> icq_to_bic() will return a bad address.

See the comment in icq_to_bic(); this is fine.

static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
{
         /* bic->icq is the first member, %NULL will convert to %NULL */
         return container_of(icq, struct bfq_io_cq, icq);
}
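
A minimal userspace sketch of why this holds (illustrative only; the structs
below are simplified stand-ins, not the real kernel definitions): because
icq is the first member of struct bfq_io_cq, offsetof() is 0, so the pointer
arithmetic done by container_of() maps a NULL icq back to a NULL bic.

#include <stddef.h>
#include <stdio.h>

struct io_cq { int dummy; };

struct bfq_io_cq {
        struct io_cq icq;       /* must stay the first member */
        int extra;
};

/* only the pointer-arithmetic part of the kernel macro, no type checks */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
        struct io_cq *icq = NULL;
        struct bfq_io_cq *bic = container_of(icq, struct bfq_io_cq, icq);

        /* offsetof(struct bfq_io_cq, icq) == 0, so bic is still NULL */
        printf("bic = %p\n", (void *)bic);
        return 0;
}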

> 
>>   }
>>   
>>   /*
>> @@ -2456,16 +2452,10 @@ static void bfq_remove_request(struct request_queue *q,
>>   static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>>   		unsigned int nr_segs)
>>   {
>> +	/* bic will not be freed until current or elevator exit */
> 
> I would drop this comment, or move it somewhere else as having a comment in the
> declarations seems odd.

Ok, I'll drop the comment.
> 
>> +	struct bfq_io_cq *bic = bfq_bic_lookup(q);
>>   	struct bfq_data *bfqd = q->elevator->elevator_data;
>>   	struct request *free = NULL;
>> -	/*
>> -	 * bfq_bic_lookup grabs the queue_lock: invoke it now and
>> -	 * store its return value for later use, to avoid nesting
>> -	 * queue_lock inside the bfqd->lock. We assume that the bic
>> -	 * returned by bfq_bic_lookup does not go away before
>> -	 * bfqd->lock is taken.
>> -	 */
>> -	struct bfq_io_cq *bic = bfq_bic_lookup(q);
>>   	bool ret;
>>   
>>   	spin_lock_irq(&bfqd->lock);
>> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
>> index ce82770c72ab..0be097a37e22 100644
>> --- a/block/blk-ioc.c
>> +++ b/block/blk-ioc.c
>> @@ -314,7 +314,7 @@ int __copy_io(unsigned long clone_flags, struct task_struct *tsk)
>>    * Look up io_cq associated with @ioc - @q pair from @ioc.  Must be called
>>    * with @q->queue_lock held.
>>    */
>> -struct io_cq *ioc_lookup_icq(struct request_queue *q)
>> +static struct io_cq *ioc_lookup_icq(struct request_queue *q)
>>   {
>>   	struct io_context *ioc = current->io_context;
>>   	struct io_cq *icq;
>> @@ -341,7 +341,40 @@ struct io_cq *ioc_lookup_icq(struct request_queue *q)
>>   	rcu_read_unlock();
>>   	return icq;
>>   }
>> -EXPORT_SYMBOL(ioc_lookup_icq);
>> +
>> +/**
>> + * ioc_lookup_icq_rcu - lookup io_cq from ioc in io path
>> + * @q: the associated request_queue
>> + *
>> + * Look up io_cq associated with @ioc - @q pair from @ioc.  Must be called
>> + * from io path, either return NULL if current issue io to @q for the first
>> + * time, or return a valid icq.
>> + */
>> +struct io_cq *ioc_lookup_icq_rcu(struct request_queue *q)
>> +{
>> +	struct io_context *ioc = current->io_context;
>> +	struct io_cq *icq;
>> +
>> +	WARN_ON_ONCE(percpu_ref_is_zero(&q->q_usage_counter));
>> +
>> +	if (!ioc)
>> +		return NULL;
>> +
>> +	icq = rcu_dereference(ioc->icq_hint);
>> +	if (icq && icq->q == q)
>> +		return icq;
>> +
>> +	icq = radix_tree_lookup(&ioc->icq_tree, q->id);
>> +	if (!icq)
>> +		return NULL;
>> +
>> +	if (WARN_ON_ONCE(icq->q != q))
>> +		return NULL;
>> +
>> +	rcu_assign_pointer(ioc->icq_hint, icq);
>> +	return icq;
>> +}
>> +EXPORT_SYMBOL(ioc_lookup_icq_rcu);
>>   
>>   /**
>>    * ioc_create_icq - create and link io_cq
>> @@ -420,9 +453,9 @@ struct io_cq *ioc_find_get_icq(struct request_queue *q)
>>   	} else {
>>   		get_io_context(ioc);
>>   
>> -		spin_lock_irq(&q->queue_lock);
>> -		icq = ioc_lookup_icq(q);
>> -		spin_unlock_irq(&q->queue_lock);
>> +		rcu_read_lock();
>> +		icq = ioc_lookup_icq_rcu(q);
>> +		rcu_read_unlock();
>>   	}
>>   
>>   	if (!icq) {
>> diff --git a/block/blk.h b/block/blk.h
>> index 468aa83c5a22..3c078e517d59 100644
>> --- a/block/blk.h
>> +++ b/block/blk.h
>> @@ -460,7 +460,7 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req)
>>    * Internal io_context interface
>>    */
>>   struct io_cq *ioc_find_get_icq(struct request_queue *q);
>> -struct io_cq *ioc_lookup_icq(struct request_queue *q);
>> +struct io_cq *ioc_lookup_icq_rcu(struct request_queue *q);
>>   #ifdef CONFIG_BLK_ICQ
>>   void ioc_clear_queue(struct request_queue *q);
>>   #else
> 
>> The blk-ioc changes should go into their own patch, to separate block layer
>> changes and bfq scheduler changes. No?

Actually, bfq is the only user of blk-ioc. In order to separate the changes,
should I do the following?

patch 1, add the helper ioc_lookup_icq_rcu
patch 2, convert bfq to use this helper
patch 3, clean up the old helper

If so, I'll move the above changes to the front of this set.

Thanks,
Kuai
> 
> 



* Re: [PATCH 1/6] mq-deadline: switch to use high layer elevator lock
  2025-07-23  1:46   ` Damien Le Moal
@ 2025-07-23  2:07     ` Yu Kuai
  2025-07-23  2:38       ` Damien Le Moal
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-07-23  2:07 UTC (permalink / raw)
  To: Damien Le Moal, Yu Kuai, hare, tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi, yukuai (C)

Hi,

On 2025/07/23 9:46, Damien Le Moal wrote:
> On 7/22/25 4:24 PM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Introduce a new spinlock in elevator_queue, and switch dd->lock to
>> use the new lock. There are no functional changes.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   block/elevator.c    |  1 +
>>   block/elevator.h    |  4 ++--
>>   block/mq-deadline.c | 57 ++++++++++++++++++++++-----------------------
>>   3 files changed, 31 insertions(+), 31 deletions(-)
>>
>> diff --git a/block/elevator.c b/block/elevator.c
>> index ab22542e6cf0..91df270d9d91 100644
>> --- a/block/elevator.c
>> +++ b/block/elevator.c
>> @@ -144,6 +144,7 @@ struct elevator_queue *elevator_alloc(struct request_queue *q,
>>   	eq->type = e;
>>   	kobject_init(&eq->kobj, &elv_ktype);
>>   	mutex_init(&eq->sysfs_lock);
>> +	spin_lock_init(&eq->lock);
>>   	hash_init(eq->hash);
>>   
>>   	return eq;
>> diff --git a/block/elevator.h b/block/elevator.h
>> index a07ce773a38f..cbbac4f7825c 100644
>> --- a/block/elevator.h
>> +++ b/block/elevator.h
>> @@ -110,12 +110,12 @@ struct request *elv_rqhash_find(struct request_queue *q, sector_t offset);
>>   /*
>>    * each queue has an elevator_queue associated with it
>>    */
>> -struct elevator_queue
>> -{
>> +struct elevator_queue {
>>   	struct elevator_type *type;
>>   	void *elevator_data;
>>   	struct kobject kobj;
>>   	struct mutex sysfs_lock;
>> +	spinlock_t lock;
>>   	unsigned long flags;
>>   	DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
>>   };
> 
> I wonder if the above should not be its own patch, with the remaining changes
> below staying in this patch, as that matches exactly the commit title.

I think you mean *should be it's own patch*. I don't have preference and
I can do that in the next version :)

Thanks,
Kuai



* Re: [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method
  2025-07-23  1:59   ` Damien Le Moal
@ 2025-07-23  2:17     ` Yu Kuai
  2025-07-23  2:42       ` Damien Le Moal
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-07-23  2:17 UTC (permalink / raw)
  To: Damien Le Moal, Yu Kuai, hare, tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi, yukuai (C)

Hi,

On 2025/07/23 9:59, Damien Le Moal wrote:
> On 7/22/25 4:24 PM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Currently, both mq-deadline and bfq have global spin lock that will be
>> grabbed inside elevator methods like dispatch_request, insert_requests,
>> and bio_merge. And the global lock is the main reason mq-deadline and
>> bfq can't scale very well.
>>
>> For dispatch_request method, current behavior is dispatching one request at
> 
> s/current/the current
> 
>> a time. In the case of multiple dispatching contexts, this behavior will
>> cause huge lock contention and messing up the requests dispatching
> 
> s/messing up/change
> 
>> order. And folloiwng patches will support requests batch dispatching to
> 
> s/folloiwng/following
> 
>> fix thoses problems.
>>
>> While dispatching request, blk_mq_get_disatpch_budget() and
>> blk_mq_get_driver_tag() must be called, and they are not ready to be
>> called inside elevator methods, hence introduce a new method like
>> dispatch_requests is not possible.
>>
>> In conclusion, this patch factor the global lock out of dispatch_request
>> method, and following patches will support request batch dispatch by
>> calling the methods multiple time while holding the lock.
> 
> You are creating a bisect problem here. This patch breaks the schedulers
> dispatch atomicity without the changes to the calls to the elevator methods in
> the block layer.

I'm not sure why there will be bisect problem, I think git checkout to
any patch in this set should work just fine. Can you please explain a
bit more?
> 
> So maybe reorganize these patches to have the block layer changes first, and
> move patch 1 and 3 after these to switch mq-deadline and bfq to using the
> higher level lock correctly, removing the locking from bfq_dispatch_request()
> and dd_dispatch_request().

Sure, I can do the reorganization.

Thanks,
Kuai

> 
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   block/bfq-iosched.c  | 3 ---
>>   block/blk-mq-sched.c | 6 ++++++
>>   block/mq-deadline.c  | 5 +----
>>   3 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 11b81b11242c..9f8a256e43f2 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>   	struct bfq_queue *in_serv_queue;
>>   	bool waiting_rq, idle_timer_disabled = false;
>>   
>> -	spin_lock_irq(bfqd->lock);
>> -
>>   	in_serv_queue = bfqd->in_service_queue;
>>   	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
>>   
>> @@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>   			waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
>>   	}
>>   
>> -	spin_unlock_irq(bfqd->lock);
>>   	bfq_update_dispatch_stats(hctx->queue, rq,
>>   			idle_timer_disabled ? in_serv_queue : NULL,
>>   				idle_timer_disabled);
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 55a0fd105147..82c4f4eef9ed 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>>   		max_dispatch = hctx->queue->nr_requests;
>>   
>>   	do {
>> +		bool sq_sched = blk_queue_sq_sched(q);
>>   		struct request *rq;
>>   		int budget_token;
>>   
>> @@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>>   		if (budget_token < 0)
>>   			break;
>>   
>> +		if (sq_sched)
>> +			spin_lock_irq(&e->lock);
>>   		rq = e->type->ops.dispatch_request(hctx);
>> +		if (sq_sched)
>> +			spin_unlock_irq(&e->lock);
>> +
>>   		if (!rq) {
>>   			blk_mq_put_dispatch_budget(q, budget_token);
>>   			/*
>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>> index e31da6de7764..a008e41bc861 100644
>> --- a/block/mq-deadline.c
>> +++ b/block/mq-deadline.c
>> @@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>   	struct request *rq;
>>   	enum dd_prio prio;
>>   
>> -	spin_lock(dd->lock);
>>   	rq = dd_dispatch_prio_aged_requests(dd, now);
>>   	if (rq)
>> -		goto unlock;
>> +		return rq;
>>   
>>   	/*
>>   	 * Next, dispatch requests in priority order. Ignore lower priority
>> @@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>   			break;
>>   	}
>>   
>> -unlock:
>> -	spin_unlock(dd->lock);
>>   	return rq;
>>   }
>>   
> 
> 



* Re: [PATCH 1/6] mq-deadline: switch to use high layer elevator lock
  2025-07-23  2:07     ` Yu Kuai
@ 2025-07-23  2:38       ` Damien Le Moal
  0 siblings, 0 replies; 19+ messages in thread
From: Damien Le Moal @ 2025-07-23  2:38 UTC (permalink / raw)
  To: Yu Kuai, hare, tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi, yukuai (C)

On 7/23/25 11:07 AM, Yu Kuai wrote:
> Hi,
> 
> On 2025/07/23 9:46, Damien Le Moal wrote:
>> On 7/22/25 4:24 PM, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Introduce a new spinlock in elevator_queue, and switch dd->lock to
>>> use the new lock. There are no functional changes.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>>   block/elevator.c    |  1 +
>>>   block/elevator.h    |  4 ++--
>>>   block/mq-deadline.c | 57 ++++++++++++++++++++++-----------------------
>>>   3 files changed, 31 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/block/elevator.c b/block/elevator.c
>>> index ab22542e6cf0..91df270d9d91 100644
>>> --- a/block/elevator.c
>>> +++ b/block/elevator.c
>>> @@ -144,6 +144,7 @@ struct elevator_queue *elevator_alloc(struct
>>> request_queue *q,
>>>       eq->type = e;
>>>       kobject_init(&eq->kobj, &elv_ktype);
>>>       mutex_init(&eq->sysfs_lock);
>>> +    spin_lock_init(&eq->lock);
>>>       hash_init(eq->hash);
>>>         return eq;
>>> diff --git a/block/elevator.h b/block/elevator.h
>>> index a07ce773a38f..cbbac4f7825c 100644
>>> --- a/block/elevator.h
>>> +++ b/block/elevator.h
>>> @@ -110,12 +110,12 @@ struct request *elv_rqhash_find(struct request_queue
>>> *q, sector_t offset);
>>>   /*
>>>    * each queue has an elevator_queue associated with it
>>>    */
>>> -struct elevator_queue
>>> -{
>>> +struct elevator_queue {
>>>       struct elevator_type *type;
>>>       void *elevator_data;
>>>       struct kobject kobj;
>>>       struct mutex sysfs_lock;
>>> +    spinlock_t lock;
>>>       unsigned long flags;
>>>       DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
>>>   };
>>
>> I wonder if the above should not be its own patch, with the remaining changes
>> below staying in this patch, as that matches exactly the commit title.
> 
> I think you mean *should be it's own patch*. I don't have preference and

Yes, that is what I meant. Sorry about the typo.

> I can do that in the next version :)
> 
> Thanks,
> Kuai
> 
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method
  2025-07-23  2:17     ` Yu Kuai
@ 2025-07-23  2:42       ` Damien Le Moal
  2025-07-23  2:51         ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2025-07-23  2:42 UTC (permalink / raw)
  To: Yu Kuai, hare, tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi, yukuai (C)

On 7/23/25 11:17 AM, Yu Kuai wrote:
> Hi,
> 
>> On 2025/07/23 9:59, Damien Le Moal wrote:
>> On 7/22/25 4:24 PM, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Currently, both mq-deadline and bfq have global spin lock that will be
>>> grabbed inside elevator methods like dispatch_request, insert_requests,
>>> and bio_merge. And the global lock is the main reason mq-deadline and
>>> bfq can't scale very well.
>>>
>>> For dispatch_request method, current behavior is dispatching one request at
>>
>> s/current/the current
>>
>>> a time. In the case of multiple dispatching contexts, this behavior will
>>> cause huge lock contention and messing up the requests dispatching
>>
>> s/messing up/change
>>
>>> order. And folloiwng patches will support requests batch dispatching to
>>
>> s/folloiwng/following
>>
>>> fix thoses problems.
>>>
>>> While dispatching request, blk_mq_get_disatpch_budget() and
>>> blk_mq_get_driver_tag() must be called, and they are not ready to be
>>> called inside elevator methods, hence introduce a new method like
>>> dispatch_requests is not possible.
>>>
>>> In conclusion, this patch factor the global lock out of dispatch_request
>>> method, and following patches will support request batch dispatch by
>>> calling the methods multiple time while holding the lock.
>>
>> You are creating a bisect problem here. This patch breaks the schedulers
>> dispatch atomicity without the changes to the calls to the elevator methods in
>> the block layer.
> 
> I'm not sure why there will be bisect problem, I think git checkout to
> any patch in this set should work just fine. Can you please explain a
> bit more?

If you apply this patch, stop here without applying the following patches, and
test the changes up to this point, things will break since there is no locking
during dispatch.

So you need to organize the patches so that you first have the elevator level
common locking in place and then have one patch for bfq and one patch for
mq-deadline that switch to using that new lock. Hence the suggestion to reverse
the order of your changes: change the block layer first, then have bfq and
mq-deadline use that new locking.

>>
>> So maybe reorganize these patches to have the block layer changes first, and
>> move patch 1 and 3 after these to switch mq-deadline and bfq to using the
>> higher level lock correctly, removing the locking from bfq_dispatch_request()
>> and dd_dispatch_request().
> 
> Sure, I can do the reorganization.
> 
> Thanks,
> Kuai
> 
>>
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>>   block/bfq-iosched.c  | 3 ---
>>>   block/blk-mq-sched.c | 6 ++++++
>>>   block/mq-deadline.c  | 5 +----
>>>   3 files changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index 11b81b11242c..9f8a256e43f2 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct
>>> blk_mq_hw_ctx *hctx)
>>>       struct bfq_queue *in_serv_queue;
>>>       bool waiting_rq, idle_timer_disabled = false;
>>>   -    spin_lock_irq(bfqd->lock);
>>> -
>>>       in_serv_queue = bfqd->in_service_queue;
>>>       waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
>>>   @@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct
>>> blk_mq_hw_ctx *hctx)
>>>               waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
>>>       }
>>>   -    spin_unlock_irq(bfqd->lock);
>>>       bfq_update_dispatch_stats(hctx->queue, rq,
>>>               idle_timer_disabled ? in_serv_queue : NULL,
>>>                   idle_timer_disabled);
>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>> index 55a0fd105147..82c4f4eef9ed 100644
>>> --- a/block/blk-mq-sched.c
>>> +++ b/block/blk-mq-sched.c
>>> @@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx
>>> *hctx)
>>>           max_dispatch = hctx->queue->nr_requests;
>>>         do {
>>> +        bool sq_sched = blk_queue_sq_sched(q);
>>>           struct request *rq;
>>>           int budget_token;
>>>   @@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct
>>> blk_mq_hw_ctx *hctx)
>>>           if (budget_token < 0)
>>>               break;
>>>   +        if (sq_sched)
>>> +            spin_lock_irq(&e->lock);
>>>           rq = e->type->ops.dispatch_request(hctx);
>>> +        if (sq_sched)
>>> +            spin_unlock_irq(&e->lock);
>>> +
>>>           if (!rq) {
>>>               blk_mq_put_dispatch_budget(q, budget_token);
>>>               /*
>>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>>> index e31da6de7764..a008e41bc861 100644
>>> --- a/block/mq-deadline.c
>>> +++ b/block/mq-deadline.c
>>> @@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct
>>> blk_mq_hw_ctx *hctx)
>>>       struct request *rq;
>>>       enum dd_prio prio;
>>>   -    spin_lock(dd->lock);
>>>       rq = dd_dispatch_prio_aged_requests(dd, now);
>>>       if (rq)
>>> -        goto unlock;
>>> +        return rq;
>>>         /*
>>>        * Next, dispatch requests in priority order. Ignore lower priority
>>> @@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct
>>> blk_mq_hw_ctx *hctx)
>>>               break;
>>>       }
>>>   -unlock:
>>> -    spin_unlock(dd->lock);
>>>       return rq;
>>>   }
>>>   
>>
>>
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method
  2025-07-23  2:42       ` Damien Le Moal
@ 2025-07-23  2:51         ` Yu Kuai
  2025-07-23  4:34           ` Damien Le Moal
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-07-23  2:51 UTC (permalink / raw)
  To: Damien Le Moal, Yu Kuai, hare, tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi, yukuai (C)

Hi,

On 2025/07/23 10:42, Damien Le Moal wrote:
> On 7/23/25 11:17 AM, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/07/23 9:59, Damien Le Moal wrote:
>>> On 7/22/25 4:24 PM, Yu Kuai wrote:
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> Currently, both mq-deadline and bfq have global spin lock that will be
>>>> grabbed inside elevator methods like dispatch_request, insert_requests,
>>>> and bio_merge. And the global lock is the main reason mq-deadline and
>>>> bfq can't scale very well.
>>>>
>>>> For dispatch_request method, current behavior is dispatching one request at
>>>
>>> s/current/the current
>>>
>>>> a time. In the case of multiple dispatching contexts, this behavior will
>>>> cause huge lock contention and messing up the requests dispatching
>>>
>>> s/messing up/change
>>>
>>>> order. And folloiwng patches will support requests batch dispatching to
>>>
>>> s/folloiwng/following
>>>
>>>> fix thoses problems.
>>>>
>>>> While dispatching request, blk_mq_get_disatpch_budget() and
>>>> blk_mq_get_driver_tag() must be called, and they are not ready to be
>>>> called inside elevator methods, hence introduce a new method like
>>>> dispatch_requests is not possible.
>>>>
>>>> In conclusion, this patch factor the global lock out of dispatch_request
>>>> method, and following patches will support request batch dispatch by
>>>> calling the methods multiple time while holding the lock.
>>>
>>> You are creating a bisect problem here. This patch breaks the schedulers
>>> dispatch atomicity without the changes to the calls to the elevator methods in
>>> the block layer.
>>
>> I'm not sure why there will be bisect problem, I think git checkout to
>> any patch in this set should work just fine. Can you please explain a
>> bit more?
> 
> If you apply this patch, stop here without applying the following patches, and
> test the changes up to this point, things will break since there is no locking
> during dispatch.

Did you miss the following change in this patch? Dispatch does switch to
the new lock; I don't get why there is no locking.

@@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct 
blk_mq_hw_ctx *hctx)
  		if (budget_token < 0)
  			break;

+		if (sq_sched)
+			spin_lock_irq(&e->lock);
  		rq = e->type->ops.dispatch_request(hctx);
+		if (sq_sched)
+			spin_unlock_irq(&e->lock);
+
  		if (!rq) {
  			blk_mq_put_dispatch_budget(q, budget_token);
  			/*
> 
> So you need to organize the patches so that you first have the elevator level
> common locking in place and then have one patch for bfq and one patch for
> mq-deadline that switch to using that new lock. Hence the suggestion to reverse
> the order of your changes: change the block layer first, then have bfq and
> mq-deadline use that new locking.

I think I understand what you mean, just to be sure.

1. patch 5 in this set
2. patch to introduce high level lock, and grab it during dispatch in 
block layer.
3. changes in ioc
4. changes in bfq
5. changes in deadline
6. patch 6 in this set.

Thanks,
Kuai

> 
>>>
>>> So maybe reorganize these patches to have the block layer changes first, and
>>> move patch 1 and 3 after these to switch mq-deadline and bfq to using the
>>> higher level lock correctly, removing the locking from bfq_dispatch_request()
>>> and dd_dispatch_request().
>>
>> Sure, I can to the reorganize.
>>
>> Thanks,
>> Kuai
>>
>>>
>>>>
>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>> ---
>>>>    block/bfq-iosched.c  | 3 ---
>>>>    block/blk-mq-sched.c | 6 ++++++
>>>>    block/mq-deadline.c  | 5 +----
>>>>    3 files changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>>> index 11b81b11242c..9f8a256e43f2 100644
>>>> --- a/block/bfq-iosched.c
>>>> +++ b/block/bfq-iosched.c
>>>> @@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>        struct bfq_queue *in_serv_queue;
>>>>        bool waiting_rq, idle_timer_disabled = false;
>>>>    -    spin_lock_irq(bfqd->lock);
>>>> -
>>>>        in_serv_queue = bfqd->in_service_queue;
>>>>        waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
>>>>    @@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>                waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
>>>>        }
>>>>    -    spin_unlock_irq(bfqd->lock);
>>>>        bfq_update_dispatch_stats(hctx->queue, rq,
>>>>                idle_timer_disabled ? in_serv_queue : NULL,
>>>>                    idle_timer_disabled);
>>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>>> index 55a0fd105147..82c4f4eef9ed 100644
>>>> --- a/block/blk-mq-sched.c
>>>> +++ b/block/blk-mq-sched.c
>>>> @@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx
>>>> *hctx)
>>>>            max_dispatch = hctx->queue->nr_requests;
>>>>          do {
>>>> +        bool sq_sched = blk_queue_sq_sched(q);
>>>>            struct request *rq;
>>>>            int budget_token;
>>>>    @@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>            if (budget_token < 0)
>>>>                break;
>>>>    +        if (sq_sched)
>>>> +            spin_lock_irq(&e->lock);
>>>>            rq = e->type->ops.dispatch_request(hctx);
>>>> +        if (sq_sched)
>>>> +            spin_unlock_irq(&e->lock);
>>>> +
>>>>            if (!rq) {
>>>>                blk_mq_put_dispatch_budget(q, budget_token);
>>>>                /*
>>>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>>>> index e31da6de7764..a008e41bc861 100644
>>>> --- a/block/mq-deadline.c
>>>> +++ b/block/mq-deadline.c
>>>> @@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>        struct request *rq;
>>>>        enum dd_prio prio;
>>>>    -    spin_lock(dd->lock);
>>>>        rq = dd_dispatch_prio_aged_requests(dd, now);
>>>>        if (rq)
>>>> -        goto unlock;
>>>> +        return rq;
>>>>          /*
>>>>         * Next, dispatch requests in priority order. Ignore lower priority
>>>> @@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>                break;
>>>>        }
>>>>    -unlock:
>>>> -    spin_unlock(dd->lock);
>>>>        return rq;
>>>>    }
>>>>    
>>>
>>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method
  2025-07-23  2:51         ` Yu Kuai
@ 2025-07-23  4:34           ` Damien Le Moal
  2025-07-23  6:10             ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2025-07-23  4:34 UTC (permalink / raw)
  To: Yu Kuai, hare, tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi, yukuai (C)

On 7/23/25 11:51 AM, Yu Kuai wrote:
>> If you apply this patch, stop here without applying the following patches, and
>> test the changes up to this point, things will break since there is no locking
>> during dispatch.
> 
> Did you miss the following change in this patch? Dispatch does switch to
> the new lock; I don't get why there is no locking.

My bad. Yes, I completely missed it. Sorry for the noise.

> @@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx
> *hctx)
>          if (budget_token < 0)
>              break;
> 
> +        if (sq_sched)
> +            spin_lock_irq(&e->lock);
>          rq = e->type->ops.dispatch_request(hctx);
> +        if (sq_sched)
> +            spin_unlock_irq(&e->lock);
> +
>          if (!rq) {
>              blk_mq_put_dispatch_budget(q, budget_token);
>              /*
>>
>> So you need to organize the patches so that you first have the elevator level
>> common locking in place and then have one patch for bfq and one patch for
>> mq-deadline that switch to using that new lock. Hence the suggestion to reverse
>> the order of your changes: change the block layer first, then have bfq and
>> mq-deadline use that new locking.
> 
> I think I understand what you mean, just to be sure.
> 
> 1. patch 5 in this set
> 2. patch to introduce high level lock, and grab it during dispatch in block layer.
> 3. changes in ioc
> 4. changes in bfq
> 5. changes in deadline
> 6. patch 6 in this set.

What about something like this:
1) Introduce the elevator common/generic lock (first part of patch 1 + middle
of patch 4 squashed together)
2) Convert deadline to use elevator generic lock (second part of patch 1 + end
of patch 4)
3) Convert bfq to use elevator generic lock (patch 3 + beginning of patch 4)
4) Patch 6

As for the ioc changes, they do not seem directly related to the elevator lock
changes, but since the code may conflict, maybe bring them as prep patches at
the beginning (0).


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method
  2025-07-23  4:34           ` Damien Le Moal
@ 2025-07-23  6:10             ` Yu Kuai
  0 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-07-23  6:10 UTC (permalink / raw)
  To: Damien Le Moal, Yu Kuai, hare, tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi, yukuai (C)

Hi,

On 2025/07/23 12:34, Damien Le Moal wrote:
> What about something like this:
> 1) Introduce the elevator common/generic lock (first part of patch 1 + middle
> of patch 4 squashed together)
> 2) Convert deadline to use elevator generic lock (second part of patch 1 + end
> of patch 4)
> 3) Convert bfq to use elevator generic lock (patch 3 + beginning of patch 4)
> 4) Patch 6
> 
> As for the ioc changes, they do not seem directly related to the elevator lock
> changes, but since the code may conflict, maybe bring them as prep patches at
> the beginning (0).

This sounds good. BTW, the ioc changes have to be in front of step 3), to
prevent the queue_lock from being nested inside the elevator lock.

Thanks,
Kuai



end of thread

Thread overview: 19+ messages
2025-07-22  7:24 [PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
2025-07-22  7:24 ` [PATCH 1/6] mq-deadline: switch to use high layer elevator lock Yu Kuai
2025-07-23  1:46   ` Damien Le Moal
2025-07-23  2:07     ` Yu Kuai
2025-07-23  2:38       ` Damien Le Moal
2025-07-22  7:24 ` [PATCH 2/6] block, bfq: don't grab queue_lock from io path Yu Kuai
2025-07-23  1:52   ` Damien Le Moal
2025-07-23  2:04     ` Yu Kuai
2025-07-22  7:24 ` [PATCH 3/6] block, bfq: switch to use elevator lock Yu Kuai
2025-07-23  1:53   ` Damien Le Moal
2025-07-22  7:24 ` [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method Yu Kuai
2025-07-23  1:59   ` Damien Le Moal
2025-07-23  2:17     ` Yu Kuai
2025-07-23  2:42       ` Damien Le Moal
2025-07-23  2:51         ` Yu Kuai
2025-07-23  4:34           ` Damien Le Moal
2025-07-23  6:10             ` Yu Kuai
2025-07-22  7:24 ` [PATCH 5/6] blk-mq-sched: refactor __blk_mq_do_dispatch_sched() Yu Kuai
2025-07-22  7:24 ` [PATCH 6/6] blk-mq-sched: support request batch dispatching for sq elevator Yu Kuai
