public inbox for linux-block@vger.kernel.org
* [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT
@ 2025-03-08 16:14 Ming Lei
  2025-03-08 16:14 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei

Hello Jens,

This patchset improves loop aio performance by using IOCB_NOWAIT to avoid
queueing aio commands to workqueue context, and refactors lo_rw_aio() a bit
along the way.

The last patch adds MQ support, which improves performance a bit when
multiple IO jobs are running.

In my test VM, loop disk performance becomes very close to that of the
backing block device (nvme/mq virtio-scsi).

Thanks,
Ming


Ming Lei (5):
  loop: remove 'rw' parameter from lo_rw_aio()
  loop: cleanup lo_rw_aio()
  loop: add helper loop_queue_work_prep
  loop: try to handle loop aio command via NOWAIT IO first
  loop: add module parameter of 'nr_hw_queues'

 drivers/block/loop.c | 225 ++++++++++++++++++++++++++++++-------------
 1 file changed, 156 insertions(+), 69 deletions(-)

-- 
2.47.0



* [PATCH 1/2] block: loop: share code of reread partitions
  2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
  2025-03-08 16:17   ` Ming Lei
  2025-03-08 16:14 ` [PATCH] loop: fallback to buffered IO in case of dio submission failure Ming Lei
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei

loop_reread_partitions() already exists for rereading partitions, so
replace the open-coded version in __loop_clr_fd() with a call to
loop_reread_partitions(), passing a new 'locked' parameter.
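
The 'locked' convention can be illustrated outside the kernel. The
sketch below is only a userspace model, not the driver code: toy_mutex,
toy_rescan() and toy_reread_partitions() are hypothetical names, and the
"mutex" is a plain counter so the effect of the flag is easy to observe.

```c
#include <stdbool.h>

/* Toy stand-in for a mutex: counts calls so the effect is observable. */
struct toy_mutex {
	int depth;      /* 1 while held, 0 otherwise */
	int lock_calls; /* how many times toy_lock() ran */
};

static void toy_lock(struct toy_mutex *m)
{
	m->depth++;
	m->lock_calls++;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->depth--;
}

/* Hypothetical rescan step: succeeds only if the lock is held once. */
static int toy_rescan(const struct toy_mutex *m)
{
	return m->depth == 1 ? 0 : -1;
}

/*
 * Same shape as loop_reread_partitions(lo, bdev, locked): take the
 * lock only when the caller does not already hold it.
 */
static int toy_reread_partitions(struct toy_mutex *m, bool locked)
{
	int rc;

	if (!locked)
		toy_lock(m);
	rc = toy_rescan(m);
	if (!locked)
		toy_unlock(m);
	return rc;
}
```

A caller that already holds the lock passes true, any other caller
passes false, and in both cases the rescan runs with the lock held
exactly once.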

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/loop.c | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a943207705dd..0e08468b9ce0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -650,13 +650,17 @@ static inline void loop_update_dio(struct loop_device *lo)
 }
 
 static void loop_reread_partitions(struct loop_device *lo,
-				   struct block_device *bdev)
+				   struct block_device *bdev, bool locked)
 {
 	int rc;
 
-	mutex_lock(&bdev->bd_mutex);
-	rc = bdev_disk_changed(bdev, false);
-	mutex_unlock(&bdev->bd_mutex);
+	if (locked) {
+		rc = bdev_disk_changed(bdev, false);
+	} else {
+		mutex_lock(&bdev->bd_mutex);
+		rc = bdev_disk_changed(bdev, false);
+		mutex_unlock(&bdev->bd_mutex);
+	}
 	if (rc)
 		pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n",
 			__func__, lo->lo_number, lo->lo_file_name, rc);
@@ -754,7 +758,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 	 */
 	fput(old_file);
 	if (partscan)
-		loop_reread_partitions(lo, bdev);
+		loop_reread_partitions(lo, bdev, false);
 	return 0;
 
 out_err:
@@ -1179,7 +1183,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	bdgrab(bdev);
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan)
-		loop_reread_partitions(lo, bdev);
+		loop_reread_partitions(lo, bdev, false);
 	if (claimed_bdev)
 		bd_abort_claiming(bdev, claimed_bdev, loop_configure);
 	return 0;
@@ -1270,16 +1274,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 		 * must be at least one and it can only become zero when the
 		 * current holder is released.
 		 */
-		if (!release)
-			mutex_lock(&bdev->bd_mutex);
-		err = bdev_disk_changed(bdev, false);
-		if (!release)
-			mutex_unlock(&bdev->bd_mutex);
-		if (err)
-			pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
-				__func__, lo_number, err);
-		/* Device is gone, no point in returning error */
-		err = 0;
+		loop_reread_partitions(lo, bdev, release);
 	}
 
 	/*
@@ -1420,7 +1415,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 out_unlock:
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan)
-		loop_reread_partitions(lo, bdev);
+		loop_reread_partitions(lo, bdev, false);
 
 	return err;
 }
-- 
2.25.2



* [PATCH] loop: fallback to buffered IO in case of dio submission failure
  2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
  2025-03-08 16:14 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
  2025-03-08 16:14 ` [PATCH 1/5] loop: remove 'rw' parameter from lo_rw_aio() Ming Lei
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/loop.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 7bf4686af774..2fa15933860d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -562,6 +562,14 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
 	lo_rw_aio_do_completion(cmd);
 }
 
+static inline int lo_call_backing_rw_iter(struct file *file,
+		struct kiocb *iocb, struct iov_iter *iter, bool rw)
+{
+	if (rw == WRITE)
+		return call_write_iter(file, iocb, iter);
+	return call_read_iter(file, iocb, iter);
+}
+
 static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 		     loff_t pos, bool rw)
 {
@@ -619,15 +627,18 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	cmd->iocb.ki_flags = IOCB_DIRECT;
 	cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
 
-	if (rw == WRITE)
-		ret = call_write_iter(file, &cmd->iocb, &iter);
-	else
-		ret = call_read_iter(file, &cmd->iocb, &iter);
+	ret = lo_call_backing_rw_iter(file, &cmd->iocb, &iter, rw);
 
 	lo_rw_aio_do_completion(cmd);
 
-	if (ret != -EIOCBQUEUED)
+	if (ret >= 0) {
 		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
+	} else if (ret != -EIOCBQUEUED) {
+		/* fallback to buffered IO */
+		cmd->iocb.ki_flags = 0;
+		cmd->ret = lo_call_backing_rw_iter(file, &cmd->iocb, &iter, rw);
+		lo_rw_aio_do_completion(cmd);
+	}
 	return 0;
 }
 
-- 
2.31.1



* [PATCH 1/5] loop: remove 'rw' parameter from lo_rw_aio()
  2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
  2025-03-08 16:14 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
  2025-03-08 16:14 ` [PATCH] loop: fallback to buffered IO in case of dio submission failure Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
  2025-03-08 16:14 ` [PATCH 2/2] block: loop: delete partitions after clearing & changing fd Ming Lei
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei

lo_rw_aio() is only called for READ/WRITE operations, and the direction
can be derived from the request directly, so remove the 'rw' parameter
from lo_rw_aio() and name the resulting local variable 'dir', which
better matches its actual use.

Also merge lo_read_simple() and lo_write_simple() into
lo_rw_simple().
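
As a rough illustration of deriving the direction from the request
instead of passing it in, here is a minimal userspace sketch. The
toy_* types and constants are hypothetical stand-ins for REQ_OP_* and
ITER_DEST/ITER_SOURCE, not the kernel definitions.

```c
/* Hypothetical stand-ins for the kernel's REQ_OP_* and ITER_* values. */
enum toy_req_op { TOY_OP_READ, TOY_OP_WRITE };
enum toy_iter_dir { TOY_ITER_DEST, TOY_ITER_SOURCE };

struct toy_request {
	enum toy_req_op op;
};

/*
 * A read fills the caller's buffers (they are the destination);
 * a write drains them (they are the source).
 */
static enum toy_iter_dir toy_iter_dir_of(const struct toy_request *rq)
{
	return rq->op == TOY_OP_READ ? TOY_ITER_DEST : TOY_ITER_SOURCE;
}
```

With the direction computed from the request, the caller no longer has
to keep a separate argument in sync with the operation it is issuing.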

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/loop.c | 48 ++++++++++++++++++--------------------------
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 657bf53decf3..6bbbaa4aaf2c 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -239,31 +239,25 @@ static int lo_write_bvec(struct file *file, struct bio_vec *bvec, loff_t *ppos)
 	return bw;
 }
 
-static int lo_write_simple(struct loop_device *lo, struct request *rq,
-		loff_t pos)
-{
-	struct bio_vec bvec;
-	struct req_iterator iter;
-	int ret = 0;
-
-	rq_for_each_segment(bvec, rq, iter) {
-		ret = lo_write_bvec(lo->lo_backing_file, &bvec, &pos);
-		if (ret < 0)
-			break;
-		cond_resched();
-	}
-
-	return ret;
-}
-
-static int lo_read_simple(struct loop_device *lo, struct request *rq,
-		loff_t pos)
+static int lo_rw_simple(struct loop_device *lo, struct request *rq, loff_t pos)
 {
 	struct bio_vec bvec;
 	struct req_iterator iter;
 	struct iov_iter i;
 	ssize_t len;
 
+	if (req_op(rq) == REQ_OP_WRITE) {
+		int ret = 0;
+
+		rq_for_each_segment(bvec, rq, iter) {
+			ret = lo_write_bvec(lo->lo_backing_file, &bvec, &pos);
+			if (ret < 0)
+				break;
+			cond_resched();
+		}
+		return ret;
+	}
+
 	rq_for_each_segment(bvec, rq, iter) {
 		iov_iter_bvec(&i, ITER_DEST, &bvec, 1, bvec.bv_len);
 		len = vfs_iter_read(lo->lo_backing_file, &i, &pos, 0);
@@ -400,13 +394,13 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret)
 	lo_rw_aio_do_completion(cmd);
 }
 
-static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
-		     loff_t pos, int rw)
+static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
 {
 	struct iov_iter iter;
 	struct req_iterator rq_iter;
 	struct bio_vec *bvec;
 	struct request *rq = blk_mq_rq_from_pdu(cmd);
+	int dir = (req_op(rq) == REQ_OP_READ) ? ITER_DEST : ITER_SOURCE;
 	struct bio *bio = rq->bio;
 	struct file *file = lo->lo_backing_file;
 	struct bio_vec tmp;
@@ -448,7 +442,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	}
 	atomic_set(&cmd->ref, 2);
 
-	iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
+	iov_iter_bvec(&iter, dir, bvec, nr_bvec, blk_rq_bytes(rq));
 	iter.iov_offset = offset;
 
 	cmd->iocb.ki_pos = pos;
@@ -457,7 +451,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	cmd->iocb.ki_flags = IOCB_DIRECT;
 	cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
 
-	if (rw == ITER_SOURCE)
+	if (dir == ITER_SOURCE)
 		ret = file->f_op->write_iter(&cmd->iocb, &iter);
 	else
 		ret = file->f_op->read_iter(&cmd->iocb, &iter);
@@ -498,15 +492,11 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 	case REQ_OP_DISCARD:
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_WRITE:
-		if (cmd->use_aio)
-			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
-		else
-			return lo_write_simple(lo, rq, pos);
 	case REQ_OP_READ:
 		if (cmd->use_aio)
-			return lo_rw_aio(lo, cmd, pos, ITER_DEST);
+			return lo_rw_aio(lo, cmd, pos);
 		else
-			return lo_read_simple(lo, rq, pos);
+			return lo_rw_simple(lo, rq, pos);
 	default:
 		WARN_ON_ONCE(1);
 		return -EIO;
-- 
2.47.0



* [PATCH 2/2] block: loop: delete partitions after clearing & changing fd
  2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
                   ` (2 preceding siblings ...)
  2025-03-08 16:14 ` [PATCH 1/5] loop: remove 'rw' parameter from lo_rw_aio() Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
  2025-03-08 16:14 ` [PATCH 2/5] loop: cleanup lo_rw_aio() Ming Lei
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei

After clearing or changing the fd, we have to delete the old partitions,
otherwise they may become ghost partitions.

Fix this by clearing GENHD_FL_NO_PART_SCAN while calling
bdev_disk_changed(), which won't drop old partitions if
GENHD_FL_NO_PART_SCAN is set.
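
The save/clear/restore pattern used in the diff can be modelled in a
few lines of userspace C. This is only a sketch under one assumption:
that the rescan is skipped while the no-scan flag is set, which is the
behaviour the diff works around. TOY_NO_PART_SCAN, toy_disk and the
functions below are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for GENHD_FL_NO_PART_SCAN. */
#define TOY_NO_PART_SCAN 0x1u

struct toy_disk {
	unsigned int flags;
	int nr_parts; /* partitions currently registered */
};

/* Assumed behaviour: the rescan does nothing while scanning is disabled. */
static int toy_disk_changed(struct toy_disk *d)
{
	if (d->flags & TOY_NO_PART_SCAN)
		return 0;
	d->nr_parts = 0; /* drop stale partitions */
	return 0;
}

/* Temporarily lift the no-scan flag when the caller forces a rescan. */
static int toy_reread(struct toy_disk *d, bool force_scan)
{
	bool no_scan = (d->flags & TOY_NO_PART_SCAN) != 0;
	int rc;

	if (force_scan && no_scan)
		d->flags &= ~TOY_NO_PART_SCAN;
	rc = toy_disk_changed(d);
	if (force_scan && no_scan)
		d->flags |= TOY_NO_PART_SCAN;
	return rc;
}
```

The flag is restored after the call, so a forced rescan leaves the
disk's configured scanning policy unchanged.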

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/loop.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0e08468b9ce0..cf71a1bbcd45 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -650,17 +650,26 @@ static inline void loop_update_dio(struct loop_device *lo)
 }
 
 static void loop_reread_partitions(struct loop_device *lo,
-				   struct block_device *bdev, bool locked)
+				   struct block_device *bdev, bool locked,
+				   bool force_scan)
 {
 	int rc;
+	bool no_scan;
 
-	if (locked) {
-		rc = bdev_disk_changed(bdev, false);
-	} else {
+	if (!locked)
 		mutex_lock(&bdev->bd_mutex);
-		rc = bdev_disk_changed(bdev, false);
+
+	no_scan = lo->lo_disk->flags & GENHD_FL_NO_PART_SCAN;
+	if (force_scan && no_scan)
+		lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
+
+	rc = bdev_disk_changed(bdev, false);
+
+	if (force_scan && no_scan)
+		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
+
+	if (!locked)
 		mutex_unlock(&bdev->bd_mutex);
-	}
 	if (rc)
 		pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n",
 			__func__, lo->lo_number, lo->lo_file_name, rc);
@@ -758,7 +767,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 	 */
 	fput(old_file);
 	if (partscan)
-		loop_reread_partitions(lo, bdev, false);
+		loop_reread_partitions(lo, bdev, false, true);
 	return 0;
 
 out_err:
@@ -1183,7 +1192,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	bdgrab(bdev);
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan)
-		loop_reread_partitions(lo, bdev, false);
+		loop_reread_partitions(lo, bdev, false, false);
 	if (claimed_bdev)
 		bd_abort_claiming(bdev, claimed_bdev, loop_configure);
 	return 0;
@@ -1274,7 +1283,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 		 * must be at least one and it can only become zero when the
 		 * current holder is released.
 		 */
-		loop_reread_partitions(lo, bdev, release);
+		loop_reread_partitions(lo, bdev, release, true);
 	}
 
 	/*
@@ -1415,7 +1424,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 out_unlock:
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan)
-		loop_reread_partitions(lo, bdev, false);
+		loop_reread_partitions(lo, bdev, false, false);
 
 	return err;
 }
-- 
2.25.2



* [PATCH 2/5] loop: cleanup lo_rw_aio()
  2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
                   ` (3 preceding siblings ...)
  2025-03-08 16:14 ` [PATCH 2/2] block: loop: delete partitions after clearing & changing fd Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
  2025-03-08 16:14 ` [PATCH 3/5] loop: add helper loop_queue_work_prep Ming Lei
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei

Clean up lo_rw_aio() a bit by refactoring it into three parts:

- lo_cmd_nr_bvec(), which calculates how many bvecs are in this request

- lo_rw_aio_prep(), which prepares the loop command and needs to be
called only once

- lo_submit_rw_aio(), which submits the loop command and can be
called multiple times

This prepares for trying to handle loop commands via NOWAIT read/write IO
first.
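
The count / prepare-once / submit-many split can be sketched in
userspace as follows. This is an illustrative model only: toy_cmd and
the toy_* functions are hypothetical, and a zero-terminated array of
segment lengths stands in for the request's bvecs.

```c
#include <stddef.h>

/* Hypothetical miniature of a loop command. */
struct toy_cmd {
	const size_t *seg_len; /* segment lengths, terminated by 0 */
	int prepared;          /* how many times the prep step ran */
	int submitted;         /* how many times the submit step ran */
};

/* Step 1: count the segments (the role lo_cmd_nr_bvec() plays). */
static unsigned int toy_cmd_nr_segs(const struct toy_cmd *cmd)
{
	unsigned int n = 0;

	while (cmd->seg_len[n] != 0)
		n++;
	return n;
}

/* Step 2: one-time setup (the role lo_rw_aio_prep() plays). */
static int toy_rw_prep(struct toy_cmd *cmd, unsigned int nr_segs)
{
	if (nr_segs == 0)
		return -1;
	cmd->prepared++;
	return 0;
}

/* Step 3: submission, safe to repeat (the role lo_submit_rw_aio() plays). */
static long toy_submit_rw(struct toy_cmd *cmd, unsigned int nr_segs)
{
	long bytes = 0;
	unsigned int i;

	for (i = 0; i < nr_segs; i++)
		bytes += (long)cmd->seg_len[i];
	cmd->submitted++;
	return bytes;
}

/* The original entry point becomes a thin wrapper over the three steps. */
static long toy_rw(struct toy_cmd *cmd)
{
	unsigned int nr = toy_cmd_nr_segs(cmd);
	int ret = toy_rw_prep(cmd, nr);

	if (ret < 0)
		return ret;
	return toy_submit_rw(cmd, nr);
}
```

Keeping submission separate from preparation is what lets a later
retry resubmit the same command without repeating the setup.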

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/loop.c | 83 +++++++++++++++++++++++++++++---------------
 1 file changed, 55 insertions(+), 28 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 6bbbaa4aaf2c..eae38cd38b7b 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -394,24 +394,63 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret)
 	lo_rw_aio_do_completion(cmd);
 }
 
-static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
+static int lo_submit_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
+			    loff_t pos, int nr_bvec)
 {
-	struct iov_iter iter;
-	struct req_iterator rq_iter;
-	struct bio_vec *bvec;
 	struct request *rq = blk_mq_rq_from_pdu(cmd);
 	int dir = (req_op(rq) == REQ_OP_READ) ? ITER_DEST : ITER_SOURCE;
-	struct bio *bio = rq->bio;
 	struct file *file = lo->lo_backing_file;
-	struct bio_vec tmp;
+	struct iov_iter iter;
+	struct bio_vec *bvec;
 	unsigned int offset;
-	int nr_bvec = 0;
 	int ret;
 
+	if (rq->bio != rq->biotail) {
+		bvec = cmd->bvec;
+		offset = 0;
+	} else {
+		struct bio *bio = rq->bio;
+
+		offset = bio->bi_iter.bi_bvec_done;
+		bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+	}
+	iov_iter_bvec(&iter, dir, bvec, nr_bvec, blk_rq_bytes(rq));
+	iter.iov_offset = offset;
+	cmd->iocb.ki_pos = pos;
+
+	atomic_set(&cmd->ref, 2);
+	if (dir == ITER_SOURCE)
+		ret = file->f_op->write_iter(&cmd->iocb, &iter);
+	else
+		ret = file->f_op->read_iter(&cmd->iocb, &iter);
+	lo_rw_aio_do_completion(cmd);
+
+	return ret;
+}
+
+static inline unsigned lo_cmd_nr_bvec(struct loop_cmd *cmd)
+{
+	struct req_iterator rq_iter;
+	struct request *rq = blk_mq_rq_from_pdu(cmd);
+	struct bio_vec tmp;
+	int nr_bvec = 0;
+
 	rq_for_each_bvec(tmp, rq, rq_iter)
 		nr_bvec++;
 
+	return nr_bvec;
+}
+
+static int lo_rw_aio_prep(struct loop_device *lo, struct loop_cmd *cmd,
+			  unsigned nr_bvec)
+{
+	struct request *rq = blk_mq_rq_from_pdu(cmd);
+	struct file *file = lo->lo_backing_file;
+
 	if (rq->bio != rq->biotail) {
+		struct req_iterator rq_iter;
+		struct bio_vec *bvec;
+		struct bio_vec tmp;
 
 		bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
 				     GFP_NOIO);
@@ -429,35 +468,23 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
 			*bvec = tmp;
 			bvec++;
 		}
-		bvec = cmd->bvec;
-		offset = 0;
-	} else {
-		/*
-		 * Same here, this bio may be started from the middle of the
-		 * 'bvec' because of bio splitting, so offset from the bvec
-		 * must be passed to iov iterator
-		 */
-		offset = bio->bi_iter.bi_bvec_done;
-		bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
 	}
-	atomic_set(&cmd->ref, 2);
-
-	iov_iter_bvec(&iter, dir, bvec, nr_bvec, blk_rq_bytes(rq));
-	iter.iov_offset = offset;
-
-	cmd->iocb.ki_pos = pos;
 	cmd->iocb.ki_filp = file;
 	cmd->iocb.ki_complete = lo_rw_aio_complete;
 	cmd->iocb.ki_flags = IOCB_DIRECT;
 	cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
 
-	if (dir == ITER_SOURCE)
-		ret = file->f_op->write_iter(&cmd->iocb, &iter);
-	else
-		ret = file->f_op->read_iter(&cmd->iocb, &iter);
+	return 0;
+}
 
-	lo_rw_aio_do_completion(cmd);
+static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
+{
+	unsigned int nr_bvec = lo_cmd_nr_bvec(cmd);
+	int ret = lo_rw_aio_prep(lo, cmd, nr_bvec);
 
+	if (ret < 0)
+		return ret;
+	ret = lo_submit_rw_aio(lo, cmd, pos, nr_bvec);
 	if (ret != -EIOCBQUEUED)
 		lo_rw_aio_complete(&cmd->iocb, ret);
 	return 0;
-- 
2.47.0



* [PATCH 3/5] loop: add helper loop_queue_work_prep
  2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
                   ` (4 preceding siblings ...)
  2025-03-08 16:14 ` [PATCH 2/5] loop: cleanup lo_rw_aio() Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
  2025-03-08 16:14 ` [PATCH 4/5] loop: try to handle loop aio command via NOWAIT IO first Ming Lei
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei

Add helper loop_queue_work_prep() for making loop_queue_rq() more
readable.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/loop.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index eae38cd38b7b..9f8d32d2dc4d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -859,6 +859,27 @@ static inline int queue_on_root_worker(struct cgroup_subsys_state *css)
 }
 #endif
 
+static void loop_queue_work_prep(struct loop_cmd *cmd)
+{
+	struct request *rq = blk_mq_rq_from_pdu(cmd);
+
+	/* always use the first bio's css */
+	cmd->blkcg_css = NULL;
+	cmd->memcg_css = NULL;
+#ifdef CONFIG_BLK_CGROUP
+	if (rq->bio) {
+		cmd->blkcg_css = bio_blkcg_css(rq->bio);
+#ifdef CONFIG_MEMCG
+		if (cmd->blkcg_css) {
+			cmd->memcg_css =
+				cgroup_get_e_css(cmd->blkcg_css->cgroup,
+						&memory_cgrp_subsys);
+		}
+#endif
+	}
+#endif
+}
+
 static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 {
 	struct rb_node **node, *parent = NULL;
@@ -866,6 +887,8 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 	struct work_struct *work;
 	struct list_head *cmd_list;
 
+	loop_queue_work_prep(cmd);
+
 	spin_lock_irq(&lo->lo_work_lock);
 
 	if (queue_on_root_worker(cmd->blkcg_css))
@@ -1903,21 +1926,6 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 		break;
 	}
 
-	/* always use the first bio's css */
-	cmd->blkcg_css = NULL;
-	cmd->memcg_css = NULL;
-#ifdef CONFIG_BLK_CGROUP
-	if (rq->bio) {
-		cmd->blkcg_css = bio_blkcg_css(rq->bio);
-#ifdef CONFIG_MEMCG
-		if (cmd->blkcg_css) {
-			cmd->memcg_css =
-				cgroup_get_e_css(cmd->blkcg_css->cgroup,
-						&memory_cgrp_subsys);
-		}
-#endif
-	}
-#endif
 	loop_queue_work(lo, cmd);
 
 	return BLK_STS_OK;
-- 
2.47.0



* [PATCH 4/5] loop: try to handle loop aio command via NOWAIT IO first
  2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
                   ` (5 preceding siblings ...)
  2025-03-08 16:14 ` [PATCH 3/5] loop: add helper loop_queue_work_prep Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
  2025-03-08 16:14 ` [PATCH 5/5] loop: add module parameter of 'nr_hw_queues' Ming Lei
  2025-03-08 16:20 ` [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
  8 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei

Try to handle the loop aio command via NOWAIT IO first, so that we can
avoid queueing the aio command into the workqueue.

Fall back to the workqueue in case of -EAGAIN.

BLK_MQ_F_BLOCKING has to be set because .read_iter() or .write_iter()
might sleep even when called with NOWAIT.
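
The control flow described above (non-blocking attempt first, worker
only on -EAGAIN) can be modelled in userspace. This is a simplified
sketch, not the driver logic: the toy_* names are hypothetical, the
backend is a plain function pointer, and the asynchronous -EIOCBQUEUED
case handled by the real patch is left out.

```c
#include <errno.h>

/* What the dispatch path decided to do with one command. */
enum toy_outcome { TOY_DONE_INLINE, TOY_QUEUED_TO_WORKER, TOY_FAILED };

/* Hypothetical backend: returns bytes done, or a negative errno. */
typedef long (*toy_submit_fn)(int nowait);

static enum toy_outcome toy_queue_rq(toy_submit_fn submit, int *worker_backlog)
{
	long ret = submit(1); /* first attempt: must not block */

	if (ret >= 0)
		return TOY_DONE_INLINE;
	if (ret != -EAGAIN)
		return TOY_FAILED;
	(*worker_backlog)++;  /* would block: hand over to the worker */
	return TOY_QUEUED_TO_WORKER;
}

/* Example backends that exercise the three branches. */
static long toy_ready(int nowait)
{
	(void)nowait;
	return 4096;
}

static long toy_busy(int nowait)
{
	return nowait ? -EAGAIN : 4096;
}

static long toy_broken(int nowait)
{
	(void)nowait;
	return -EIO;
}
```

Only the toy_busy case pays for a hand-off to the worker; the common
case completes in the submitter's context.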

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/loop.c | 47 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 9f8d32d2dc4d..46be0c8e75a6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -92,6 +92,8 @@ struct loop_cmd {
 #define LOOP_IDLE_WORKER_TIMEOUT (60 * HZ)
 #define LOOP_DEFAULT_HW_Q_DEPTH 128
 
+static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd);
+
 static DEFINE_IDR(loop_index_idr);
 static DEFINE_MUTEX(loop_ctl_mutex);
 static DEFINE_MUTEX(loop_validate_mutex);
@@ -380,8 +382,17 @@ static void lo_rw_aio_do_completion(struct loop_cmd *cmd)
 
 	if (!atomic_dec_and_test(&cmd->ref))
 		return;
+
+	if (cmd->ret == -EAGAIN) {
+		struct loop_device *lo = rq->q->queuedata;
+
+		loop_queue_work(lo, cmd);
+		return;
+	}
+
 	kfree(cmd->bvec);
 	cmd->bvec = NULL;
+
 	if (likely(!blk_should_fake_timeout(rq->q)))
 		blk_mq_complete_request(rq);
 }
@@ -478,16 +489,34 @@ static int lo_rw_aio_prep(struct loop_device *lo, struct loop_cmd *cmd,
 }
 
 static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
+{
+	unsigned int nr_bvec = lo_cmd_nr_bvec(cmd);
+	int ret;
+
+	cmd->iocb.ki_flags &= ~IOCB_NOWAIT;
+	ret = lo_submit_rw_aio(lo, cmd, pos, nr_bvec);
+	if (ret != -EIOCBQUEUED)
+		lo_rw_aio_complete(&cmd->iocb, ret);
+	return 0;
+}
+
+static int lo_rw_aio_nowait(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
 {
 	unsigned int nr_bvec = lo_cmd_nr_bvec(cmd);
 	int ret = lo_rw_aio_prep(lo, cmd, nr_bvec);
 
 	if (ret < 0)
 		return ret;
+
+	cmd->iocb.ki_flags |= IOCB_NOWAIT;
 	ret = lo_submit_rw_aio(lo, cmd, pos, nr_bvec);
-	if (ret != -EIOCBQUEUED)
+	if (ret == -EIOCBQUEUED)
+		return 0;
+	if (ret != -EAGAIN) {
 		lo_rw_aio_complete(&cmd->iocb, ret);
-	return 0;
+		return 0;
+	}
+	return ret;
 }
 
 static int do_req_filebacked(struct loop_device *lo, struct request *rq)
@@ -1926,6 +1955,17 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 		break;
 	}
 
+	if (cmd->use_aio) {
+		loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
+		int ret = lo_rw_aio_nowait(lo, cmd, pos);
+
+		if (!ret)
+			return BLK_STS_OK;
+		if (ret != -EAGAIN)
+			return BLK_STS_IOERR;
+		/* fallback to workqueue for handling aio */
+	}
+
 	loop_queue_work(lo, cmd);
 
 	return BLK_STS_OK;
@@ -2076,7 +2116,8 @@ static int loop_add(int i)
 	lo->tag_set.queue_depth = hw_queue_depth;
 	lo->tag_set.numa_node = NUMA_NO_NODE;
 	lo->tag_set.cmd_size = sizeof(struct loop_cmd);
-	lo->tag_set.flags = BLK_MQ_F_STACKING | BLK_MQ_F_NO_SCHED_BY_DEFAULT;
+	lo->tag_set.flags = BLK_MQ_F_STACKING | BLK_MQ_F_NO_SCHED_BY_DEFAULT |
+		BLK_MQ_F_BLOCKING;
 	lo->tag_set.driver_data = lo;
 
 	err = blk_mq_alloc_tag_set(&lo->tag_set);
-- 
2.47.0



* [PATCH 5/5] loop: add module parameter of 'nr_hw_queues'
  2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
                   ` (6 preceding siblings ...)
  2025-03-08 16:14 ` [PATCH 4/5] loop: try to handle loop aio command via NOWAIT IO first Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
  2025-03-08 16:20 ` [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
  8 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei

Add a 'nr_hw_queues' module parameter so that loop can support multiple
hardware queues (MQ), which may reduce contention when there are many
IO jobs.
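
The validation the setter performs (parse an integer, reject values
below 1, leave the old value in place on error) can be approximated in
userspace. This is a sketch only: toy_set_nr_hw_queues() is a
hypothetical name, and strtol() stands in for kstrtoint(), whose exact
parsing rules (for example its handling of a trailing newline) differ.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Mirrors the default of one hardware queue. */
static int toy_nr_hw_queues = 1;

/* Returns 0 and stores the value on success, -EINVAL otherwise. */
static int toy_set_nr_hw_queues(const char *s)
{
	char *end;
	long nr;

	errno = 0;
	nr = strtol(s, &end, 0); /* base 0: accepts decimal, hex, octal */
	if (errno != 0 || end == s || *end != '\0' || nr > INT_MAX)
		return -EINVAL;
	if (nr < 1)
		return -EINVAL;  /* at least one queue is required */
	toy_nr_hw_queues = (int)nr;
	return 0;
}
```

With the patch applied, a load-time setting along the lines of
"modprobe loop nr_hw_queues=4" would presumably go through the real
setter; the parameter is registered with mode 0444 in the diff, so it
is not writable through sysfs afterwards.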

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/loop.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 46be0c8e75a6..6378dfee6681 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -91,6 +91,7 @@ struct loop_cmd {
 
 #define LOOP_IDLE_WORKER_TIMEOUT (60 * HZ)
 #define LOOP_DEFAULT_HW_Q_DEPTH 128
+#define LOOP_DEFAULT_NR_HW_Q 1
 
 static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd);
 
@@ -1928,6 +1929,26 @@ static const struct kernel_param_ops loop_hw_qdepth_param_ops = {
 device_param_cb(hw_queue_depth, &loop_hw_qdepth_param_ops, &hw_queue_depth, 0444);
 MODULE_PARM_DESC(hw_queue_depth, "Queue depth for each hardware queue. Default: " __stringify(LOOP_DEFAULT_HW_Q_DEPTH));
 
+static int nr_hw_queues = LOOP_DEFAULT_NR_HW_Q;
+static int loop_set_nr_hw_queues(const char *s, const struct kernel_param *p)
+{
+	int nr, ret;
+
+	ret = kstrtoint(s, 0, &nr);
+	if (ret < 0)
+		return ret;
+	if (nr < 1)
+		return -EINVAL;
+	nr_hw_queues = nr;
+	return 0;
+}
+static const struct kernel_param_ops loop_nr_hw_q_param_ops = {
+	.set	= loop_set_nr_hw_queues,
+	.get	= param_get_int,
+};
+device_param_cb(nr_hw_queues, &loop_nr_hw_q_param_ops, &nr_hw_queues, 0444);
+MODULE_PARM_DESC(nr_hw_queues, "number of hardware queues. Default: " __stringify(LOOP_DEFAULT_NR_HW_Q));
+
 MODULE_DESCRIPTION("Loopback device support");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_BLOCKDEV_MAJOR(LOOP_MAJOR);
@@ -2112,7 +2133,7 @@ static int loop_add(int i)
 	i = err;
 
 	lo->tag_set.ops = &loop_mq_ops;
-	lo->tag_set.nr_hw_queues = 1;
+	lo->tag_set.nr_hw_queues = nr_hw_queues;
 	lo->tag_set.queue_depth = hw_queue_depth;
 	lo->tag_set.numa_node = NUMA_NO_NODE;
 	lo->tag_set.cmd_size = sizeof(struct loop_cmd);
-- 
2.47.0



* Re: [PATCH 1/2] block: loop: share code of reread partitions
  2025-03-08 16:14 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
@ 2025-03-08 16:17   ` Ming Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:17 UTC
  To: Jens Axboe, linux-block

On Sun, Mar 09, 2025 at 12:14:51AM +0800, Ming Lei wrote:
> loop_reread_partitions() already exists for rereading partitions, so
> replace the open-coded version in __loop_clr_fd() with a call to
> loop_reread_partitions(), passing a new 'locked' parameter.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---

Oops, please ignore this one.


Thanks,
Ming



* Re: [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT
  2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
                   ` (7 preceding siblings ...)
  2025-03-08 16:14 ` [PATCH 5/5] loop: add module parameter of 'nr_hw_queues' Ming Lei
@ 2025-03-08 16:20 ` Ming Lei
  8 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2025-03-08 16:20 UTC
  To: Jens Axboe, linux-block

On Sun, Mar 09, 2025 at 12:14:50AM +0800, Ming Lei wrote:
> Hello Jens,
> 
> This patchset improves loop aio performance by using IOCB_NOWAIT to avoid
> queueing aio commands to workqueue context, and refactors lo_rw_aio() a bit
> along the way.
> 
> The last patch adds MQ support, which improves performance a bit when
> multiple IO jobs are running.
> 
> In my test VM, loop disk performance becomes very close to that of the
> backing block device (nvme/mq virtio-scsi).
> 
> Thanks,
> Ming

Please ignore this patchset, since several unrelated patches were included
accidentally.


Thanks,
Ming



Thread overview: 11+ messages
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
2025-03-08 16:14 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
2025-03-08 16:17   ` Ming Lei
2025-03-08 16:14 ` [PATCH] loop: fallback to buffered IO in case of dio submission failure Ming Lei
2025-03-08 16:14 ` [PATCH 1/5] loop: remove 'rw' parameter from lo_rw_aio() Ming Lei
2025-03-08 16:14 ` [PATCH 2/2] block: loop: delete partitions after clearing & changing fd Ming Lei
2025-03-08 16:14 ` [PATCH 2/5] loop: cleanup lo_rw_aio() Ming Lei
2025-03-08 16:14 ` [PATCH 3/5] loop: add helper loop_queue_work_prep Ming Lei
2025-03-08 16:14 ` [PATCH 4/5] loop: try to handle loop aio command via NOWAIT IO first Ming Lei
2025-03-08 16:14 ` [PATCH 5/5] loop: add module parameter of 'nr_hw_queues' Ming Lei
2025-03-08 16:20 ` [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
