* [PATCH 1/2] block: loop: share code of reread partitions
2020-07-07 8:45 [PATCH 0/2] block: loop: delete partitions after clearing & changing fd Ming Lei
@ 2020-07-07 8:45 ` Ming Lei
2020-07-07 17:49 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-07-07 8:45 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig; +Cc: linux-block, Ming Lei
loop_reread_partitions() already exists for rereading partitions, so
replace the open-coded rescan in __loop_clr_fd() with a call to
loop_reread_partitions(), passing a new 'locked' parameter.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/loop.c | 29 ++++++++++++-----------------
1 file changed, 12 insertions(+), 17 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a943207705dd..0e08468b9ce0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -650,13 +650,17 @@ static inline void loop_update_dio(struct loop_device *lo)
}
static void loop_reread_partitions(struct loop_device *lo,
- struct block_device *bdev)
+ struct block_device *bdev, bool locked)
{
int rc;
- mutex_lock(&bdev->bd_mutex);
- rc = bdev_disk_changed(bdev, false);
- mutex_unlock(&bdev->bd_mutex);
+ if (locked) {
+ rc = bdev_disk_changed(bdev, false);
+ } else {
+ mutex_lock(&bdev->bd_mutex);
+ rc = bdev_disk_changed(bdev, false);
+ mutex_unlock(&bdev->bd_mutex);
+ }
if (rc)
pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n",
__func__, lo->lo_number, lo->lo_file_name, rc);
@@ -754,7 +758,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
*/
fput(old_file);
if (partscan)
- loop_reread_partitions(lo, bdev);
+ loop_reread_partitions(lo, bdev, false);
return 0;
out_err:
@@ -1179,7 +1183,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
bdgrab(bdev);
mutex_unlock(&loop_ctl_mutex);
if (partscan)
- loop_reread_partitions(lo, bdev);
+ loop_reread_partitions(lo, bdev, false);
if (claimed_bdev)
bd_abort_claiming(bdev, claimed_bdev, loop_configure);
return 0;
@@ -1270,16 +1274,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* must be at least one and it can only become zero when the
* current holder is released.
*/
- if (!release)
- mutex_lock(&bdev->bd_mutex);
- err = bdev_disk_changed(bdev, false);
- if (!release)
- mutex_unlock(&bdev->bd_mutex);
- if (err)
- pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
- __func__, lo_number, err);
- /* Device is gone, no point in returning error */
- err = 0;
+ loop_reread_partitions(lo, bdev, release);
}
/*
@@ -1420,7 +1415,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
out_unlock:
mutex_unlock(&loop_ctl_mutex);
if (partscan)
- loop_reread_partitions(lo, bdev);
+ loop_reread_partitions(lo, bdev, false);
return err;
}
--
2.25.2
* Re: [PATCH 1/2] block: loop: share code of reread partitions
2020-07-07 8:45 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
@ 2020-07-07 17:49 ` Christoph Hellwig
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2020-07-07 17:49 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Tue, Jul 07, 2020 at 04:45:51PM +0800, Ming Lei wrote:
> loop_reread_partitions() already exists for rereading partitions, so
> replace the open-coded rescan in __loop_clr_fd() with a call to
> loop_reread_partitions(), passing a new 'locked' parameter.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/block/loop.c | 29 ++++++++++++-----------------
> 1 file changed, 12 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index a943207705dd..0e08468b9ce0 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -650,13 +650,17 @@ static inline void loop_update_dio(struct loop_device *lo)
> }
>
> static void loop_reread_partitions(struct loop_device *lo,
> - struct block_device *bdev)
> + struct block_device *bdev, bool locked)
> {
> int rc;
>
> - mutex_lock(&bdev->bd_mutex);
> - rc = bdev_disk_changed(bdev, false);
> - mutex_unlock(&bdev->bd_mutex);
> + if (locked) {
> + rc = bdev_disk_changed(bdev, false);
> + } else {
> + mutex_lock(&bdev->bd_mutex);
> + rc = bdev_disk_changed(bdev, false);
> + mutex_unlock(&bdev->bd_mutex);
> + }
Functions whose locking context depends on an argument are a really bad
idea, and there is absolutely no reason to add one just for
a shared printk.
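One common way to avoid a 'locked' argument is to split the helper in two: a "_locked" variant that assumes the caller already holds the mutex, and a thin wrapper that takes the mutex itself. The following is only a userspace sketch of that idiom, not what was merged; struct fake_bdev, fake_disk_changed() and both helper names are hypothetical stand-ins, and a pthread mutex stands in for bd_mutex.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the block device; not a kernel type. */
struct fake_bdev {
	pthread_mutex_t mutex;
	int rescans;		/* counts how often the rescan ran */
};

/* Hypothetical stand-in for the rescan call. */
static int fake_disk_changed(struct fake_bdev *bdev)
{
	bdev->rescans++;
	return 0;
}

/* Caller must already hold bdev->mutex. */
static int reread_partitions_locked(struct fake_bdev *bdev)
{
	return fake_disk_changed(bdev);
}

/* Takes bdev->mutex itself; for callers that do not hold it. */
static int reread_partitions(struct fake_bdev *bdev)
{
	int rc;

	pthread_mutex_lock(&bdev->mutex);
	rc = reread_partitions_locked(bdev);
	pthread_mutex_unlock(&bdev->mutex);
	return rc;
}

/* Exercises both call paths and returns the number of rescans performed. */
static int demo_rescans(void)
{
	struct fake_bdev bdev = { .rescans = 0 };
	int rc;

	pthread_mutex_init(&bdev.mutex, NULL);

	rc = reread_partitions(&bdev);		/* caller does not hold the lock */
	assert(rc == 0);

	pthread_mutex_lock(&bdev.mutex);	/* caller already holds the lock */
	rc = reread_partitions_locked(&bdev);
	pthread_mutex_unlock(&bdev.mutex);
	assert(rc == 0);

	pthread_mutex_destroy(&bdev.mutex);
	return bdev.rescans;
}
```

With this split, each call site states its locking context in the function name, so a reader does not have to trace a boolean back to its origin.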
* [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT
@ 2025-03-08 16:14 Ming Lei
2025-03-08 16:14 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
` (8 more replies)
0 siblings, 9 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Ming Lei
Hello Jens,
This patchset improves loop aio performance by using IOCB_NOWAIT to avoid queueing
aio commands to workqueue context, and refactors lo_rw_aio() a bit along the way.
The last patch adds MQ support, which improves performance a bit when multiple
IO jobs are running.
In my test VM, loop disk performance becomes very close to that of the backing
block device (nvme/mq virtio-scsi).
Thanks,
Ming
Ming Lei (5):
loop: remove 'rw' parameter from lo_rw_aio()
loop: cleanup lo_rw_aio()
loop: add helper loop_queue_work_prep
loop: try to handle loop aio command via NOWAIT IO first
loop: add module parameter of 'nr_hw_queues'
drivers/block/loop.c | 225 ++++++++++++++++++++++++++++++-------------
1 file changed, 156 insertions(+), 69 deletions(-)
--
2.47.0
* [PATCH 1/2] block: loop: share code of reread partitions
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
2025-03-08 16:17 ` Ming Lei
2025-03-08 16:14 ` [PATCH] loop: fallback to buffered IO in case of dio submission failure Ming Lei
` (7 subsequent siblings)
8 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Ming Lei
loop_reread_partitions() already exists for rereading partitions, so
replace the open-coded rescan in __loop_clr_fd() with a call to
loop_reread_partitions(), passing a new 'locked' parameter.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/loop.c | 29 ++++++++++++-----------------
1 file changed, 12 insertions(+), 17 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a943207705dd..0e08468b9ce0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -650,13 +650,17 @@ static inline void loop_update_dio(struct loop_device *lo)
}
static void loop_reread_partitions(struct loop_device *lo,
- struct block_device *bdev)
+ struct block_device *bdev, bool locked)
{
int rc;
- mutex_lock(&bdev->bd_mutex);
- rc = bdev_disk_changed(bdev, false);
- mutex_unlock(&bdev->bd_mutex);
+ if (locked) {
+ rc = bdev_disk_changed(bdev, false);
+ } else {
+ mutex_lock(&bdev->bd_mutex);
+ rc = bdev_disk_changed(bdev, false);
+ mutex_unlock(&bdev->bd_mutex);
+ }
if (rc)
pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n",
__func__, lo->lo_number, lo->lo_file_name, rc);
@@ -754,7 +758,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
*/
fput(old_file);
if (partscan)
- loop_reread_partitions(lo, bdev);
+ loop_reread_partitions(lo, bdev, false);
return 0;
out_err:
@@ -1179,7 +1183,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
bdgrab(bdev);
mutex_unlock(&loop_ctl_mutex);
if (partscan)
- loop_reread_partitions(lo, bdev);
+ loop_reread_partitions(lo, bdev, false);
if (claimed_bdev)
bd_abort_claiming(bdev, claimed_bdev, loop_configure);
return 0;
@@ -1270,16 +1274,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* must be at least one and it can only become zero when the
* current holder is released.
*/
- if (!release)
- mutex_lock(&bdev->bd_mutex);
- err = bdev_disk_changed(bdev, false);
- if (!release)
- mutex_unlock(&bdev->bd_mutex);
- if (err)
- pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
- __func__, lo_number, err);
- /* Device is gone, no point in returning error */
- err = 0;
+ loop_reread_partitions(lo, bdev, release);
}
/*
@@ -1420,7 +1415,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
out_unlock:
mutex_unlock(&loop_ctl_mutex);
if (partscan)
- loop_reread_partitions(lo, bdev);
+ loop_reread_partitions(lo, bdev, false);
return err;
}
--
2.25.2
* [PATCH] loop: fallback to buffered IO in case of dio submission failure
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
2025-03-08 16:14 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
2025-03-08 16:14 ` [PATCH 1/5] loop: remove 'rw' parameter from lo_rw_aio() Ming Lei
` (6 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Ming Lei
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/loop.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 7bf4686af774..2fa15933860d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -562,6 +562,14 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
lo_rw_aio_do_completion(cmd);
}
+static inline int lo_call_backing_rw_iter(struct file *file,
+ struct kiocb *iocb, struct iov_iter *iter, bool rw)
+{
+ if (rw == WRITE)
+ return call_write_iter(file, iocb, iter);
+ return call_read_iter(file, iocb, iter);
+}
+
static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
loff_t pos, bool rw)
{
@@ -619,15 +627,18 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
cmd->iocb.ki_flags = IOCB_DIRECT;
cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
- if (rw == WRITE)
- ret = call_write_iter(file, &cmd->iocb, &iter);
- else
- ret = call_read_iter(file, &cmd->iocb, &iter);
+ ret = lo_call_backing_rw_iter(file, &cmd->iocb, &iter, rw);
lo_rw_aio_do_completion(cmd);
- if (ret != -EIOCBQUEUED)
+ if (ret >= 0) {
cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
+ } else if (ret != -EIOCBQUEUED) {
+ /* fallback to buffered IO */
+ cmd->iocb.ki_flags = 0;
+ cmd->ret = lo_call_backing_rw_iter(file, &cmd->iocb, &iter, rw);
+ lo_rw_aio_do_completion(cmd);
+ }
return 0;
}
--
2.31.1
* [PATCH 1/5] loop: remove 'rw' parameter from lo_rw_aio()
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
2025-03-08 16:14 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
2025-03-08 16:14 ` [PATCH] loop: fallback to buffered IO in case of dio submission failure Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
2025-03-08 16:14 ` [PATCH 2/2] block: loop: delete partitions after clearing & changing fd Ming Lei
` (5 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Ming Lei
lo_rw_aio() is only called for READ/WRITE operations, and the direction
can be derived from the request directly, so remove the 'rw' parameter
from lo_rw_aio(). Also rename the local variable to 'dir', which better
matches its actual use.
While at it, merge lo_read_simple() and lo_write_simple() into
lo_rw_simple().
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/loop.c | 48 ++++++++++++++++++--------------------------
1 file changed, 19 insertions(+), 29 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 657bf53decf3..6bbbaa4aaf2c 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -239,31 +239,25 @@ static int lo_write_bvec(struct file *file, struct bio_vec *bvec, loff_t *ppos)
return bw;
}
-static int lo_write_simple(struct loop_device *lo, struct request *rq,
- loff_t pos)
-{
- struct bio_vec bvec;
- struct req_iterator iter;
- int ret = 0;
-
- rq_for_each_segment(bvec, rq, iter) {
- ret = lo_write_bvec(lo->lo_backing_file, &bvec, &pos);
- if (ret < 0)
- break;
- cond_resched();
- }
-
- return ret;
-}
-
-static int lo_read_simple(struct loop_device *lo, struct request *rq,
- loff_t pos)
+static int lo_rw_simple(struct loop_device *lo, struct request *rq, loff_t pos)
{
struct bio_vec bvec;
struct req_iterator iter;
struct iov_iter i;
ssize_t len;
+ if (req_op(rq) == REQ_OP_WRITE) {
+ int ret = 0;
+
+ rq_for_each_segment(bvec, rq, iter) {
+ ret = lo_write_bvec(lo->lo_backing_file, &bvec, &pos);
+ if (ret < 0)
+ break;
+ cond_resched();
+ }
+ return ret;
+ }
+
rq_for_each_segment(bvec, rq, iter) {
iov_iter_bvec(&i, ITER_DEST, &bvec, 1, bvec.bv_len);
len = vfs_iter_read(lo->lo_backing_file, &i, &pos, 0);
@@ -400,13 +394,13 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret)
lo_rw_aio_do_completion(cmd);
}
-static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
- loff_t pos, int rw)
+static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
{
struct iov_iter iter;
struct req_iterator rq_iter;
struct bio_vec *bvec;
struct request *rq = blk_mq_rq_from_pdu(cmd);
+ int dir = (req_op(rq) == REQ_OP_READ) ? ITER_DEST : ITER_SOURCE;
struct bio *bio = rq->bio;
struct file *file = lo->lo_backing_file;
struct bio_vec tmp;
@@ -448,7 +442,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
}
atomic_set(&cmd->ref, 2);
- iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
+ iov_iter_bvec(&iter, dir, bvec, nr_bvec, blk_rq_bytes(rq));
iter.iov_offset = offset;
cmd->iocb.ki_pos = pos;
@@ -457,7 +451,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
cmd->iocb.ki_flags = IOCB_DIRECT;
cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
- if (rw == ITER_SOURCE)
+ if (dir == ITER_SOURCE)
ret = file->f_op->write_iter(&cmd->iocb, &iter);
else
ret = file->f_op->read_iter(&cmd->iocb, &iter);
@@ -498,15 +492,11 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
case REQ_OP_DISCARD:
return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
case REQ_OP_WRITE:
- if (cmd->use_aio)
- return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
- else
- return lo_write_simple(lo, rq, pos);
case REQ_OP_READ:
if (cmd->use_aio)
- return lo_rw_aio(lo, cmd, pos, ITER_DEST);
+ return lo_rw_aio(lo, cmd, pos);
else
- return lo_read_simple(lo, rq, pos);
+ return lo_rw_simple(lo, rq, pos);
default:
WARN_ON_ONCE(1);
return -EIO;
--
2.47.0
* [PATCH 2/2] block: loop: delete partitions after clearing & changing fd
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
` (2 preceding siblings ...)
2025-03-08 16:14 ` [PATCH 1/5] loop: remove 'rw' parameter from lo_rw_aio() Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
2025-03-08 16:14 ` [PATCH 2/5] loop: cleanup lo_rw_aio() Ming Lei
` (4 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Ming Lei
After clearing or changing the fd, we have to delete the old partitions,
otherwise they may linger as ghost partitions.
Fix this by temporarily clearing GENHD_FL_NO_PART_SCAN around the call to
bdev_disk_changed(), which won't drop old partitions while
GENHD_FL_NO_PART_SCAN is set.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/loop.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0e08468b9ce0..cf71a1bbcd45 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -650,17 +650,26 @@ static inline void loop_update_dio(struct loop_device *lo)
}
static void loop_reread_partitions(struct loop_device *lo,
- struct block_device *bdev, bool locked)
+ struct block_device *bdev, bool locked,
+ bool force_scan)
{
int rc;
+ bool no_scan;
- if (locked) {
- rc = bdev_disk_changed(bdev, false);
- } else {
+ if (!locked)
mutex_lock(&bdev->bd_mutex);
- rc = bdev_disk_changed(bdev, false);
+
+ no_scan = lo->lo_disk->flags & GENHD_FL_NO_PART_SCAN;
+ if (force_scan && no_scan)
+ lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
+
+ rc = bdev_disk_changed(bdev, false);
+
+ if (force_scan && no_scan)
+ lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
+
+ if (!locked)
mutex_unlock(&bdev->bd_mutex);
- }
if (rc)
pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n",
__func__, lo->lo_number, lo->lo_file_name, rc);
@@ -758,7 +767,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
*/
fput(old_file);
if (partscan)
- loop_reread_partitions(lo, bdev, false);
+ loop_reread_partitions(lo, bdev, false, true);
return 0;
out_err:
@@ -1183,7 +1192,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
bdgrab(bdev);
mutex_unlock(&loop_ctl_mutex);
if (partscan)
- loop_reread_partitions(lo, bdev, false);
+ loop_reread_partitions(lo, bdev, false, false);
if (claimed_bdev)
bd_abort_claiming(bdev, claimed_bdev, loop_configure);
return 0;
@@ -1274,7 +1283,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* must be at least one and it can only become zero when the
* current holder is released.
*/
- loop_reread_partitions(lo, bdev, release);
+ loop_reread_partitions(lo, bdev, release, true);
}
/*
@@ -1415,7 +1424,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
out_unlock:
mutex_unlock(&loop_ctl_mutex);
if (partscan)
- loop_reread_partitions(lo, bdev, false);
+ loop_reread_partitions(lo, bdev, false, false);
return err;
}
--
2.25.2
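The save/clear/restore sequence this patch adds around bdev_disk_changed() can be modelled in userspace roughly as follows. This is only a sketch: struct fake_disk, FAKE_NO_PART_SCAN and fake_disk_changed() are hypothetical stand-ins, and the model assumes the rescan skips dropping partitions while the no-scan flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_NO_PART_SCAN 0x1u	/* hypothetical stand-in for GENHD_FL_NO_PART_SCAN */

/* Hypothetical stand-in for the gendisk. */
struct fake_disk {
	unsigned int flags;
	int nr_parts;		/* number of partitions currently known */
};

/* Models a rescan that leaves partitions alone while scanning is disabled. */
static int fake_disk_changed(struct fake_disk *disk)
{
	if (disk->flags & FAKE_NO_PART_SCAN)
		return 0;
	disk->nr_parts = 0;	/* old partitions are dropped */
	return 0;
}

/* Clears the no-scan flag only for the duration of the rescan, then restores it. */
static int reread_partitions(struct fake_disk *disk, bool force_scan)
{
	bool no_scan;
	int rc;

	assert(disk != NULL);
	no_scan = (disk->flags & FAKE_NO_PART_SCAN) != 0;

	if (force_scan && no_scan)
		disk->flags &= ~FAKE_NO_PART_SCAN;

	rc = fake_disk_changed(disk);

	if (force_scan && no_scan)
		disk->flags |= FAKE_NO_PART_SCAN;

	return rc;
}
```

In this model a forced rescan drops the stale partitions while leaving the flag as the caller found it. An unforced rescan on a no-scan disk changes nothing.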
* [PATCH 2/5] loop: cleanup lo_rw_aio()
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
` (3 preceding siblings ...)
2025-03-08 16:14 ` [PATCH 2/2] block: loop: delete partitions after clearing & changing fd Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
2025-03-08 16:14 ` [PATCH 3/5] loop: add helper loop_queue_work_prep Ming Lei
` (3 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Ming Lei
Clean up lo_rw_aio() a bit by refactoring it into three parts:
- lo_cmd_nr_bvec(), for calculating how many bvecs are in this request
- lo_rw_aio_prep(), for preparing the loop command, which needs to be
called once
- lo_submit_rw_aio(), for submitting the loop command, which can be
called multiple times
This prepares for trying to handle loop commands via NOWAIT read/write IO
first.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/loop.c | 83 +++++++++++++++++++++++++++++---------------
1 file changed, 55 insertions(+), 28 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 6bbbaa4aaf2c..eae38cd38b7b 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -394,24 +394,63 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret)
lo_rw_aio_do_completion(cmd);
}
-static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
+static int lo_submit_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
+ loff_t pos, int nr_bvec)
{
- struct iov_iter iter;
- struct req_iterator rq_iter;
- struct bio_vec *bvec;
struct request *rq = blk_mq_rq_from_pdu(cmd);
int dir = (req_op(rq) == REQ_OP_READ) ? ITER_DEST : ITER_SOURCE;
- struct bio *bio = rq->bio;
struct file *file = lo->lo_backing_file;
- struct bio_vec tmp;
+ struct iov_iter iter;
+ struct bio_vec *bvec;
unsigned int offset;
- int nr_bvec = 0;
int ret;
+ if (rq->bio != rq->biotail) {
+ bvec = cmd->bvec;
+ offset = 0;
+ } else {
+ struct bio *bio = rq->bio;
+
+ offset = bio->bi_iter.bi_bvec_done;
+ bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+ }
+ iov_iter_bvec(&iter, dir, bvec, nr_bvec, blk_rq_bytes(rq));
+ iter.iov_offset = offset;
+ cmd->iocb.ki_pos = pos;
+
+ atomic_set(&cmd->ref, 2);
+ if (dir == ITER_SOURCE)
+ ret = file->f_op->write_iter(&cmd->iocb, &iter);
+ else
+ ret = file->f_op->read_iter(&cmd->iocb, &iter);
+ lo_rw_aio_do_completion(cmd);
+
+ return ret;
+}
+
+static inline unsigned lo_cmd_nr_bvec(struct loop_cmd *cmd)
+{
+ struct req_iterator rq_iter;
+ struct request *rq = blk_mq_rq_from_pdu(cmd);
+ struct bio_vec tmp;
+ int nr_bvec = 0;
+
rq_for_each_bvec(tmp, rq, rq_iter)
nr_bvec++;
+ return nr_bvec;
+}
+
+static int lo_rw_aio_prep(struct loop_device *lo, struct loop_cmd *cmd,
+ unsigned nr_bvec)
+{
+ struct request *rq = blk_mq_rq_from_pdu(cmd);
+ struct file *file = lo->lo_backing_file;
+
if (rq->bio != rq->biotail) {
+ struct req_iterator rq_iter;
+ struct bio_vec *bvec;
+ struct bio_vec tmp;
bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
GFP_NOIO);
@@ -429,35 +468,23 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
*bvec = tmp;
bvec++;
}
- bvec = cmd->bvec;
- offset = 0;
- } else {
- /*
- * Same here, this bio may be started from the middle of the
- * 'bvec' because of bio splitting, so offset from the bvec
- * must be passed to iov iterator
- */
- offset = bio->bi_iter.bi_bvec_done;
- bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
}
- atomic_set(&cmd->ref, 2);
-
- iov_iter_bvec(&iter, dir, bvec, nr_bvec, blk_rq_bytes(rq));
- iter.iov_offset = offset;
-
- cmd->iocb.ki_pos = pos;
cmd->iocb.ki_filp = file;
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
- if (dir == ITER_SOURCE)
- ret = file->f_op->write_iter(&cmd->iocb, &iter);
- else
- ret = file->f_op->read_iter(&cmd->iocb, &iter);
+ return 0;
+}
- lo_rw_aio_do_completion(cmd);
+static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
+{
+ unsigned int nr_bvec = lo_cmd_nr_bvec(cmd);
+ int ret = lo_rw_aio_prep(lo, cmd, nr_bvec);
+ if (ret < 0)
+ return ret;
+ ret = lo_submit_rw_aio(lo, cmd, pos, nr_bvec);
if (ret != -EIOCBQUEUED)
lo_rw_aio_complete(&cmd->iocb, ret);
return 0;
--
2.47.0
* [PATCH 3/5] loop: add helper loop_queue_work_prep
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
` (4 preceding siblings ...)
2025-03-08 16:14 ` [PATCH 2/5] loop: cleanup lo_rw_aio() Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
2025-03-08 16:14 ` [PATCH 4/5] loop: try to handle loop aio command via NOWAIT IO first Ming Lei
` (2 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Ming Lei
Add the helper loop_queue_work_prep() to make loop_queue_rq() more
readable.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/loop.c | 38 +++++++++++++++++++++++---------------
1 file changed, 23 insertions(+), 15 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index eae38cd38b7b..9f8d32d2dc4d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -859,6 +859,27 @@ static inline int queue_on_root_worker(struct cgroup_subsys_state *css)
}
#endif
+static void loop_queue_work_prep(struct loop_cmd *cmd)
+{
+ struct request *rq = blk_mq_rq_from_pdu(cmd);
+
+ /* always use the first bio's css */
+ cmd->blkcg_css = NULL;
+ cmd->memcg_css = NULL;
+#ifdef CONFIG_BLK_CGROUP
+ if (rq->bio) {
+ cmd->blkcg_css = bio_blkcg_css(rq->bio);
+#ifdef CONFIG_MEMCG
+ if (cmd->blkcg_css) {
+ cmd->memcg_css =
+ cgroup_get_e_css(cmd->blkcg_css->cgroup,
+ &memory_cgrp_subsys);
+ }
+#endif
+ }
+#endif
+}
+
static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
{
struct rb_node **node, *parent = NULL;
@@ -866,6 +887,8 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
struct work_struct *work;
struct list_head *cmd_list;
+ loop_queue_work_prep(cmd);
+
spin_lock_irq(&lo->lo_work_lock);
if (queue_on_root_worker(cmd->blkcg_css))
@@ -1903,21 +1926,6 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
break;
}
- /* always use the first bio's css */
- cmd->blkcg_css = NULL;
- cmd->memcg_css = NULL;
-#ifdef CONFIG_BLK_CGROUP
- if (rq->bio) {
- cmd->blkcg_css = bio_blkcg_css(rq->bio);
-#ifdef CONFIG_MEMCG
- if (cmd->blkcg_css) {
- cmd->memcg_css =
- cgroup_get_e_css(cmd->blkcg_css->cgroup,
- &memory_cgrp_subsys);
- }
-#endif
- }
-#endif
loop_queue_work(lo, cmd);
return BLK_STS_OK;
--
2.47.0
* [PATCH 4/5] loop: try to handle loop aio command via NOWAIT IO first
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
` (5 preceding siblings ...)
2025-03-08 16:14 ` [PATCH 3/5] loop: add helper loop_queue_work_prep Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
2025-03-08 16:14 ` [PATCH 5/5] loop: add module parameter of 'nr_hw_queues' Ming Lei
2025-03-08 16:20 ` [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
8 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Ming Lei
Try to handle loop aio commands via NOWAIT IO first, so that we can avoid
queueing the aio command into the workqueue.
Fall back to the workqueue in case of -EAGAIN.
BLK_MQ_F_BLOCKING has to be set because .read_iter() or
.write_iter() might sleep even when called with NOWAIT.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/loop.c | 47 +++++++++++++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 3 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 9f8d32d2dc4d..46be0c8e75a6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -92,6 +92,8 @@ struct loop_cmd {
#define LOOP_IDLE_WORKER_TIMEOUT (60 * HZ)
#define LOOP_DEFAULT_HW_Q_DEPTH 128
+static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd);
+
static DEFINE_IDR(loop_index_idr);
static DEFINE_MUTEX(loop_ctl_mutex);
static DEFINE_MUTEX(loop_validate_mutex);
@@ -380,8 +382,17 @@ static void lo_rw_aio_do_completion(struct loop_cmd *cmd)
if (!atomic_dec_and_test(&cmd->ref))
return;
+
+ if (cmd->ret == -EAGAIN) {
+ struct loop_device *lo = rq->q->queuedata;
+
+ loop_queue_work(lo, cmd);
+ return;
+ }
+
kfree(cmd->bvec);
cmd->bvec = NULL;
+
if (likely(!blk_should_fake_timeout(rq->q)))
blk_mq_complete_request(rq);
}
@@ -478,16 +489,34 @@ static int lo_rw_aio_prep(struct loop_device *lo, struct loop_cmd *cmd,
}
static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
+{
+ unsigned int nr_bvec = lo_cmd_nr_bvec(cmd);
+ int ret;
+
+ cmd->iocb.ki_flags &= ~IOCB_NOWAIT;
+ ret = lo_submit_rw_aio(lo, cmd, pos, nr_bvec);
+ if (ret != -EIOCBQUEUED)
+ lo_rw_aio_complete(&cmd->iocb, ret);
+ return 0;
+}
+
+static int lo_rw_aio_nowait(struct loop_device *lo, struct loop_cmd *cmd, loff_t pos)
{
unsigned int nr_bvec = lo_cmd_nr_bvec(cmd);
int ret = lo_rw_aio_prep(lo, cmd, nr_bvec);
if (ret < 0)
return ret;
+
+ cmd->iocb.ki_flags |= IOCB_NOWAIT;
ret = lo_submit_rw_aio(lo, cmd, pos, nr_bvec);
- if (ret != -EIOCBQUEUED)
+ if (ret == -EIOCBQUEUED)
+ return 0;
+ if (ret != -EAGAIN) {
lo_rw_aio_complete(&cmd->iocb, ret);
- return 0;
+ return 0;
+ }
+ return ret;
}
static int do_req_filebacked(struct loop_device *lo, struct request *rq)
@@ -1926,6 +1955,17 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
break;
}
+ if (cmd->use_aio) {
+ loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
+ int ret = lo_rw_aio_nowait(lo, cmd, pos);
+
+ if (!ret)
+ return BLK_STS_OK;
+ if (ret != -EAGAIN)
+ return BLK_STS_IOERR;
+ /* fallback to workqueue for handling aio */
+ }
+
loop_queue_work(lo, cmd);
return BLK_STS_OK;
@@ -2076,7 +2116,8 @@ static int loop_add(int i)
lo->tag_set.queue_depth = hw_queue_depth;
lo->tag_set.numa_node = NUMA_NO_NODE;
lo->tag_set.cmd_size = sizeof(struct loop_cmd);
- lo->tag_set.flags = BLK_MQ_F_STACKING | BLK_MQ_F_NO_SCHED_BY_DEFAULT;
+ lo->tag_set.flags = BLK_MQ_F_STACKING | BLK_MQ_F_NO_SCHED_BY_DEFAULT |
+ BLK_MQ_F_BLOCKING;
lo->tag_set.driver_data = lo;
err = blk_mq_alloc_tag_set(&lo->tag_set);
--
2.47.0
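The control flow described in this patch (try a non-blocking submit first, hand the command to a worker only on -EAGAIN) can be sketched in userspace as below. This is a simplified model rather than the driver code: struct fake_cmd, fake_submit(), fake_worker_run() and queue_cmd() are hypothetical, and the "worker" runs inline instead of being deferred to a workqueue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command; 'cached' models whether it can be served without blocking. */
struct fake_cmd {
	bool cached;
	bool queued_to_worker;
	bool done;
};

/* Models a backend submit: with nowait set it refuses to block and returns -EAGAIN. */
static int fake_submit(struct fake_cmd *cmd, bool nowait)
{
	if (nowait && !cmd->cached)
		return -EAGAIN;
	cmd->done = true;
	return 0;
}

/* Models the worker context, which is allowed to block, so it submits without nowait. */
static void fake_worker_run(struct fake_cmd *cmd)
{
	int ret = fake_submit(cmd, false);

	assert(ret == 0);
}

/* Fast path first; only -EAGAIN falls back to the worker, other results are returned. */
static int queue_cmd(struct fake_cmd *cmd)
{
	int ret = fake_submit(cmd, true);

	if (ret != -EAGAIN)
		return ret;

	cmd->queued_to_worker = true;
	fake_worker_run(cmd);	/* a real driver would defer this to a workqueue */
	return 0;
}
```

The point of the pattern is that commands which can complete without blocking never pay for a context switch into the worker. The fallback keeps the slow case correct.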
* [PATCH 5/5] loop: add module parameter of 'nr_hw_queues'
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
` (6 preceding siblings ...)
2025-03-08 16:14 ` [PATCH 4/5] loop: try to handle loop aio command via NOWAIT IO first Ming Lei
@ 2025-03-08 16:14 ` Ming Lei
2025-03-08 16:20 ` [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
8 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:14 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Ming Lei
Add a module parameter 'nr_hw_queues' so that loop can support MQ,
which may reduce contention when there are many IO jobs.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/loop.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 46be0c8e75a6..6378dfee6681 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -91,6 +91,7 @@ struct loop_cmd {
#define LOOP_IDLE_WORKER_TIMEOUT (60 * HZ)
#define LOOP_DEFAULT_HW_Q_DEPTH 128
+#define LOOP_DEFAULT_NR_HW_Q 1
static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd);
@@ -1928,6 +1929,26 @@ static const struct kernel_param_ops loop_hw_qdepth_param_ops = {
device_param_cb(hw_queue_depth, &loop_hw_qdepth_param_ops, &hw_queue_depth, 0444);
MODULE_PARM_DESC(hw_queue_depth, "Queue depth for each hardware queue. Default: " __stringify(LOOP_DEFAULT_HW_Q_DEPTH));
+static int nr_hw_queues = LOOP_DEFAULT_NR_HW_Q;
+static int loop_set_nr_hw_queues(const char *s, const struct kernel_param *p)
+{
+ int nr, ret;
+
+ ret = kstrtoint(s, 0, &nr);
+ if (ret < 0)
+ return ret;
+ if (nr < 1)
+ return -EINVAL;
+ nr_hw_queues = nr;
+ return 0;
+}
+static const struct kernel_param_ops loop_nr_hw_q_param_ops = {
+ .set = loop_set_nr_hw_queues,
+ .get = param_get_int,
+};
+device_param_cb(nr_hw_queues, &loop_nr_hw_q_param_ops, &nr_hw_queues, 0444);
+MODULE_PARM_DESC(nr_hw_queues, "number of hardware queues. Default: " __stringify(LOOP_DEFAULT_NR_HW_Q));
+
MODULE_DESCRIPTION("Loopback device support");
MODULE_LICENSE("GPL");
MODULE_ALIAS_BLOCKDEV_MAJOR(LOOP_MAJOR);
@@ -2112,7 +2133,7 @@ static int loop_add(int i)
i = err;
lo->tag_set.ops = &loop_mq_ops;
- lo->tag_set.nr_hw_queues = 1;
+ lo->tag_set.nr_hw_queues = nr_hw_queues;
lo->tag_set.queue_depth = hw_queue_depth;
lo->tag_set.numa_node = NUMA_NO_NODE;
lo->tag_set.cmd_size = sizeof(struct loop_cmd);
--
2.47.0
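The validation done by loop_set_nr_hw_queues() (parse an integer, reject anything below 1, leave the current value untouched on error) can be tried out in userspace with a sketch like the one below. It uses strtol() as a rough substitute for kstrtoint(), so details such as newline handling and the exact error code for out-of-range input are assumptions of the sketch. set_nr_hw_queues() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

static int nr_hw_queues = 1;	/* mirrors the default of one hardware queue */

/*
 * Parses 's' as an integer and stores it only if it is >= 1.
 * Returns 0 on success and -EINVAL on any malformed or out-of-range input;
 * the stored value is left unchanged on failure.
 */
static int set_nr_hw_queues(const char *s)
{
	char *end;
	long nr;

	assert(s != NULL);

	errno = 0;
	nr = strtol(s, &end, 0);
	if (errno != 0 || end == s || *end != '\0')
		return -EINVAL;
	if (nr < 1 || nr > INT_MAX)
		return -EINVAL;

	nr_hw_queues = (int)nr;
	return 0;
}
```

Rejecting values below 1 up front matters because the parameter is later copied into the tag set, where a queue count of zero would not make sense.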
* Re: [PATCH 1/2] block: loop: share code of reread partitions
2025-03-08 16:14 ` [PATCH 1/2] block: loop: share code of reread partitions Ming Lei
@ 2025-03-08 16:17 ` Ming Lei
0 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:17 UTC (permalink / raw)
To: Jens Axboe, linux-block
On Sun, Mar 09, 2025 at 12:14:51AM +0800, Ming Lei wrote:
> loop_reread_partitions() already exists for rereading partitions, so
> replace the open-coded rescan in __loop_clr_fd() with a call to
> loop_reread_partitions(), passing a new 'locked' parameter.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
oops, please ignore this one.
Thanks,
Ming
* Re: [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT
2025-03-08 16:14 [PATCH 0/5] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
` (7 preceding siblings ...)
2025-03-08 16:14 ` [PATCH 5/5] loop: add module parameter of 'nr_hw_queues' Ming Lei
@ 2025-03-08 16:20 ` Ming Lei
8 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-03-08 16:20 UTC (permalink / raw)
To: Jens Axboe, linux-block
On Sun, Mar 09, 2025 at 12:14:50AM +0800, Ming Lei wrote:
> Hello Jens,
>
> This patchset improves loop aio performance by using IOCB_NOWAIT to avoid queueing
> aio commands to workqueue context, and refactors lo_rw_aio() a bit along the way.
>
> The last patch adds MQ support, which improves performance a bit when multiple
> IO jobs are running.
>
> In my test VM, loop disk performance becomes very close to that of the backing
> block device (nvme/mq virtio-scsi).
>
> Thanks,
> Ming
Please ignore this patchset, since several unrelated patches were included
accidentally.
Thanks,
Ming