From: Jens Axboe <axboe@kernel.dk>
To: Kanchan Joshi <joshi.k@samsung.com>
Cc: hch@lst.de, kbusch@kernel.org, asml.silence@gmail.com,
io-uring@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org, gost.dev@samsung.com
Subject: Re: [PATCH for-next v3 0/4] fixed-buffer for uring-cmd/passthrough
Date: Fri, 2 Sep 2022 15:25:33 -0600 [thread overview]
Message-ID: <c62c977d-9e81-c84c-e17c-e057295c071e@kernel.dk> (raw)
In-Reply-To: <48856ca4-5158-154e-a1f5-124aadc9780f@kernel.dk>
[-- Attachment #1: Type: text/plain, Size: 3491 bytes --]
On 9/2/22 1:32 PM, Jens Axboe wrote:
> On 9/2/22 12:46 PM, Kanchan Joshi wrote:
>> On Fri, Sep 02, 2022 at 10:32:16AM -0600, Jens Axboe wrote:
>>> On 9/2/22 10:06 AM, Jens Axboe wrote:
>>>> On 9/2/22 9:16 AM, Kanchan Joshi wrote:
>>>>> Hi,
>>>>>
>>>>> Currently uring-cmd lacks the ability to leverage the pre-registered
>>>>> buffers. This series adds the support in uring-cmd, and plumbs
>>>>> nvme passthrough to work with it.
>>>>>
>>>>> Using registered-buffers showed peak-perf hike from 1.85M to 2.17M IOPS
>>>>> in my setup.
>>>>>
>>>>> Without fixedbufs
>>>>> *****************
>>>>> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
>>>>> submitter=0, tid=5256, file=/dev/ng0n1, node=-1
>>>>> polled=0, fixedbufs=0/0, register_files=1, buffered=1, QD=128
>>>>> Engine=io_uring, sq_ring=128, cq_ring=128
>>>>> IOPS=1.85M, BW=904MiB/s, IOS/call=32/31
>>>>> IOPS=1.85M, BW=903MiB/s, IOS/call=32/32
>>>>> IOPS=1.85M, BW=902MiB/s, IOS/call=32/32
>>>>> ^CExiting on signal
>>>>> Maximum IOPS=1.85M
>>>>
>>>> With the poll support queued up, I ran this one as well. tldr is:
>>>>
>>>> bdev (non pt)??? 122M IOPS
>>>> irq driven??? 51-52M IOPS
>>>> polled??????? 71M IOPS
>>>> polled+fixed??? 78M IOPS
>>
>> except first one, rest three entries are for passthru? somehow I didn't
>> see that big of a gap. I will try to align my setup in coming days.
>
> Right, sorry it was badly labeled. First one is bdev with polling,
> registered buffers, etc. The others are all the passthrough mode. polled
> goes to 74M with the caching fix, so it's about a 74M -> 82M bump using
> registered buffers with passthrough and polling.
>
>>> polled+fixed??? 82M
>>>
>>> I suspect the remainder is due to the lack of batching on the request
>>> freeing side, at least some of it. Haven't really looked deeper yet.
>>>
>>> One issue I saw - try and use passthrough polling without having any
>>> poll queues defined and it'll stall just spinning on completions. You
>>> need to ensure that these are processed as well - look at how the
>>> non-passthrough io_uring poll path handles it.
>>
>> Had tested this earlier, and it used to run fine. And it does not now.
>> I see that io are getting completed, irq-completion is arriving in nvme
>> and it is triggering task-work based completion (by calling
>> io_uring_cmd_complete_in_task). But task-work never got called and
>> therefore no completion happened.
>>
>> io_uring_cmd_complete_in_task -> io_req_task_work_add -> __io_req_task_work_add
>>
>> Seems task work did not get added. Something about newly added
>> IORING_SETUP_DEFER_TASKRUN changes the scenario.
>>
>> static inline void __io_req_task_work_add(struct io_kiocb *req, bool allow_local)
>> {
>> ?????? struct io_uring_task *tctx = req->task->io_uring;
>> ?????? struct io_ring_ctx *ctx = req->ctx;
>> ?????? struct llist_node *node;
>>
>> ?????? if (allow_local && ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
>> ?????????????? io_req_local_work_add(req);
>> ?????????????? return;
>> ?????? }
>> ????....
>>
>> To confirm, I commented that in t/io_uring and it runs fine.
>> Please see if that changes anything for you? I will try to find the
>> actual fix tomorow.
>
> Ah gotcha, yes that actually makes a lot of sense. I wonder if regular
> polling is then also broken without poll queues if
> IORING_SETUP_DEFER_TASKRUN is set. It should be, I'll check into
> io_iopoll_check().
A mix of fixes and just cleanups, here's what I got.
--
Jens Axboe
[-- Attachment #2: 0001-io_uring-cleanly-separate-request-types-for-iopoll.patch --]
[-- Type: text/x-patch, Size: 1663 bytes --]
From 50155186644a352b290b72c61e738f62640d566a Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Fri, 2 Sep 2022 15:16:29 -0600
Subject: [PATCH 1/3] io_uring: cleanly separate request types for iopoll
After the addition of iopoll support for passthrough, there's a bit of
a mixup here. Clean it up and get rid of the casting for the passthrough
command type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
io_uring/rw.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 9698a789b3d5..3f03b6d2a5a3 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -994,7 +994,7 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
wq_list_for_each(pos, start, &ctx->iopoll_list) {
struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
- struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+ struct file *file = req->file;
int ret;
/*
@@ -1006,12 +1006,15 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
break;
if (req->opcode == IORING_OP_URING_CMD) {
- struct io_uring_cmd *ioucmd = (struct io_uring_cmd *)rw;
+ struct io_uring_cmd *ioucmd;
- ret = req->file->f_op->uring_cmd_iopoll(ioucmd);
- } else
- ret = rw->kiocb.ki_filp->f_op->iopoll(&rw->kiocb, &iob,
- poll_flags);
+ ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+ ret = file->f_op->uring_cmd_iopoll(ioucmd, poll_flags);
+ } else {
+ struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+ ret = file->f_op->iopoll(&rw->kiocb, &iob, poll_flags);
+ }
if (unlikely(ret < 0))
return ret;
else if (ret)
--
2.35.1
[-- Attachment #3: 0002-nvme-use-separate-end-IO-handler-for-IOPOLL.patch --]
[-- Type: text/x-patch, Size: 2457 bytes --]
From 0bc78c843b8636dcdfe45dd07328ca826fa67f9b Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Fri, 2 Sep 2022 15:17:30 -0600
Subject: [PATCH 2/3] nvme: use separate end IO handler for IOPOLL
Don't need to rely on the cookie or request type, set the right handler
based on how we're handling the IO.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
drivers/nvme/host/ioctl.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 7756b439a688..f34abe95821e 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -385,25 +385,36 @@ static void nvme_uring_task_cb(struct io_uring_cmd *ioucmd)
io_uring_cmd_done(ioucmd, status, result);
}
-static void nvme_uring_cmd_end_io(struct request *req, blk_status_t err)
+static void nvme_uring_iopoll_cmd_end_io(struct request *req, blk_status_t err)
{
struct io_uring_cmd *ioucmd = req->end_io_data;
struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
/* extract bio before reusing the same field for request */
struct bio *bio = pdu->bio;
- void *cookie = READ_ONCE(ioucmd->cookie);
pdu->req = req;
req->bio = bio;
/*
* For iopoll, complete it directly.
- * Otherwise, move the completion to task work.
*/
- if (cookie != NULL && blk_rq_is_poll(req))
- nvme_uring_task_cb(ioucmd);
- else
- io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
+ nvme_uring_task_cb(ioucmd);
+}
+
+static void nvme_uring_cmd_end_io(struct request *req, blk_status_t err)
+{
+ struct io_uring_cmd *ioucmd = req->end_io_data;
+ struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
+ /* extract bio before reusing the same field for request */
+ struct bio *bio = pdu->bio;
+
+ pdu->req = req;
+ req->bio = bio;
+
+ /*
+ * Move the completion to task work.
+ */
+ io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
}
static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
@@ -464,7 +475,10 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
blk_flags);
if (IS_ERR(req))
return PTR_ERR(req);
- req->end_io = nvme_uring_cmd_end_io;
+ if (issue_flags & IO_URING_F_IOPOLL)
+ req->end_io = nvme_uring_iopoll_cmd_end_io;
+ else
+ req->end_io = nvme_uring_cmd_end_io;
req->end_io_data = ioucmd;
if (issue_flags & IO_URING_F_IOPOLL && rq_flags & REQ_POLLED) {
--
2.35.1
[-- Attachment #4: 0003-fs-add-batch-and-poll-flags-to-the-uring_cmd_iopoll-.patch --]
[-- Type: text/x-patch, Size: 4261 bytes --]
From 9daa39b146f3a8f412196df5eb9f9686f308e5cc Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Fri, 2 Sep 2022 15:18:05 -0600
Subject: [PATCH 3/3] fs: add batch and poll flags to the uring_cmd_iopoll()
handler
We need the poll_flags to know how to poll for the IO, and we should
have the batch structure in preparation for supporting batched
completions with iopoll.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
drivers/nvme/host/ioctl.c | 12 ++++++++----
drivers/nvme/host/nvme.h | 6 ++++--
include/linux/fs.h | 3 ++-
io_uring/rw.c | 3 ++-
4 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index f34abe95821e..7a0b12ef49ae 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -637,7 +637,9 @@ int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
}
-int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob,
+ unsigned int poll_flags)
{
struct bio *bio;
int ret = 0;
@@ -650,7 +652,7 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
struct nvme_ns, cdev);
q = ns->queue;
if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio && bio->bi_bdev)
- ret = bio_poll(bio, NULL, 0);
+ ret = bio_poll(bio, iob, poll_flags);
rcu_read_unlock();
return ret;
}
@@ -736,7 +738,9 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
return ret;
}
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob,
+ unsigned int poll_flags)
{
struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev);
@@ -752,7 +756,7 @@ int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
q = ns->queue;
if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio
&& bio->bi_bdev)
- ret = bio_poll(bio, NULL, 0);
+ ret = bio_poll(bio, iob, poll_flags);
rcu_read_unlock();
}
srcu_read_unlock(&head->srcu, srcu_idx);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index fdcbc93dea21..216acbe953b3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -821,8 +821,10 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
unsigned long arg);
long nvme_dev_ioctl(struct file *file, unsigned int cmd,
unsigned long arg);
-int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob, unsigned int poll_flags);
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob, unsigned int poll_flags);
int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd,
unsigned int issue_flags);
int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d6badd19784f..01681d061a6a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2132,7 +2132,8 @@ struct file_operations {
loff_t len, unsigned int remap_flags);
int (*fadvise)(struct file *, loff_t, loff_t, int);
int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
- int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd);
+ int (*uring_cmd_iopoll)(struct io_uring_cmd *, struct io_comp_batch *,
+ unsigned int poll_flags);
} __randomize_layout;
struct inode_operations {
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 3f03b6d2a5a3..4a061326c664 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1009,7 +1009,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
struct io_uring_cmd *ioucmd;
ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
- ret = file->f_op->uring_cmd_iopoll(ioucmd, poll_flags);
+ ret = file->f_op->uring_cmd_iopoll(ioucmd, &iob,
+ poll_flags);
} else {
struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
--
2.35.1
next prev parent reply other threads:[~2022-09-02 21:25 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20220902152701epcas5p1d4aca8eebc90fb96ac7ed5a8270816cf@epcas5p1.samsung.com>
2022-09-02 15:16 ` [PATCH for-next v3 0/4] fixed-buffer for uring-cmd/passthrough Kanchan Joshi
2022-09-02 15:16 ` [PATCH for-next v3 1/4] io_uring: introduce io_uring_cmd_import_fixed Kanchan Joshi
2022-09-02 15:16 ` [PATCH for-next v3 2/4] io_uring: introduce fixed buffer support for io_uring_cmd Kanchan Joshi
2022-09-02 23:13 ` Jens Axboe
2022-09-02 15:16 ` [PATCH for-next v3 3/4] block: add helper to map bvec iterator for passthrough Kanchan Joshi
2022-09-02 23:14 ` Jens Axboe
2022-09-02 15:16 ` [PATCH for-next v3 4/4] nvme: wire up fixed buffer support for nvme passthrough Kanchan Joshi
2022-09-02 16:06 ` [PATCH for-next v3 0/4] fixed-buffer for uring-cmd/passthrough Jens Axboe
2022-09-02 16:32 ` Jens Axboe
2022-09-02 18:46 ` Kanchan Joshi
2022-09-02 19:32 ` Jens Axboe
2022-09-02 21:25 ` Jens Axboe [this message]
2022-09-03 9:34 ` Kanchan Joshi
2022-09-03 17:00 ` Jens Axboe
2022-09-04 17:01 ` Kanchan Joshi
2022-09-04 20:17 ` Jens Axboe
2022-09-05 5:52 ` Kanchan Joshi
2022-09-05 17:48 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c62c977d-9e81-c84c-e17c-e057295c071e@kernel.dk \
--to=axboe@kernel.dk \
--cc=asml.silence@gmail.com \
--cc=gost.dev@samsung.com \
--cc=hch@lst.de \
--cc=io-uring@vger.kernel.org \
--cc=joshi.k@samsung.com \
--cc=kbusch@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.