From: Pavel Begunkov <asml.silence@gmail.com>
To: Ming Lei <ming.lei@redhat.com>, Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, linux-block@vger.kernel.org,
Uday Shankar <ushankar@purestorage.com>,
Akilesh Kailash <akailash@google.com>,
Kevin Wolf <kwolf@redhat.com>
Subject: Re: [PATCH V8 4/7] io_uring: support SQE group
Date: Tue, 29 Oct 2024 16:38:55 +0000 [thread overview]
Message-ID: <2bcce6c2-29c5-4b36-9840-7db516a40c41@gmail.com> (raw)
In-Reply-To: <ZyA_dbiU0ho5IJYA@fedora>
On 10/29/24 01:50, Ming Lei wrote:
> On Mon, Oct 28, 2024 at 06:12:34PM -0600, Jens Axboe wrote:
>> On 10/25/24 6:22 AM, Ming Lei wrote:
>>> SQE group is defined as one chain of SQEs starting with the first SQE that
>>> has IOSQE_SQE_GROUP set, and ending with the first subsequent SQE that
>>> doesn't have it set, and it is similar with chain of linked SQEs.
>>>
>>> Not like linked SQEs, each sqe is issued after the previous one is
>>> completed. All SQEs in one group can be submitted in parallel. To simplify
>>> the implementation from beginning, all members are queued after the leader
>>> is completed, however, this way may be changed and leader and members may
>>> be issued concurrently in future.
>>>
>>> The 1st SQE is group leader, and the other SQEs are group member. The whole
>>> group share single IOSQE_IO_LINK and IOSQE_IO_DRAIN from group leader, and
>>> the two flags can't be set for group members. For the sake of
>>> simplicity, IORING_OP_LINK_TIMEOUT is disallowed for SQE group now.
>>>
>>> When the group is in one link chain, this group isn't submitted until the
>>> previous SQE or group is completed. And the following SQE or group can't
>>> be started if this group isn't completed. Failure from any group member will
>>> fail the group leader, then the link chain can be terminated.
>>>
>>> When IOSQE_IO_DRAIN is set for group leader, all requests in this group and
>>> previous requests submitted are drained. Given IOSQE_IO_DRAIN can be set for
>>> group leader only, we respect IO_DRAIN by always completing group leader as
>>> the last one in the group. Meantime it is natural to post leader's CQE
>>> as the last one from application viewpoint.
>>>
>>> Working together with IOSQE_IO_LINK, SQE group provides flexible way to
>>> support N:M dependency, such as:
>>>
>>> - group A is chained with group B together
>>> - group A has N SQEs
>>> - group B has M SQEs
>>>
>>> then M SQEs in group B depend on N SQEs in group A.
>>>
>>> N:M dependency can support some interesting use cases in efficient way:
>>>
>>> 1) read from multiple files, then write the read data into single file
>>>
>>> 2) read from single file, and write the read data into multiple files
>>>
>>> 3) write same data into multiple files, and read data from multiple files and
>>> compare if correct data is written
>>>
>>> Also IOSQE_SQE_GROUP takes the last bit in sqe->flags, but we still can
>>> extend sqe->flags with io_uring context flag, such as use __pad3 for
>>> non-uring_cmd OPs and part of uring_cmd_flags for uring_cmd OP.
>>
>> Since it's taking the last flag, maybe a better idea to have the last
>> flag mean "more flags in (for example) __pad3" and put the new flag
>> there? Not sure you mean in terms of "io_uring context flag", would it
>> be an enter flag? Ring required to be setup with a certain flag? Neither
>> of those seem super encouraging, imho.
>
> I meant:
>
> If "more flags in __pad3" is enabled in future we may claim it as one
> feature to userspace, such as IORING_FEAT_EXT_FLAG.
>
> Will improve the above commit log.
And we can't take it in either case. The field is in a union, and
other opcodes use that part of the SQE. Enabling a generic feature
for a subset of requests only is not a good idea.
--
Pavel Begunkov
next prev parent reply other threads:[~2024-10-29 16:38 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-25 12:22 [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf Ming Lei
2024-10-25 12:22 ` [PATCH V8 1/7] io_uring: add io_link_req() helper Ming Lei
2024-10-25 12:22 ` [PATCH V8 2/7] io_uring: add io_submit_fail_link() helper Ming Lei
2024-10-25 12:22 ` [PATCH V8 3/7] io_uring: add helper of io_req_commit_cqe() Ming Lei
2024-10-25 12:22 ` [PATCH V8 4/7] io_uring: support SQE group Ming Lei
2024-10-29 0:12 ` Jens Axboe
2024-10-29 1:50 ` Ming Lei
2024-10-29 16:38 ` Pavel Begunkov [this message]
2024-10-31 21:24 ` Jens Axboe
2024-10-31 21:39 ` Jens Axboe
2024-11-01 0:00 ` Jens Axboe
2024-10-25 12:22 ` [PATCH V8 5/7] io_uring: support leased group buffer with REQ_F_GROUP_KBUF Ming Lei
2024-10-29 16:47 ` Pavel Begunkov
2024-10-30 0:45 ` Ming Lei
2024-10-30 1:25 ` Pavel Begunkov
2024-10-30 2:04 ` Ming Lei
2024-10-31 13:16 ` Pavel Begunkov
2024-11-01 1:04 ` Ming Lei
2024-11-03 22:31 ` Pavel Begunkov
2024-11-04 0:16 ` Ming Lei
2024-11-04 1:08 ` Pavel Begunkov
2024-11-04 1:21 ` Ming Lei
2024-11-04 12:23 ` Pavel Begunkov
2024-11-04 13:08 ` Ming Lei
2024-11-04 13:24 ` Pavel Begunkov
2024-11-04 13:35 ` Ming Lei
2024-11-04 16:38 ` Pavel Begunkov
2024-11-05 3:37 ` Ming Lei
2024-10-25 12:22 ` [PATCH V8 6/7] io_uring/uring_cmd: support leasing device kernel buffer to io_uring Ming Lei
2024-10-25 12:22 ` [PATCH V8 7/7] ublk: support leasing io " Ming Lei
2024-10-29 17:01 ` [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf Pavel Begunkov
2024-10-29 17:04 ` Jens Axboe
2024-10-29 19:18 ` Jens Axboe
2024-10-29 20:06 ` Jens Axboe
2024-10-29 21:26 ` Jens Axboe
2024-10-30 2:03 ` Ming Lei
2024-10-30 2:43 ` Jens Axboe
2024-10-30 3:08 ` Ming Lei
2024-10-30 4:11 ` Ming Lei
2024-10-30 13:20 ` Jens Axboe
2024-10-31 2:53 ` Ming Lei
2024-10-31 13:35 ` Jens Axboe
2024-10-31 15:07 ` Jens Axboe
2024-11-01 2:57 ` Ming Lei
2024-11-01 1:39 ` Ming Lei
2024-10-31 13:42 ` Pavel Begunkov
2024-10-30 13:18 ` Jens Axboe
2024-10-31 13:25 ` Pavel Begunkov
2024-10-31 14:29 ` Jens Axboe
2024-10-31 15:25 ` Pavel Begunkov
2024-10-31 15:42 ` Jens Axboe
2024-10-31 16:29 ` Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2bcce6c2-29c5-4b36-9840-7db516a40c41@gmail.com \
--to=asml.silence@gmail.com \
--cc=akailash@google.com \
--cc=axboe@kernel.dk \
--cc=io-uring@vger.kernel.org \
--cc=kwolf@redhat.com \
--cc=linux-block@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=ushankar@purestorage.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.