Re: [PATCH V8 4/7] io_uring: support SQE group

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Pavel Begunkov <asml.silence@gmail.com>
To: Ming Lei <ming.lei@redhat.com>, Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, linux-block@vger.kernel.org,
	Uday Shankar <ushankar@purestorage.com>,
	Akilesh Kailash <akailash@google.com>,
	Kevin Wolf <kwolf@redhat.com>
Subject: Re: [PATCH V8 4/7] io_uring: support SQE group
Date: Tue, 29 Oct 2024 16:38:55 +0000	[thread overview]
Message-ID: <2bcce6c2-29c5-4b36-9840-7db516a40c41@gmail.com> (raw)
In-Reply-To: <ZyA_dbiU0ho5IJYA@fedora>

On 10/29/24 01:50, Ming Lei wrote:
> On Mon, Oct 28, 2024 at 06:12:34PM -0600, Jens Axboe wrote:
>> On 10/25/24 6:22 AM, Ming Lei wrote:
>>> SQE group is defined as one chain of SQEs starting with the first SQE that
>>> has IOSQE_SQE_GROUP set, and ending with the first subsequent SQE that
>>> doesn't have it set, and it is similar with chain of linked SQEs.
>>>
>>> Not like linked SQEs, each sqe is issued after the previous one is
>>> completed. All SQEs in one group can be submitted in parallel. To simplify
>>> the implementation from beginning, all members are queued after the leader
>>> is completed, however, this way may be changed and leader and members may
>>> be issued concurrently in future.
>>>
>>> The 1st SQE is group leader, and the other SQEs are group member. The whole
>>> group share single IOSQE_IO_LINK and IOSQE_IO_DRAIN from group leader, and
>>> the two flags can't be set for group members. For the sake of
>>> simplicity, IORING_OP_LINK_TIMEOUT is disallowed for SQE group now.
>>>
>>> When the group is in one link chain, this group isn't submitted until the
>>> previous SQE or group is completed. And the following SQE or group can't
>>> be started if this group isn't completed. Failure from any group member will
>>> fail the group leader, then the link chain can be terminated.
>>>
>>> When IOSQE_IO_DRAIN is set for group leader, all requests in this group and
>>> previous requests submitted are drained. Given IOSQE_IO_DRAIN can be set for
>>> group leader only, we respect IO_DRAIN by always completing group leader as
>>> the last one in the group. Meantime it is natural to post leader's CQE
>>> as the last one from application viewpoint.
>>>
>>> Working together with IOSQE_IO_LINK, SQE group provides flexible way to
>>> support N:M dependency, such as:
>>>
>>> - group A is chained with group B together
>>> - group A has N SQEs
>>> - group B has M SQEs
>>>
>>> then M SQEs in group B depend on N SQEs in group A.
>>>
>>> N:M dependency can support some interesting use cases in efficient way:
>>>
>>> 1) read from multiple files, then write the read data into single file
>>>
>>> 2) read from single file, and write the read data into multiple files
>>>
>>> 3) write same data into multiple files, and read data from multiple files and
>>> compare if correct data is written
>>>
>>> Also IOSQE_SQE_GROUP takes the last bit in sqe->flags, but we still can
>>> extend sqe->flags with io_uring context flag, such as use __pad3 for
>>> non-uring_cmd OPs and part of uring_cmd_flags for uring_cmd OP.
>>
>> Since it's taking the last flag, maybe a better idea to have the last
>> flag mean "more flags in (for example) __pad3" and put the new flag
>> there? Not sure you mean in terms of "io_uring context flag", would it
>> be an enter flag? Ring required to be setup with a certain flag? Neither
>> of those seem super encouraging, imho.
> 
> I meant:
> 
> If "more flags in __pad3" is enabled in future we may claim it as one
> feature to userspace, such as IORING_FEAT_EXT_FLAG.
> 
> Will improve the above commit log.

And we can't take it in either case. The field is in a union, and
other opcodes use that part of the SQE. Enabling a generic feature
for a subset of requests only is not a good idea.

-- 
Pavel Begunkov

next prev parent reply	other threads:[~2024-10-29 16:38 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-25 12:22 [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf Ming Lei
2024-10-25 12:22 ` [PATCH V8 1/7] io_uring: add io_link_req() helper Ming Lei
2024-10-25 12:22 ` [PATCH V8 2/7] io_uring: add io_submit_fail_link() helper Ming Lei
2024-10-25 12:22 ` [PATCH V8 3/7] io_uring: add helper of io_req_commit_cqe() Ming Lei
2024-10-25 12:22 ` [PATCH V8 4/7] io_uring: support SQE group Ming Lei
2024-10-29  0:12   ` Jens Axboe
2024-10-29  1:50     ` Ming Lei
2024-10-29 16:38       ` Pavel Begunkov [this message]
2024-10-31 21:24   ` Jens Axboe
2024-10-31 21:39     ` Jens Axboe
2024-11-01  0:00       ` Jens Axboe
2024-10-25 12:22 ` [PATCH V8 5/7] io_uring: support leased group buffer with REQ_F_GROUP_KBUF Ming Lei
2024-10-29 16:47   ` Pavel Begunkov
2024-10-30  0:45     ` Ming Lei
2024-10-30  1:25       ` Pavel Begunkov
2024-10-30  2:04         ` Ming Lei
2024-10-31 13:16           ` Pavel Begunkov
2024-11-01  1:04             ` Ming Lei
2024-11-03 22:31               ` Pavel Begunkov
2024-11-04  0:16                 ` Ming Lei
2024-11-04  1:08                   ` Pavel Begunkov
2024-11-04  1:21                     ` Ming Lei
2024-11-04 12:23                       ` Pavel Begunkov
2024-11-04 13:08                         ` Ming Lei
2024-11-04 13:24                           ` Pavel Begunkov
2024-11-04 13:35                             ` Ming Lei
2024-11-04 16:38                               ` Pavel Begunkov
2024-11-05  3:37                                 ` Ming Lei
2024-10-25 12:22 ` [PATCH V8 6/7] io_uring/uring_cmd: support leasing device kernel buffer to io_uring Ming Lei
2024-10-25 12:22 ` [PATCH V8 7/7] ublk: support leasing io " Ming Lei
2024-10-29 17:01 ` [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf Pavel Begunkov
2024-10-29 17:04   ` Jens Axboe
2024-10-29 19:18     ` Jens Axboe
2024-10-29 20:06       ` Jens Axboe
2024-10-29 21:26         ` Jens Axboe
2024-10-30  2:03           ` Ming Lei
2024-10-30  2:43             ` Jens Axboe
2024-10-30  3:08               ` Ming Lei
2024-10-30  4:11                 ` Ming Lei
2024-10-30 13:20                   ` Jens Axboe
2024-10-31  2:53                     ` Ming Lei
2024-10-31 13:35                       ` Jens Axboe
2024-10-31 15:07                         ` Jens Axboe
2024-11-01  2:57                           ` Ming Lei
2024-11-01  1:39                         ` Ming Lei
2024-10-31 13:42                       ` Pavel Begunkov
2024-10-30 13:18                 ` Jens Axboe
2024-10-31 13:25               ` Pavel Begunkov
2024-10-31 14:29                 ` Jens Axboe
2024-10-31 15:25                   ` Pavel Begunkov
2024-10-31 15:42                     ` Jens Axboe
2024-10-31 16:29                       ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2bcce6c2-29c5-4b36-9840-7db516a40c41@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=akailash@google.com \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=kwolf@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).