From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, linux-block@vger.kernel.org,
Pavel Begunkov <asml.silence@gmail.com>,
Kevin Wolf <kwolf@redhat.com>
Subject: Re: [PATCH 5/9] io_uring: support SQE group
Date: Wed, 24 Apr 2024 08:46:41 +0800
Message-ID: <ZihWcV8+3rfyYxGI@fedora>
In-Reply-To: <e36cc8de-3726-4479-8fbd-f54fd21465a2@kernel.dk>
On Mon, Apr 22, 2024 at 12:27:28PM -0600, Jens Axboe wrote:
> On 4/7/24 7:03 PM, Ming Lei wrote:
> > An SQE group is defined as a chain of SQEs starting with the first SQE that
> > has IOSQE_EXT_SQE_GROUP set and ending with the first subsequent SQE that
> > doesn't have it set. It is similar to a chain of linked SQEs.
> >
> > The first SQE is the group leader, and the other SQEs are group members. The
> > group leader is always freed after all members are completed. Group members
> > aren't submitted until the group leader is completed, and there isn't any
> > dependency among group members. IOSQE_IO_LINK can't be set for group
> > members, and neither can IOSQE_IO_DRAIN.
> >
> > Typically the group leader provides or creates a resource, and the other
> > members consume it. Take a multiple-backup scenario: the first SQE reads
> > data from the source file into a fixed buffer, and the other SQEs write data
> > from the same buffer into the destination files. An SQE group provides a very
> > efficient way to complete this task: 1) the fs write SQEs and the fs read SQE
> > can be submitted in a single syscall, with no need to submit the read SQE
> > first and wait until it completes; 2) there is no need to link all write SQEs
> > together, so the writes can be submitted to the files concurrently. The
> > application is also simplified a lot this way.
> >
> > Another use case is supporting generic device zero copy:
> >
> > - the lead SQE provides a device buffer, which is owned by the device or
> > kernel and can't cross into userspace; otherwise a malicious application
> > could easily cause a leak or panic
> >
> > - member SQEs read or write concurrently against the buffer provided by the
> > lead SQE
>
> In concept, this looks very similar to "sqe bundles" that I played with
> in the past:
>
> https://git.kernel.dk/cgit/linux/log/?h=io_uring-bundle
Indeed, so it looks like this is something io_uring needs.
>
> Didn't look too closely yet at the implementation, but in spirit it's
> about the same in that the first entry is processed first, and there's
> no ordering implied between the rest of the members of the bundle /
> group.
Yeah.
>
> I do think that's a flexible thing to support, particularly if:
>
> 1) We can do it more efficiently than links, which are pretty horrible.
Agreed, links are hard to use with async/.await in modern languages, in my
experience.
Also, an SQE group won't break links: the group is treated as a whole
with respect to linking.
> 2) It enables new worthwhile use cases
> 3) It's done cleanly
> 4) It's easily understandable and easy to document, so that users will
> actually understand what this is and what use cases it enables. Part
> of that is actually naming, it should be readily apparent what a
> group is, what the lead is, and what the members are. Using your
> terminology here, definitely worth spending some time on that to get
> it just right and self-evident.
These are all good suggestions; I will follow them in V2.
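
To make the group / lead / member terminology concrete, here is a minimal
user-space sketch of the grouping rule as the commit log states it: a group
starts at the first SQE with the group flag set and ends at the first
subsequent SQE without it. This is only an illustration, not the kernel
implementation from the patch. SKETCH_SQE_GROUP, enum sketch_role and
sketch_assign_roles() are hypothetical names, and the flag value is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit for illustration only; not the value used by the patch. */
#define SKETCH_SQE_GROUP (1u << 0)

enum sketch_role { ROLE_PLAIN, ROLE_LEADER, ROLE_MEMBER };

/*
 * Walk a flat array of per-SQE flags and label each entry.
 *
 * Rule taken from the commit log: a group starts at the first entry with
 * the group flag set and ends at the first subsequent entry without it.
 * That terminating entry still belongs to the group (as with the last
 * request of a link chain). The first entry is the leader; the rest are
 * members.
 *
 * Assumption: a trailing group that is never terminated is not validated.
 */
static void sketch_assign_roles(const unsigned *flags, size_t n,
                                enum sketch_role *roles)
{
    int in_group = 0;

    assert(n == 0 || (flags != NULL && roles != NULL));

    for (size_t i = 0; i < n; i++) {
        int has_flag = (flags[i] & SKETCH_SQE_GROUP) != 0;

        if (!in_group)
            roles[i] = has_flag ? ROLE_LEADER : ROLE_PLAIN;
        else
            roles[i] = ROLE_MEMBER;

        /* An entry without the flag closes the group it belongs to. */
        in_group = has_flag;
    }
}
```

For the multiple-backup case, the flags { G, G, 0 } would describe one read
(leader) followed by two writes (members), where G stands for the
hypothetical group flag above.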
Thanks,
Ming