From: Ming Lei <ming.lei@redhat.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
io-uring@vger.kernel.org, linux-block@vger.kernel.org,
Kevin Wolf <kwolf@redhat.com>,
ming.lei@redhat.com
Subject: Re: [PATCH V4 4/8] io_uring: support SQE group
Date: Tue, 6 Aug 2024 16:38:08 +0800 [thread overview]
Message-ID: <ZrHg8LUOeM23318x@fedora> (raw)
In-Reply-To: <5fd602d8-0c0b-418a-82bc-955ab0444b1e@gmail.com>
On Mon, Jul 29, 2024 at 02:58:58PM +0100, Pavel Begunkov wrote:
> On 7/25/24 11:33, Ming Lei wrote:
> > On Wed, Jul 24, 2024 at 02:41:38PM +0100, Pavel Begunkov wrote:
> > > On 7/23/24 15:34, Ming Lei wrote:
> ...
> > > > But grp_refs is dropped after io-wq request reference drops to
> > > > zero, then both io-wq and nor-io-wq code path can be unified
> > > > wrt. dealing with grp_refs, meantime it needn't to be updated
> > > > in extra(io-wq) context.
> > >
> > > Let's try to describe how it can work. First, I'm only describing
> > > the dep mode for simplicity. And for the argument's sake we can say
> > > that all CQEs are posted via io_submit_flush_completions.
> > >
> > > io_req_complete_post() {
> > > if (flags & GROUP) {
> > > req->io_task_work.func = io_req_task_complete;
> > > io_req_task_work_add(req);
> > > return;
> > > }
> > > ...
> > > }
> >
> > OK.
> >
> > io_wq_free_work() still need to change to not deal with
> > next link & ignoring skip_cqe, because group handling(
>
> No, it doesn't need to know about all that.
>
> > cqe posting, link advance) is completely moved into
> > io_submit_flush_completions().
>
> It has never been guaranteed that io_req_complete_post()
> will be the one completing the request,
> io_submit_flush_completions() can always happen.
>
>
> struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
> {
> ...
> if (req_ref_put_and_test(req)) {
> nxt = io_req_find_next(req);
> io_free_req();
> }
> }
>
> We queue linked requests only when all refs are dropped, and
> the group handling in my snippet is done before we drop the
> owner's reference.
>
> IOW, you won't hit io_free_req() in io_wq_free_work() for a
> leader unless all members in its group got completed and
> the leader already went through the code dropping those shared
> ublk buffers.
If io_free_req() won't be called for leader, leader won't be added
to ->compl_reqs, and it has to be generated when all members are
completed in __io_submit_flush_completions().
For !io_wq, we can align to this way by not completing leader in
io_req_complete_defer().
The above implementation looks simpler, and more readable.
>
>
> > > You can do it this way, nobody would ever care, and it shouldn't
> > > affect performance. Otherwise everything down below can probably
> > > be extended to io_req_complete_post().
> > >
> > > To avoid confusion in terminology, what I call a member below doesn't
> > > include a leader. IOW, a group leader request is not a member.
> > >
> > > At the init we have:
> > > grp_refs = nr_members; /* doesn't include the leader */
> > >
> > > Let's also say that the group leader can and always goes
> > > through io_submit_flush_completions() twice, just how it's
> > > with your patches.
> > >
> > > 1) The first time we see the leader in io_submit_flush_completions()
> > > is when it's done with resource preparation. For example, it was
> > > doing some IO into a buffer, and now is ready to give that buffer
> > > with data to group members. At this point it should queue up all group
> > > members. And we also drop 1 grp_ref. There will also be no
> > > io_issue_sqe() for it anymore.
> >
> > Ok, then it is just the case with dependency.
> >
> > >
> > > 2) Members are executed and completed, in io_submit_flush_completions()
> > > they drop 1 grp_leader->grp_ref reference each.
> > >
> > > 3) When all members complete, leader's grp_ref becomes 0. Here
> > > the leader is queued for io_submit_flush_completions() a second time,
> > > at which point it drops ublk buffers and such and gets destroyed.
> > >
> > > You can post a CQE in 1) and then set CQE_SKIP. Can also be fitted
> > > into 3). A pseudo code for when we post it in step 1)
> >
> > This way should work, but it confuses application because
> > the leader is completed before all members:
> >
> > - leader usually provide resources in group wide
> > - member consumes this resource
> > - leader is supposed to be completed after all consumer(member) are
> > done.
> >
> > Given it is UAPI, we have to be careful with CQE posting order.
>
> That's exactly what I was telling about the uapi previously,
> I don't believe we want to inverse the natural CQE order.
>
> Regardless, the order can be inversed like this:
>
> __io_flush_completions() {
> ...
> if (req->flags & (SKIP_CQE|GROUP)) {
> if (req->flags & SKIP_CQE)
> continue;
> // post a CQE only when we see it for a 2nd time in io_flush_completion();
> if (is_leader() && !(req->flags & ALREADY_SEEN))
> continue;
I am afraid that ALREADY_SEEN can't work, since leader can only be added
to ->compl_reqs once.
Thanks,
Ming
next prev parent reply other threads:[~2024-08-06 8:38 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-06 3:09 [PATCH V4 0/8] io_uring: support sqe group and provide group kbuf Ming Lei
2024-07-06 3:09 ` [PATCH V4 1/8] io_uring: add io_link_req() helper Ming Lei
2024-07-06 3:09 ` [PATCH V4 2/8] io_uring: add io_submit_fail_link() helper Ming Lei
2024-07-06 3:09 ` [PATCH V4 3/8] io_uring: add helper of io_req_commit_cqe() Ming Lei
2024-07-06 3:09 ` [PATCH V4 4/8] io_uring: support SQE group Ming Lei
2024-07-22 15:33 ` Pavel Begunkov
2024-07-23 14:34 ` Ming Lei
2024-07-24 13:41 ` Pavel Begunkov
2024-07-24 14:54 ` Pavel Begunkov
2024-07-25 10:33 ` Ming Lei
2024-07-29 13:58 ` Pavel Begunkov
2024-08-06 8:38 ` Ming Lei [this message]
2024-08-06 14:26 ` Ming Lei
2024-08-07 3:26 ` Ming Lei
2024-07-06 3:09 ` [PATCH V4 5/8] io_uring: support sqe group with members depending on leader Ming Lei
2024-07-06 3:09 ` [PATCH V4 6/8] io_uring: support providing sqe group buffer Ming Lei
2024-07-06 3:09 ` [PATCH V4 7/8] io_uring/uring_cmd: support provide group kernel buffer Ming Lei
2024-07-06 3:09 ` [PATCH V4 8/8] ublk: support provide io buffer Ming Lei
2024-07-19 1:03 ` [PATCH V4 0/8] io_uring: support sqe group and provide group kbuf Ming Lei
2024-07-19 12:49 ` Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZrHg8LUOeM23318x@fedora \
--to=ming.lei@redhat.com \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=io-uring@vger.kernel.org \
--cc=kwolf@redhat.com \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.