From: Pavel Begunkov <asml.silence@gmail.com>
To: Hao Xu <haoxu@linux.alibaba.com>, Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, Joseph Qi <joseph.qi@linux.alibaba.com>
Subject: Re: [PATCH 1/6] io_uring: enhance flush completion logic
Date: Fri, 3 Sep 2021 13:27:41 +0100 [thread overview]
Message-ID: <5dd28d14-24b5-e2dc-aa55-a68cb5d9f4e8@gmail.com> (raw)
In-Reply-To: <302430f6-f83e-4096-448e-9d35f8f4303e@linux.alibaba.com>
On 9/3/21 1:08 PM, Hao Xu wrote:
> 在 2021/9/3 下午7:42, Pavel Begunkov 写道:
>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>> Though currently refcount of a req is always one when we flush inline
>>
>> It can be refcounted and != 1. E.g. poll requests, or consider
> It seems poll requests don't leverage comp cache, do I miss anything?
Hmm, looks so. Not great that it doesn't, but probably it's because
of trying to submit next reqs right in io_poll_task_func().
I'll be pushing for some changes around tw, with it should be easy
to hook poll completion batching with no drawbacks. Would be great
if you will be willing to take a shot on it.
>> that tw also flushes, and you may have a read that goes to apoll
>> and then get tw resubmitted from io_async_task_func(). And other
> when it goes to apoll, (say no double wait entry) ref is 1, then read
> completes inline and then the only ref is droped by flush.
Yep, but some might have double entry. It also will have elevated
refs if there was a linked timeout. Another case is io-wq, which
takes a reference, and if iowq doesn't put it for a while (e.g. got
rescheduled) but requests goes to tw (resubmit, poll, whatever)
you have the same picture.
>> cases.
>>
>>> completions, but still a chance there will be exception in the future.
>>> Enhance the flush logic to make sure we maintain compl_nr correctly.
>>
>> See below
>>
>>>
>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>> ---
>>>
>>> we need to either removing the if check to claim clearly that the req's
>>> refcount is 1 or adding this patch's logic.
>>>
>>> fs/io_uring.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 2bde732a1183..c48d43207f57 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -2291,7 +2291,7 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>>> __must_hold(&ctx->uring_lock)
>>> {
>>> struct io_submit_state *state = &ctx->submit_state;
>>> - int i, nr = state->compl_nr;
>>> + int i, nr = state->compl_nr, remain = 0;
>>> struct req_batch rb;
>>> spin_lock(&ctx->completion_lock);
>>> @@ -2311,10 +2311,12 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>>> if (req_ref_put_and_test(req))
>>> io_req_free_batch(&rb, req, &ctx->submit_state);
>>> + else
>>> + state->compl_reqs[remain++] = state->compl_reqs[i];
>>
>> Our ref is dropped, and something else holds another reference. That
>> "something else" is responsible to free the request once it put the last
>> reference. This chunk would make the following io_submit_flush_completions()
>> to underflow refcount and double free.
> True, I see. Thanks Pavel.
>>
>>> }
>>> io_req_free_batch_finish(ctx, &rb);
>>> - state->compl_nr = 0;
>>> + state->compl_nr = remain;
>>> }
>>> /*
>>>
>>
>
--
Pavel Begunkov
next prev parent reply other threads:[~2021-09-03 12:28 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-03 11:00 [RFC 0/6] fast poll multishot mode Hao Xu
2021-09-03 11:00 ` [PATCH 1/6] io_uring: enhance flush completion logic Hao Xu
2021-09-03 11:42 ` Pavel Begunkov
2021-09-03 12:08 ` Hao Xu
2021-09-03 12:27 ` Pavel Begunkov [this message]
2021-09-03 13:38 ` Hao Xu
2021-09-17 18:49 ` Hao Xu
2021-09-03 11:00 ` [PATCH 2/6] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
2021-09-03 11:00 ` [PATCH 3/6] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
2021-09-03 11:00 ` [PATCH 4/6] io_uring: let fast poll support multishot Hao Xu
2021-09-06 15:56 ` Pavel Begunkov
2021-09-06 17:40 ` Hao Xu
2021-09-06 19:09 ` Pavel Begunkov
2021-09-07 6:38 ` Hao Xu
2021-09-06 19:04 ` Pavel Begunkov
2021-09-07 6:48 ` Hao Xu
2021-09-08 11:21 ` Hao Xu
2021-09-08 12:03 ` Pavel Begunkov
2021-09-08 13:13 ` Pavel Begunkov
2021-09-09 7:01 ` Hao Xu
2021-09-09 8:29 ` Hao Xu
2021-09-11 10:49 ` Pavel Begunkov
2021-09-11 20:19 ` Hao Xu
2021-09-03 11:00 ` [PATCH 5/6] io_uring: implement multishot mode for accept Hao Xu
2021-09-04 22:39 ` Pavel Begunkov
2021-09-04 22:40 ` Pavel Begunkov
2021-09-06 15:34 ` Pavel Begunkov
2021-09-03 11:00 ` [PATCH 6/6] io_uring: enable " Hao Xu
2021-09-03 16:29 ` Jens Axboe
2021-09-04 15:34 ` Hao Xu
2021-09-04 18:40 ` Jens Axboe
2021-09-04 22:46 ` Pavel Begunkov
2021-09-05 7:29 ` Hao Xu
2021-09-05 19:44 ` Jens Axboe
2021-09-06 8:26 ` Hao Xu
2021-09-06 8:28 ` Hao Xu
2021-09-06 13:24 ` Jens Axboe
2021-09-06 12:35 ` Hao Xu
2021-09-06 13:31 ` Jens Axboe
2021-09-06 15:00 ` Hao Xu
2021-09-06 15:32 ` Pavel Begunkov
2021-09-06 16:42 ` Jens Axboe
2021-09-04 22:43 ` Pavel Begunkov
2021-09-05 6:25 ` Hao Xu
2021-09-05 8:27 ` Pavel Begunkov
2021-09-03 11:02 ` [RFC 0/6] fast poll multishot mode Hao Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5dd28d14-24b5-e2dc-aa55-a68cb5d9f4e8@gmail.com \
--to=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=haoxu@linux.alibaba.com \
--cc=io-uring@vger.kernel.org \
--cc=joseph.qi@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.