From: Pavel Begunkov <asml.silence@gmail.com>
To: Usama Arif <usama.arif@bytedance.com>,
Jens Axboe <axboe@kernel.dk>,
io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: fam.zheng@bytedance.com
Subject: Re: [External] Re: [RFC] io_uring: avoid ring quiesce while registering/unregistering eventfd
Date: Thu, 3 Feb 2022 15:44:21 +0000
Message-ID: <6cce16d3-e2ca-ca1e-1209-e6e243241231@gmail.com>
In-Reply-To: <1494b8f0-2f48-0aa1-214c-a02bbc4b05eb@bytedance.com>
On 2/3/22 15:14, Usama Arif wrote:
> On 02/02/2022 19:18, Jens Axboe wrote:
>> On 2/2/22 9:57 AM, Jens Axboe wrote:
>>> On 2/2/22 8:59 AM, Usama Arif wrote:
>>>> Acquire completion_lock at the start of __io_uring_register before
>>>> registering/unregistering eventfd and release it at the end. Hence
>>>> all calls to io_cqring_ev_posted which add to the eventfd counter
>>>> will finish before acquiring the spin_lock in io_uring_register, and
>>>> all new calls will wait till the eventfd is registered. This avoids
>>>> ring quiesce which is much more expensive than acquiring the
>>>> spin_lock.
>>>>
>>>> On the system tested with this patch, io_uring_register with
>>>> IORING_REGISTER_EVENTFD takes less than 1ms, compared to 15ms before.
>>>
>>> This seems like optimizing for the wrong thing, so I've got a few
>>> questions. Are you doing a lot of eventfd registrations (and
>>> unregister) in your workload? Or is it just the initial pain of
>>> registering one? In talking to Pavel, he suggested that RCU might be a
>>> good use case here, and I think so too. That would still remove the
>>> need to quiesce, and the posted side just needs a fairly cheap rcu
>>> read lock/unlock around it.
>>
>> Totally untested, but perhaps can serve as a starting point or
>> inspiration.
>>
>
> Hi,
>
> Thank you for the replies and comments. My use case registers only one eventfd at the start.

Then it's overkill. Update io_register_op_must_quiesce() so that eventfd
registration no longer forces a quiesce, set ->cq_ev_fd on registration
with WRITE_ONCE(), read it in io_cqring_ev_posted*() with READ_ONCE(),
and you're set.

There is a caveat: the new ->cq_ev_fd won't be immediately visible to
requests that are already in flight. We can say it's userspace's
responsibility to wait for a grace period. That is, for any request
submitted before registration, io_cqring_ev_posted*() might or might
not see the updated ->cq_ev_fd. This works perfectly if there were no
requests in the first place. Of course, it changes the behaviour, so it
will need a new register opcode.
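Below is a minimal userspace model of the publish-once idea, for illustration only. It is not io_uring code. C11 atomics stand in for the kernel's WRITE_ONCE()/READ_ONCE(). The names fake_ring, fake_eventfd, fake_ring_init(), register_eventfd() and cq_ev_posted() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct eventfd_ctx: only a counter. */
struct fake_eventfd {
	atomic_ulong count;
};

/* Hypothetical, heavily reduced stand-in for struct io_ring_ctx. */
struct fake_ring {
	_Atomic(struct fake_eventfd *) cq_ev_fd;
};

static void fake_ring_init(struct fake_ring *ring)
{
	atomic_init(&ring->cq_ev_fd, NULL);
}

/*
 * Registration without quiescing the ring: publish the pointer once.
 * Kernel analogue (assumed): WRITE_ONCE(ctx->cq_ev_fd, ev_fd).
 * Release ordering is a conservative userspace choice. It ensures a
 * poster that observes the pointer also sees the eventfd's contents.
 */
static void register_eventfd(struct fake_ring *ring, struct fake_eventfd *ev)
{
	atomic_store_explicit(&ring->cq_ev_fd, ev, memory_order_release);
}

/*
 * Completion side. The kernel analogue (assumed) is READ_ONCE(ctx->cq_ev_fd)
 * in io_cqring_ev_posted*(). A poster racing with registration may
 * see either NULL or the new pointer, and the proposed semantics allow
 * both outcomes. Returns true if the (fake) eventfd was signalled.
 */
static bool cq_ev_posted(struct fake_ring *ring)
{
	struct fake_eventfd *ev =
		atomic_load_explicit(&ring->cq_ev_fd, memory_order_acquire);

	if (!ev)
		return false;
	/* Stand-in for signalling the eventfd: bump its counter by one. */
	atomic_fetch_add_explicit(&ev->count, 1, memory_order_relaxed);
	return true;
}
```

Completions posted before register_eventfd() are simply not signalled. Completions posted afterwards bump the counter. The sketch deliberately leaves out unregistration. Freeing the eventfd while a poster may still hold the pointer would need something like the RCU scheme discussed earlier in the thread.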
--
Pavel Begunkov
Thread overview: 8+ messages
2022-02-02 15:59 [RFC] io_uring: avoid ring quiesce while registering/unregistering eventfd Usama Arif
2022-02-02 16:57 ` Jens Axboe
2022-02-02 18:32 ` Pavel Begunkov
2022-02-02 18:39 ` Pavel Begunkov
2022-02-02 19:18 ` Jens Axboe
2022-02-03 15:14 ` [External] " Usama Arif
2022-02-03 15:44 ` Pavel Begunkov [this message]
2022-02-03 15:55 ` Pavel Begunkov