Re: io_uring and spurious wake-ups from eventfd

From: Jens Axboe <axboe@kernel.dk>
To: Mark Papadakis <markuspapadakis@icloud.com>
Cc: io-uring@vger.kernel.org
Subject: Re: io_uring and spurious wake-ups from eventfd
Date: Wed, 8 Jan 2020 09:24:28 -0700
Message-ID: <d949ea3a-bd24-e597-b230-89b7075544cc@kernel.dk>
In-Reply-To: <4DED8D2F-8F0B-46FB-800D-FEC3F2A5B553@icloud.com>

On 1/8/20 12:36 AM, Mark Papadakis wrote:
> 
> 
>> On 7 Jan 2020, at 10:34 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 1/7/20 1:26 PM, Jens Axboe wrote:
>>> On 1/7/20 8:55 AM, Mark Papadakis wrote:
>>>> This is perhaps an odd request, but if it’s trivial to implement
>>>> support for the feature described below, it could help others as it’d
>>>> help me (I’ve been experimenting with io_uring for some time now).
>>>>
>>>> Being able to register an eventfd with an io_uring context is very
>>>> handy if, e.g., you have some sort of reactor thread multiplexing I/O
>>>> using epoll etc., where you want to be notified when there are pending
>>>> CQEs to drain. The problem, such as it is, is that this can result in
>>>> unnecessary/spurious wake-ups.
>>>>
>>>> If, for example, you are monitoring some sockets for EPOLLIN, and
>>>> poll says you have pending bytes to read from those sockets, and said
>>>> sockets are non-blocking, then for each reported event you reserve
>>>> an SQE for preadv() to read that data and call io_uring_enter() to
>>>> submit the SQEs. Because the data is readily available, as soon as
>>>> io_uring_enter() returns, you will have your completions available,
>>>> which you can process.  The “problem” is that poll will wake up
>>>> immediately thereafter in the next reactor loop iteration because
>>>> the eventfd was tripped (which is reasonable but unnecessary).
>>>>
>>>> What if there was a flag for io_uring_setup() so that the eventfd
>>>> would only be tripped for CQEs that were processed asynchronously, or,
>>>> if that’s non-trivial, only for CQEs that reference file FDs?
>>>>
>>>> That’d help with that spurious wake-up.
>>>
>>> One easy way to do that would be for the application to signal that it
>>> doesn't want eventfd notifications for certain requests. Like using an
>>> IOSQE_ flag for that. Then you could set that on the requests you submit
>>> in response to triggering an eventfd event.
>>
> 
> 
> Thanks Jens,
> 
> This is great, but perhaps there is a slightly more optimal
> way to do this.  Ideally, io_uring should trip the eventfd only if
> there are new completions available that haven’t been produced in the
> context of an io_uring_enter(). That is to say, if any SQEs can be
> immediately served (because data is readily available in
> buffers/caches in the kernel), then their respective CQEs will be
> produced in the context of the io_uring_enter() that submitted said
> SQEs (and thus the CQEs can be processed immediately after
> io_uring_enter() returns).  So, if any CQEs are placed in the
> respective ring at any other time, not during an io_uring_enter()
> call, then those completions were produced asynchronously and
> the eventfd can be tripped. Otherwise, there is no need to
> trip the eventfd at all.
> 
> e.g. (pseudocode):
> void produce_completion(cfq_ctx *ctx, const bool in_io_uring_enter_ctx) {
>         cqe_ring_push(cqe_from_ctx(ctx));
>         /* Only signal for completions produced outside io_uring_enter(). */
>         if (!in_io_uring_enter_ctx && eventfd_registered()) {
>                 trip_iouring_eventfd();
>         }
>         /* Otherwise don't bother: the submitter sees the CQE on return. */
> }

I see what you're saying: essentially, only trigger eventfd
notifications if the completions happen async. That does make a lot of
sense, and it would be cleaner than having to flag this per request as
well. I think we'd still need to make that opt-in, since it changes the
existing behavior.

The best way to do that would be to add IORING_REGISTER_EVENTFD_ASYNC or
something like that. It would do the exact same thing as
IORING_REGISTER_EVENTFD, but only trigger the eventfd if completions
happen async.

What do you think?

-- 
Jens Axboe

Thread overview: 11+ messages
2020-01-07 15:55 io_uring and spurious wake-ups from eventfd Mark Papadakis
2020-01-07 20:26 ` Jens Axboe
2020-01-07 20:34   ` Jens Axboe
2020-01-08  7:36     ` Mark Papadakis
2020-01-08 16:24       ` Jens Axboe [this message]
2020-01-08 16:46         ` Mark Papadakis
2020-01-08 16:50           ` Jens Axboe
2020-01-08 17:20             ` Jens Axboe
2020-01-08 18:08               ` Jens Axboe
2020-01-09  6:09         ` Daurnimator
2020-01-09 15:14           ` Jens Axboe