From: Stefan Hajnoczi <stefanha@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCHSET v3 0/5] Add support for epoll min_wait
Date: Tue, 8 Nov 2022 11:10:09 -0500 [thread overview]
Message-ID: <Y2p/YcUFhFDUnLGq@fedora> (raw)
In-Reply-To: <93fa2da5-c81a-d7f8-115c-511ed14dcdbb@kernel.dk>
[-- Attachment #1: Type: text/plain, Size: 3085 bytes --]
On Tue, Nov 08, 2022 at 07:09:30AM -0700, Jens Axboe wrote:
> On 11/8/22 7:00 AM, Stefan Hajnoczi wrote:
> > On Mon, Nov 07, 2022 at 02:38:52PM -0700, Jens Axboe wrote:
> >> On 11/7/22 1:56 PM, Stefan Hajnoczi wrote:
> >>> Hi Jens,
> >>> NICs and storage controllers have interrupt mitigation/coalescing
> >>> mechanisms that are similar.
> >>
> >> Yep
> >>
> >>> NVMe has an Aggregation Time (timeout) and an Aggregation Threshold
> >>> (counter) value. When a completion occurs, the device waits until the
> >>> timeout or until the completion counter value is reached.
> >>>
> >>> If I've read the code correctly, min_wait is computed at the beginning
> >>> of epoll_wait(2). NVMe's Aggregation Time is computed from the first
> >>> completion.
> >>>
> >>> It makes me wonder which approach is more useful for applications. With
> >>> the Aggregation Time approach applications can control how much extra
> >>> latency is added. What do you think about that approach?
> >>
> >> We only tested the current approach, which is time noted from entry, not
> >> from when the first event arrives. I suspect the nvme approach is better
> >> suited to the hw side, the epoll timeout helps ensure that we batch
> >> within xx usec rather than xx usec + whatever the delay until the first
> >> one arrives. Which is why it's handled that way currently. That gives
> >> you a fixed batch latency.
> >
> > min_wait is fine when the goal is just maximizing throughput without any
> > latency targets.
>
> That's not true at all, I think you're in different time scales than
> this would be used for.
>
> > The min_wait approach makes it hard to set a useful upper bound on
> > latency because unlucky requests that complete early experience much
> > more latency than requests that complete later.
>
> As mentioned in the cover letter or the main patch, this is most useful
> for the medium load kind of scenarios. For high load, the min_wait time
> ends up not mattering because you will hit maxevents first anyway. For
> the testing that we did, the target was 2-300 usec, and 200 usec was
> used for the actual test. Depending on what the kind of traffic the
> server is serving, that's usually not much of a concern. From your
> reply, I'm guessing you're thinking of much higher min_wait numbers. I
> don't think those would make sense. If your rate of arrival is low
> enough that min_wait needs to be high to make a difference, then the
> load is low enough anyway that it doesn't matter. Hence I'd argue that
> it is indeed NOT hard to set a useful upper bound on latency, because
> that is very much what min_wait is.
>
> I'm happy to argue merits of one approach over another, but keep in mind
> that this particular approach was not pulled out of thin air AND it has
> actually been tested and verified successfully on a production workload.
> This isn't a hypothetical benchmark kind of setup.
Fair enough. I just wanted to make sure the syscall interface that gets
merged is as useful as possible.
Thanks,
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2022-11-08 16:11 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-30 22:01 [PATCHSET v3 0/5] Add support for epoll min_wait Jens Axboe
2022-10-30 22:01 ` [PATCH 1/6] eventpoll: cleanup branches around sleeping for events Jens Axboe
2022-10-30 22:01 ` [PATCH 2/6] eventpoll: don't pass in 'timed_out' to ep_busy_loop() Jens Axboe
2022-10-30 22:02 ` [PATCH 3/6] eventpoll: split out wait handling Jens Axboe
2022-10-30 22:02 ` [PATCH 4/6] eventpoll: move expires to epoll_wq Jens Axboe
2022-10-30 22:02 ` [PATCH 5/6] eventpoll: move file checking earlier for epoll_ctl() Jens Axboe
2022-10-30 22:02 ` [PATCH 6/6] eventpoll: add support for min-wait Jens Axboe
2022-11-08 22:14 ` Soheil Hassas Yeganeh
2022-11-08 22:20 ` Jens Axboe
2022-11-08 22:25 ` Willem de Bruijn
2022-11-08 22:29 ` Jens Axboe
2022-11-08 22:44 ` Willem de Bruijn
2022-11-08 22:41 ` Soheil Hassas Yeganeh
2022-12-01 18:00 ` Jens Axboe
2022-12-01 18:39 ` Soheil Hassas Yeganeh
2022-12-01 18:41 ` Jens Axboe
2022-11-02 17:46 ` [PATCHSET v3 0/5] Add support for epoll min_wait Willem de Bruijn
2022-11-02 17:54 ` Jens Axboe
2022-11-02 23:09 ` Willem de Bruijn
2022-11-02 23:37 ` Jens Axboe
2022-11-02 23:51 ` Willem de Bruijn
2022-11-02 23:57 ` Jens Axboe
2022-11-05 17:39 ` Jens Axboe
2022-11-05 18:05 ` Willem de Bruijn
2022-11-05 18:46 ` Jens Axboe
2022-11-07 13:25 ` Willem de Bruijn
2022-11-07 14:19 ` Jens Axboe
2022-11-07 10:10 ` David Laight
2022-11-07 20:56 ` Stefan Hajnoczi
2022-11-07 21:38 ` Jens Axboe
2022-11-08 14:00 ` Stefan Hajnoczi
2022-11-08 14:09 ` Jens Axboe
2022-11-08 16:10 ` Stefan Hajnoczi [this message]
2022-11-08 16:15 ` Jens Axboe
2022-11-08 17:24 ` Stefan Hajnoczi
2022-11-08 17:28 ` Jens Axboe
2022-11-08 20:29 ` Stefan Hajnoczi
2022-11-09 10:09 ` David Laight
2022-11-10 10:13 ` Willem de Bruijn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y2p/YcUFhFDUnLGq@fedora \
--to=stefanha@redhat.com \
--cc=axboe@kernel.dk \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).