From: Martin Karsten <mkarsten@uwaterloo.ca>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Joe Damato <jdamato@fastly.com>
Cc: Samiullah Khawaja <skhawaja@google.com>,
Stanislav Fomichev <sdf@fomichev.me>,
netdev@vger.kernel.org, amritha.nambiar@intel.com,
sridhar.samudrala@intel.com,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Breno Leitao <leitao@debian.org>,
Christian Brauner <brauner@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Jan Kara <jack@suse.cz>,
Jiri Pirko <jiri@resnulli.us>,
Johannes Berg <johannes.berg@intel.com>,
Jonathan Corbet <corbet@lwn.net>,
"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
"open list:FILESYSTEMS (VFS and infrastructure)"
<linux-fsdevel@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>,
Lorenzo Bianconi <lorenzo@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll
Date: Sun, 18 Aug 2024 10:51:04 -0400
Message-ID: <4dc65899-e599-43e3-8f95-585d3489b424@uwaterloo.ca>
In-Reply-To: <66c1ef2a2e94c_362202942d@willemb.c.googlers.com.notmuch>

On 2024-08-18 08:55, Willem de Bruijn wrote:
>>>>>> The value may not be obvious, but guidance (in the form of
>>>>>> documentation) can be provided.
>>>>>
>>>>> Okay. Could you share a stab at what that would look like?
>>>>
>>>> The timeout needs to be large enough that an application can get a
>>>> meaningful number of incoming requests processed without softirq
>>>> interference. At the same time, the timeout value determines the
>>>> worst-case delivery delay that a concurrent application using the same
>>>> queue(s) might experience. Please also see my response to Samiullah
>>>> quoted above. The specific circumstances and trade-offs might vary,
>>>> that's why a simple constant likely won't do.
>>>
>>> Thanks. I really do mean this as an exercise of what documentation in
>>> Documentation/networking/napi.rst will look like. That helps make the
>>> case that the interface is reasonably easy to use (even if only
>>> targeting advanced users).
>>>
>>> How does a user measure how much time a process will spend on
>>> processing a meaningful number of incoming requests, for instance.
>>> In practice, probably just a hunch?
>>
>> As an example, we measure around 1M QPS in our experiments, fully
>> utilizing 8 cores and knowing that memcached is quite scalable. Thus we
>> can conclude a single request takes about 8 us processing time on
>> average. That has led us to a 20 us small timeout (gro_flush_timeout),
>> enough to make sure that a single request is likely not interfered with,
>> but otherwise as small as possible. If multiple requests arrive, the
>> system will quickly switch back to polling mode.
>>
>> At the other end, we have picked a very large irq_suspend_timeout of
>> 20,000 us to demonstrate that it does not negatively impact latency.
>> This would cover 2,500 requests, which is likely excessive, but was
>> chosen for demonstration purposes. One can easily measure the
>> distribution of epoll_wait batch sizes and batch sizes as low as 64 are
>> already very efficient, even in high-load situations.
>
> Overall Ack on both your and Joe's responses.
>
> epoll_wait disables the suspend if no events are found and ep_poll
> would go to sleep. As the paper also hints, the timeout is only there
> for misbehaving applications that stop calling epoll_wait, correct?
> If so, then picking a value is not that critical, as long as not too
> low to do meaningful work.

Correct.

>> Also see next paragraph.
>>
>>> Playing devil's advocate some more: given that ethtool usecs have to
>>> be chosen with a similar trade-off between latency and efficiency,
>>> could a multiplicative factor of this (or gro_flush_timeout, same
>>> thing) be sufficient and easier to choose? The documentation does
>>> state that the value chosen must be >= gro_flush_timeout.
>>
>> I believe this would take away flexibility without gaining much. You'd
>> still want some sort of admin-controlled 'enable' flag, so you'd still
>> need some kind of parameter.
>>
>> When using our scheme, the factor between gro_flush_timeout and
>> irq_suspend_timeout should *roughly* correspond to the maximum batch
>> size that an application would process in one go (orders of magnitude,
>> see above). This determines both the target application's worst-case
>> latency as well as the worst-case latency of concurrent applications, if
>> any, as mentioned previously.
>
> Oh is concurrent applications the argument against a very high
> timeout?

Only in the error case. If irq_suspend_timeout is large enough, as you
point out above, then as long as the target application behaves well,
its batching settings are the determining factor.
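
For reference, the arithmetic behind the numbers quoted above can be
written out as a short sketch. The function and constant names are made
up for illustration and are not part of the patch set. It assumes that
all 8 cores are fully busy serving requests, so the result is an
average, not a measured per-request time.

```python
# Rough sizing sketch for the two timeouts discussed in this thread.
# All names here are hypothetical; only the input numbers come from the
# experiments described above.

def per_request_us(cores: int, qps: float) -> float:
    """Average CPU time per request in microseconds.

    Assumes every core is fully utilized, so total CPU time per second
    is cores * 1,000,000 us, spread evenly over qps requests.
    """
    return cores * 1_000_000 / qps


def requests_covered(timeout_us: float, request_us: float) -> float:
    """Number of average-sized requests that fit into one timeout."""
    return timeout_us / request_us


CORES = 8
QPS = 1_000_000                  # roughly 1M QPS measured with memcached
GRO_FLUSH_TIMEOUT_US = 20        # small timeout
IRQ_SUSPEND_TIMEOUT_US = 20_000  # large timeout, chosen for demonstration

request_us = per_request_us(CORES, QPS)
covered = requests_covered(IRQ_SUSPEND_TIMEOUT_US, request_us)
# Ratio between the two timeouts; per the discussion it should roughly
# (orders of magnitude) match the maximum batch size of the application.
factor = IRQ_SUSPEND_TIMEOUT_US / GRO_FLUSH_TIMEOUT_US

print(f"average time per request: {request_us:.1f} us")
print(f"requests covered by irq_suspend_timeout: {covered:.0f}")
print(f"irq_suspend_timeout / gro_flush_timeout: {factor:.0f}")
```

With other workloads, only the inputs (core count, QPS and the two
timeouts) would need to change; the trade-offs remain as described.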
Thanks,
Martin