From: Jakub Kicinski <kuba@kernel.org>
To: Martin Karsten <mkarsten@uwaterloo.ca>
Cc: Stanislav Fomichev <sdf@fomichev.me>,
netdev@vger.kernel.org, Joe Damato <jdamato@fastly.com>,
amritha.nambiar@intel.com, sridhar.samudrala@intel.com,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Breno Leitao <leitao@debian.org>,
Christian Brauner <brauner@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>, Jan Kara <jack@suse.cz>,
Jiri Pirko <jiri@resnulli.us>,
Johannes Berg <johannes.berg@intel.com>,
Jonathan Corbet <corbet@lwn.net>,
"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
"open list:FILESYSTEMS (VFS and infrastructure)"
<linux-fsdevel@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>,
Lorenzo Bianconi <lorenzo@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll
Date: Tue, 13 Aug 2024 17:10:15 -0700 [thread overview]
Message-ID: <20240813171015.425f239e@kernel.org> (raw)
In-Reply-To: <2bb121dd-3dcd-4142-ab87-02ccf4afd469@uwaterloo.ca>
On Mon, 12 Aug 2024 17:46:42 -0400 Martin Karsten wrote:
> >> Here's how it is intended to work:
> >> - An administrator sets the existing sysfs parameters for
> >> defer_hard_irqs and gro_flush_timeout to enable IRQ deferral.
> >>
> >> - An administrator sets the new sysfs parameter irq_suspend_timeout
> >> to a larger value than gro-timeout to enable IRQ suspension.
> >
> > Can you expand more on what's the problem with the existing gro_flush_timeout?
> > Is it defer_hard_irqs_count? Or you want a separate timeout only for the
> > perfer_busy_poll case(why?)? Because looking at the first two patches,
> > you essentially replace all usages of gro_flush_timeout with a new variable
> > and I don't see how it helps.
>
> gro-flush-timeout (in combination with defer-hard-irqs) is the default
> irq deferral mechanism and as such, always active when configured. Its
> static periodic softirq processing leads to a situation where:
>
> - A long gro-flush-timeout causes high latencies when load is
> sufficiently below capacity, or
>
> - a short gro-flush-timeout causes overhead when softirq execution
> asynchronously competes with application processing at high load.
>
> The shortcomings of this are documented (to some extent) by our
> experiments. See defer20 working well at low load, but having problems
> at high load, while defer200 having higher latency at low load.
>
> irq-suspend-timeout is only active when an application uses
> prefer-busy-polling and in that case, produces a nice alternating
> pattern of application processing and networking processing (similar to
> what we describe in the paper). This then works well with both low and
> high load.
What about NIC interrupt coalescing. defer_hard_irqs_count was supposed
to be used with NICs which either don't have IRQ coalescing or have a
broken implementation. The timeout of 200usec should be perfectly within
range of what NICs can support.
If the NIC IRQ coalescing works, instead of adding a new timeout value
we could add a new deferral control (replacing defer_hard_irqs_count)
which would always kick in after seeing prefer_busy_poll() but also
not kick in if the busy poll harvested 0 packets.
> > Maybe expand more on what code paths are we trying to improve? Existing
> > busy polling code is not super readable, so would be nice to simplify
> > it a bit in the process (if possible) instead of adding one more tunable.
>
> There are essentially three possible loops for network processing:
>
> 1) hardirq -> softirq -> napi poll; this is the baseline functionality
>
> 2) timer -> softirq -> napi poll; this is deferred irq processing scheme
> with the shortcomings described above
>
> 3) epoll -> busy-poll -> napi poll
>
> If a system is configured for 1), not much can be done, as it is
> difficult to interject anything into this loop without adding state and
> side effects. This is what we tried for the paper, but it ended up being
> a hack.
>
> If however the system is configured for irq deferral, Loops 2) and 3)
> "wrestle" with each other for control. Injecting the larger
> irq-suspend-timeout for 'timer' in Loop 2) essentially tilts this in
> favour of Loop 3) and creates the nice pattern describe above.
next prev parent reply other threads:[~2024-08-14 0:10 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-12 12:57 [RFC net-next 0/5] Suspend IRQs during preferred busy poll Joe Damato
2024-08-12 12:57 ` [RFC net-next 4/5] eventpoll: Trigger napi_busy_loop, if prefer_busy_poll is set Joe Damato
2024-08-12 13:19 ` Christoph Hellwig
2024-08-12 16:17 ` Matthew Wilcox
2024-08-12 17:49 ` Joe Damato
2024-08-12 17:46 ` Joe Damato
2024-08-12 12:57 ` [RFC net-next 5/5] eventpoll: Control irq suspension for prefer_busy_poll Joe Damato
2024-08-12 20:20 ` Stanislav Fomichev
2024-08-12 21:47 ` Martin Karsten
2024-08-12 20:19 ` [RFC net-next 0/5] Suspend IRQs during preferred busy poll Stanislav Fomichev
2024-08-12 21:46 ` Martin Karsten
2024-08-12 23:03 ` Stanislav Fomichev
2024-08-13 0:04 ` Martin Karsten
2024-08-13 1:54 ` Stanislav Fomichev
2024-08-13 2:35 ` Martin Karsten
2024-08-13 4:07 ` Stanislav Fomichev
2024-08-13 13:18 ` Martin Karsten
2024-08-14 3:16 ` Willem de Bruijn
2024-08-14 14:19 ` Joe Damato
2024-08-14 15:08 ` Willem de Bruijn
2024-08-14 15:46 ` Joe Damato
2024-08-14 19:53 ` Samiullah Khawaja
2024-08-14 20:42 ` Martin Karsten
2024-08-16 14:27 ` Willem de Bruijn
2024-08-16 14:59 ` Willem de Bruijn
2024-08-16 15:25 ` Joe Damato
2024-08-16 17:01 ` Willem de Bruijn
2024-08-16 20:03 ` Martin Karsten
2024-08-16 20:58 ` Willem de Bruijn
2024-08-17 18:15 ` Martin Karsten
2024-08-18 12:55 ` Willem de Bruijn
2024-08-18 14:51 ` Martin Karsten
2024-08-20 2:36 ` Jakub Kicinski
2024-08-20 14:28 ` Martin Karsten
2024-08-17 10:00 ` Joe Damato
2024-08-14 0:10 ` Jakub Kicinski [this message]
2024-08-14 1:14 ` Martin Karsten
2024-08-20 2:07 ` Jakub Kicinski
2024-08-20 14:27 ` Martin Karsten
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240813171015.425f239e@kernel.org \
--to=kuba@kernel.org \
--cc=aleksander.lobakin@intel.com \
--cc=amritha.nambiar@intel.com \
--cc=bigeasy@linutronix.de \
--cc=brauner@kernel.org \
--cc=corbet@lwn.net \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=jack@suse.cz \
--cc=jdamato@fastly.com \
--cc=jiri@resnulli.us \
--cc=johannes.berg@intel.com \
--cc=leitao@debian.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lorenzo@kernel.org \
--cc=mkarsten@uwaterloo.ca \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=sridhar.samudrala@intel.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).