From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Martin Karsten <mkarsten@uwaterloo.ca>,
Samiullah Khawaja <skhawaja@google.com>
Cc: Stanislav Fomichev <sdf@fomichev.me>,
netdev@vger.kernel.org, Joe Damato <jdamato@fastly.com>,
amritha.nambiar@intel.com, sridhar.samudrala@intel.com,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Breno Leitao <leitao@debian.org>,
Christian Brauner <brauner@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Jan Kara <jack@suse.cz>,
Jiri Pirko <jiri@resnulli.us>,
Johannes Berg <johannes.berg@intel.com>,
Jonathan Corbet <corbet@lwn.net>,
"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
"open list:FILESYSTEMS (VFS and infrastructure)"
<linux-fsdevel@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>,
Lorenzo Bianconi <lorenzo@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll
Date: Fri, 16 Aug 2024 10:27:32 -0400
Message-ID: <66bf61d4ed578_17ec4b294ba@willemb.c.googlers.com.notmuch>
In-Reply-To: <d63dd3e8-c9e2-45d6-b240-0b91c827cc2f@uwaterloo.ca>

Martin Karsten wrote:
> On 2024-08-14 15:53, Samiullah Khawaja wrote:
> > On Tue, Aug 13, 2024 at 6:19 AM Martin Karsten <mkarsten@uwaterloo.ca> wrote:
> >>
> >> On 2024-08-13 00:07, Stanislav Fomichev wrote:
> >>> On 08/12, Martin Karsten wrote:
> >>>> On 2024-08-12 21:54, Stanislav Fomichev wrote:
> >>>>> On 08/12, Martin Karsten wrote:
> >>>>>> On 2024-08-12 19:03, Stanislav Fomichev wrote:
> >>>>>>> On 08/12, Martin Karsten wrote:
> >>>>>>>> On 2024-08-12 16:19, Stanislav Fomichev wrote:
> >>>>>>>>> On 08/12, Joe Damato wrote:
> >>>>>>>>>> Greetings:
>
> [snip]
>
> >>>>>> Note that napi_suspend_irqs/napi_resume_irqs is needed even for the sake of
> >>>>>> an individual queue or application to make sure that IRQ suspension is
> >>>>>> enabled/disabled right away when the state of the system changes from busy
> >>>>>> to idle and back.
> >>>>>
> >>>>> Can we not handle everything in napi_busy_loop? If we can mark some napi
> >>>>> contexts as "explicitly polled by userspace with a larger defer timeout",
> >>>>> we should be able to do better compared to current NAPI_F_PREFER_BUSY_POLL
> >>>>> which is more like "this particular napi_poll call is user busy polling".
> >>>>
> >>>> Then either the application needs to be polling all the time (wasting cpu
> >>>> cycles) or latencies will be determined by the timeout.
> > But if I understand correctly, this means that if the application
> > thread that is supposed to do napi busy polling gets busy doing work
> > on the new data/events in userspace, napi polling will not be done
> > until the suspend_timeout triggers? Do you dispatch work to separate
> > worker threads, in userspace, from the thread that is doing
> > epoll_wait?
>
> Yes, napi polling is suspended while the application is busy between
> epoll_wait calls. That's where the benefits are coming from.
>
> The consequences depend on the nature of the application and overall
> preferences for the system. If there's a "dominant" application for a
> number of queues and cores, the resulting latency for other background
> applications using the same queues might not be a problem at all.
>
> One other simple mitigation is limiting the number of events that each
> epoll_wait call accepts. Note that this batch size also determines the
> worst-case latency for the application in question, so there is a
> natural incentive to keep it limited.
>
> A more complex application design, like you suggest, might also be an
> option.
>
> >>>> Only when switching back and forth between polling and interrupts is it
> >>>> possible to get low latencies across a large spectrum of offered loads
> >>>> without burning cpu cycles at 100%.
> >>>
> >>> Ah, I see what you're saying, yes, you're right. In this case ignore my comment
> >>> about ep_suspend_napi_irqs/napi_resume_irqs.
> >>
> >> Thanks for probing and double-checking everything! Feedback is important
> >> for us to properly document our proposal.
> >>
> >>> Let's see how other people feel about per-dev irq_suspend_timeout. Properly
> >>> disabling napi during busy polling is super useful, but it would still
> >>> be nice to plumb irq_suspend_timeout via epoll context or have it set on
> >>> a per-napi basis imho.
> > I agree, this would allow each napi queue to tune itself based on
> > heuristics. But I think doing it through an epoll-independent
> > interface makes more sense, as Stan suggested earlier.
>
> The question is whether to add a useful mechanism (one sysfs parameter
> and a few lines of code) that is optional, but with demonstrable and
> significant performance/efficiency improvements for an important class
> of applications - or wait for an uncertain future?

The issue is that this one little change can never be removed, as it
becomes ABI.

Let's get the right API from the start.

Not sure that a global variable, or sysfs as API, is the right one.

> Note that adding our mechanism in no way precludes switching the control
> parameters from per-device to per-napi as Joe alluded to earlier. In
> fact, it increases the incentive for doing so.
>
> After working on this for quite a while, I am skeptical that anything
> fundamentally different could be done without re-architecting the entire
> napi control flow.
>
> Thanks,
> Martin
>
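
For reference, the mitigation Martin mentions above (limiting the number
of events that each epoll_wait call accepts) could look roughly like the
userspace sketch below. This is only an illustration, not code from the
patch series: BATCH_MAX, wait_batch and handle_batch are hypothetical
names, and the read() stands in for whatever work the application does
per event. The only real API used is epoll_wait(2), whose maxevents
argument caps how many ready events one call returns.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical upper bound on events accepted per epoll_wait() call. */
#define BATCH_MAX 8

/*
 * Fetch at most `batch` ready events from the epoll instance `epfd`.
 * Returns the number of events stored in `out`, or -1 on error.
 * A smaller batch means the thread returns to epoll_wait() sooner,
 * which bounds the time spent in userspace between polls.
 */
static int wait_batch(int epfd, struct epoll_event *out, int batch,
                      int timeout_ms)
{
    assert(batch > 0 && batch <= BATCH_MAX);
    return epoll_wait(epfd, out, batch, timeout_ms);
}

/*
 * Handle one bounded batch: read from each ready fd, then return the
 * number of events handled (or -1 on error). Assumes data.fd was set
 * to the registered fd when it was added with epoll_ctl().
 */
static int handle_batch(int epfd, int batch, int timeout_ms)
{
    struct epoll_event ev[BATCH_MAX];
    char buf[256];
    int n = wait_batch(epfd, ev, batch, timeout_ms);

    for (int i = 0; i < n; i++) {
        /* Placeholder for per-event application work. */
        if (read(ev[i].data.fd, buf, sizeof(buf)) < 0)
            return -1;
    }
    return n;
}
```

With three readable descriptors registered and a batch of 2, the first
handle_batch() call processes two events and the next call picks up the
remaining one, so no single pass through the application handles more
than `batch` events before going back into epoll_wait().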