From: Joe Damato <jdamato@fastly.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, corbet@lwn.net, hdanton@sina.com,
bagasdotme@gmail.com, pabeni@redhat.com, namangulati@google.com,
edumazet@google.com, amritha.nambiar@intel.com,
sridhar.samudrala@intel.com, sdf@fomichev.me, peter@typeblog.net,
m2shafiei@uwaterloo.ca, bjorn@rivosinc.com, hch@infradead.org,
willy@infradead.org, willemdebruijn.kernel@gmail.com,
skhawaja@google.com, Martin Karsten <mkarsten@uwaterloo.ca>,
"David S. Miller" <davem@davemloft.net>,
Simon Horman <horms@kernel.org>, David Ahern <dsahern@kernel.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Lorenzo Bianconi <lorenzo@kernel.org>,
Alexander Lobakin <aleksander.lobakin@intel.com>,
open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net-next v6 2/7] net: Suspend softirq when prefer_busy_poll is set
Date: Wed, 6 Nov 2024 08:52:00 -0800 [thread overview]
Message-ID: <ZyuesOyJLI3U0C5e@LQ3V64L9R2> (raw)
In-Reply-To: <20241105210338.5364375d@kernel.org>
On Tue, Nov 05, 2024 at 09:03:38PM -0800, Jakub Kicinski wrote:
> On Mon, 4 Nov 2024 21:55:26 +0000 Joe Damato wrote:
> > From: Martin Karsten <mkarsten@uwaterloo.ca>
> >
> > When NAPI_F_PREFER_BUSY_POLL is set during busy_poll_stop and the
> > irq_suspend_timeout is nonzero, this timeout is used to defer softirq
> > scheduling, potentially longer than gro_flush_timeout. This can be used
> > to effectively suspend softirq processing during the time it takes for
> > an application to process data and return to the next busy-poll.
> >
> > The call to napi->poll in busy_poll_stop might lead to an invocation of
>
> The call to napi->poll when we're arming the timer is counter
> productive, right? Maybe we can take this opportunity to add
> the seemingly missing logic to skip over it?
It seems like the call to napi->poll in busy_poll_stop is counter
productive and we're not opposed to making an optimization like that
in the future.
When we tried it, it triggered several bugs/system hangs, so we left
as much of the original code in place as possible.
The existing patch works and streamlining busy_poll_stop to skip the
call to napi->poll is an optimization that can be added as a later
series that focuses solely on when/where/how napi->poll is called.
Our focus was on:
- Not breaking any of the existing mechanisms
- Adding a new mechanism
I think we should avoid pulling the optimization you suggest into
this particular series and save that for the future.
> > napi_complete_done, but the prefer-busy flag is still set at that time,
> > so the same logic is used to defer softirq scheduling for
> > irq_suspend_timeout.
> >
> > Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
> > Co-developed-by: Joe Damato <jdamato@fastly.com>
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > Tested-by: Joe Damato <jdamato@fastly.com>
> > Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
> > Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> > Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > ---
> > v3:
> > - Removed reference to non-existent sysfs parameter from commit
> > message. No functional/code changes.
> >
> > net/core/dev.c | 17 +++++++++++++----
> > 1 file changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 4d910872963f..51d88f758e2e 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6239,7 +6239,12 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
> > timeout = napi_get_gro_flush_timeout(n);
> > n->defer_hard_irqs_count = napi_get_defer_hard_irqs(n);
> > }
> > - if (n->defer_hard_irqs_count > 0) {
> > + if (napi_prefer_busy_poll(n)) {
> > + timeout = napi_get_irq_suspend_timeout(n);
>
> Why look at the suspend timeout in napi_complete_done()?
> We are unlikely to be exiting busy poll here.
The idea is similar to commit 7fd3253a7de6 ("net: Introduce
preferred busy-polling"); continue to defer IRQs as long as forward
progress is being made. In this case, napi->poll ran, called
napi_complete_done -- the system is moving forward with processing
so prevent IRQs from interrupting us.
epoll_wait will re-enable IRQs (by calling napi_schedule) if
there are no events ready for processing.
> Is it because we need more time than gro_flush_timeout
> for the application to take over the polling?
That's right; we want the application to retain control of packet
processing. That's why we connected this to the "prefer_busy_poll"
flag.
> > + if (timeout)
> > + ret = false;
> > + }
> > + if (ret && n->defer_hard_irqs_count > 0) {
> > n->defer_hard_irqs_count--;
> > timeout = napi_get_gro_flush_timeout(n);
> > if (timeout)
> > @@ -6375,9 +6380,13 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
> > bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
> >
> > if (flags & NAPI_F_PREFER_BUSY_POLL) {
> > - napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
> > - timeout = napi_get_gro_flush_timeout(napi);
> > - if (napi->defer_hard_irqs_count && timeout) {
> > + timeout = napi_get_irq_suspend_timeout(napi);
>
> Even here I'm not sure if we need to trigger suspend.
> I don't know the eventpoll code well but it seems like you suspend
> and resume based on events when exiting epoll. Why also here?
There's two questions wrapped up here and an overall point to make:
1. Suspend and resume based on events when exiting epoll - that's
right and as you'll see in those patches that happens by:
- arming the suspend timer (via a call to napi_suspend_irqs)
when a positive number of events are retrieved
- calling napi_schedule (via napi_resume_irqs) when there are
no events or the epoll context is being freed.
2. Why defer the suspend timer here in busy_poll_stop? Note that the
original code would set the timer to gro_flush_timeout, which
would introduce the trade offs we mention in the cover letter
(latency for large values, IRQ interruption for small values).
We don't want the gro_flush_timeout to take over yet because we
want to avoid these tradeoffs up until the point where epoll_wait
finds no events for processing.
Does that make sense? If we skipped the IRQ suspend deferral
here, we'd be giving packet processing control back to
gro_flush_timeout and napi_defer_hard_irqs, but the system might
still have packets that can be processed in the next call to
epoll_wait.
The overall point to make is that: the suspend timer is used to
prevent misbehaving userland applications from taking too long. It's
essentially a backstop and, as long as the app is making forward
progress, allows the app to continue running its busy poll loop
undisturbed (via napi_complete_done preventing the driver from
enabling IRQs).
Does that make sense?
> > + if (!timeout) {
> > + napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
> > + if (napi->defer_hard_irqs_count)
> > + timeout = napi_get_gro_flush_timeout(napi);
> > + }
> > + if (timeout) {
> > hrtimer_start(&napi->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
> > skip_schedule = true;
> > }
>
next prev parent reply other threads:[~2024-11-06 16:52 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-04 21:55 [PATCH net-next v6 0/7] Suspend IRQs during application busy periods Joe Damato
2024-11-04 21:55 ` [PATCH net-next v6 1/7] net: Add napi_struct parameter irq_suspend_timeout Joe Damato
2024-11-04 21:55 ` [PATCH net-next v6 2/7] net: Suspend softirq when prefer_busy_poll is set Joe Damato
2024-11-06 5:03 ` Jakub Kicinski
2024-11-06 16:52 ` Joe Damato [this message]
2024-11-06 23:31 ` Jakub Kicinski
2024-11-07 3:24 ` Joe Damato
2024-11-07 21:01 ` Joe Damato
2024-11-08 3:52 ` Jakub Kicinski
2024-11-04 21:55 ` [PATCH net-next v6 3/7] net: Add control functions for irq suspension Joe Damato
2024-11-04 21:55 ` [PATCH net-next v6 4/7] eventpoll: Trigger napi_busy_loop, if prefer_busy_poll is set Joe Damato
2024-11-04 21:55 ` [PATCH net-next v6 5/7] eventpoll: Control irq suspension for prefer_busy_poll Joe Damato
2024-11-04 21:55 ` [PATCH net-next v6 6/7] selftests: net: Add busy_poll_test Joe Damato
2024-11-05 3:50 ` Stanislav Fomichev
2024-11-05 17:49 ` Joe Damato
2024-11-04 21:55 ` [PATCH net-next v6 7/7] docs: networking: Describe irq suspension Joe Damato
2024-11-05 1:12 ` Bagas Sanjaya
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZyuesOyJLI3U0C5e@LQ3V64L9R2 \
--to=jdamato@fastly.com \
--cc=aleksander.lobakin@intel.com \
--cc=amritha.nambiar@intel.com \
--cc=bagasdotme@gmail.com \
--cc=bigeasy@linutronix.de \
--cc=bjorn@rivosinc.com \
--cc=corbet@lwn.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=hch@infradead.org \
--cc=hdanton@sina.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lorenzo@kernel.org \
--cc=m2shafiei@uwaterloo.ca \
--cc=mkarsten@uwaterloo.ca \
--cc=namangulati@google.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=peter@typeblog.net \
--cc=sdf@fomichev.me \
--cc=skhawaja@google.com \
--cc=sridhar.samudrala@intel.com \
--cc=willemdebruijn.kernel@gmail.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox