From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Paolo Abeni <pabeni@redhat.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Wander Lairson Costa <wander@redhat.com>
Subject: Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI.
Date: Wed, 20 Sep 2023 17:57:54 +0200
Message-ID: <20230920155754.KzYGXMWh@linutronix.de>
In-Reply-To: <0a842574fd0acc113ef925c48d2ad9e67aa0e101.camel@redhat.com>
On 2023-08-23 15:35:41 [+0200], Paolo Abeni wrote:
> On Mon, 2023-08-14 at 11:35 +0200, Sebastian Andrzej Siewior wrote:
> > @@ -4781,7 +4733,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> >  		 * We can use non atomic operation since we own the queue lock
> >  		 */
> >  		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
> > -			napi_schedule_rps(sd);
> > +			__napi_schedule_irqoff(&sd->backlog);
> >  		goto enqueue;
> >  	}
> >  	reason = SKB_DROP_REASON_CPU_BACKLOG;
>
> I *think* that the above could be quite dangerous when cpu ==
> smp_processor_id() - that is, with plain veth usage.
>
> Currently, each packet runs into the rx path just after
> enqueue_to_backlog()/tx completes.
>
> With this patch there will be a burst effect, where the backlog thread
> will run after a few (several) packets will be enqueued, when the
> process scheduler will decide - note that the current CPU is already
> hosting a running process, the tx thread.
>
> The above can cause packet drops (due to limited buffering) or very
> high latency (due to long burst), even in non overload situation, quite
> hard to debug.
>
> I think the above needs to be an opt-in, but I guess that even RT
> deployments doing some packet forwarding will not be happy with this
> on.

I've been looking at this again and have been thinking about what you
said here. I think part of the problem is that we lack a policy/mechanism
for detecting that a DoS is happening and for deciding what to do about
it.

Before commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its
job""), when a lot of network packets were processed, the processing was
moved to ksoftirqd and continued based on how the scheduler scheduled
the SCHED_OTHER ksoftirqd task. This avoided lock-ups of the system,
which could do something else in between. An interrupt would not
continue the outstanding softirq backlog but wait for ksoftirqd instead.
So it basically avoided the networking overload and throttled the
throughput if needed.

This isn't the case after that commit. Now the CPU can be stuck
processing network packets if the packets come in fast enough. Even if
ksoftirqd is woken up, the next interrupt (say the timer) will continue
with at least one round of softirq processing.

By using NAPI threads it is possible to give control back to the
scheduler, which can throttle the NAPI processing in favour of other
threads that ask for the CPU. As you pointed out, waking the thread does
not guarantee that it will do the NAPI work immediately. It can be
delayed based on the current load on the system.
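To put rough numbers on that delay, here is a tiny userspace model (not
kernel code; backlog_sim_drops() and its parameters are made up for
illustration). It assumes a fixed queue limit and a backlog thread that
only gets to run once every `delay` enqueued packets and then drains the
whole queue:

```c
/*
 * Hypothetical model of a bounded backlog queue.
 *
 * @total: number of packets the producer enqueues
 * @limit: maximum queue length, packets beyond it are dropped
 * @delay: the consumer runs after every @delay-th packet and drains
 *         the whole queue; 1 models "rx runs right after each enqueue",
 *         0 means the consumer never runs
 *
 * Returns the number of dropped packets.
 */
static unsigned int backlog_sim_drops(unsigned int total, unsigned int limit,
				      unsigned int delay)
{
	unsigned int qlen = 0;
	unsigned int drops = 0;
	unsigned int i;

	for (i = 1; i <= total; i++) {
		if (qlen >= limit)
			drops++;	/* queue full, packet is lost */
		else
			qlen++;

		/* The backlog thread finally gets the CPU and drains. */
		if (delay && i % delay == 0)
			qlen = 0;
	}
	return drops;
}
```

With a limit of 1000 (the default net.core.netdev_max_backlog, as far as
I know) nothing is dropped as long as the thread runs at least once per
1000 packets. Once the scheduler delays it beyond that, everything past
the limit is dropped until it runs again. Real traffic and a real
scheduler are of course not that regular.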

This could be influenced by assigning the NAPI thread a SCHED_FIFO
priority. Based on the priority it could be ensured that the thread
starts right away, or "later" if something else is more important.
However, this opens the DoS window again: the scheduler will keep the
NAPI thread on the CPU for as long as it asks for it, with no throttling.
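From userspace this is what chrt -f does. A minimal sketch in C,
assuming the thread id of the NAPI thread was looked up by other means
(e.g. ps); the helper names are made up for this example:

```c
#include <sched.h>
#include <sys/types.h>

/* Clamp @prio into the range the system accepts for SCHED_FIFO. */
static int fifo_prio_clamp(int prio)
{
	int min = sched_get_priority_min(SCHED_FIFO);
	int max = sched_get_priority_max(SCHED_FIFO);

	if (prio < min)
		return min;
	if (prio > max)
		return max;
	return prio;
}

/*
 * Move the thread @tid to SCHED_FIFO with (clamped) priority @prio.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int napi_thread_set_fifo(pid_t tid, int prio)
{
	struct sched_param sp = { .sched_priority = fifo_prio_clamp(prio) };

	return sched_setscheduler(tid, SCHED_FIFO, &sp);
}
```

Without the needed privileges (typically CAP_SYS_NICE) the call fails
with EPERM. Nothing here limits how long the thread may run once it has
the priority, which is the problem described above.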

If we could somehow define a DoS condition for when we are overwhelmed
with packets, then we could act on it and throttle the processing. This
in turn would allow a SCHED_FIFO priority without the fear of a lockup
if the system is flooded with packets.
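Purely as a strawman for what such a condition could look like: count
the packets processed per time window and report overload once a limit
is exceeded. This is a userspace sketch, nothing like it exists in the
patch, and the struct, the function and the thresholds are all invented:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical overload detector, all names and fields are made up. */
struct napi_overload {
	uint64_t window_start_ns;	/* start of the current window */
	uint64_t window_len_ns;		/* length of one window */
	uint32_t pkts_in_window;	/* packets seen in this window */
	uint32_t pkt_limit;		/* packets allowed per window */
};

/*
 * Account @pkts packets processed at time @now_ns. Returns true if the
 * limit for the current window is exceeded, i.e. the caller should stop
 * polling and let something else run.
 */
static bool napi_overload_account(struct napi_overload *o, uint64_t now_ns,
				  uint32_t pkts)
{
	if (now_ns - o->window_start_ns >= o->window_len_ns) {
		/* A new window starts, forget the old count. */
		o->window_start_ns = now_ns;
		o->pkts_in_window = 0;
	}

	/* Saturate instead of wrapping around. */
	if (pkts > UINT32_MAX - o->pkts_in_window)
		o->pkts_in_window = UINT32_MAX;
	else
		o->pkts_in_window += pkts;

	return o->pkts_in_window > o->pkt_limit;
}
```

The caller would stop polling once this returns true and only resume in
the next window. Where the limit would come from (a sysctl, per device,
derived from the NAPI budget) is exactly the policy question above.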
> Cheers,
>
> Paolo
Sebastian