From: Jakub Kicinski <kuba@kernel.org>
To: Martin Habets <habetsm.xilinx@gmail.com>
Cc: "Íñigo Huguet" <ihuguet@redhat.com>,
ecree.xilinx@gmail.com, davem@davemloft.net, edumazet@google.com,
pabeni@redhat.com, netdev@vger.kernel.org,
linux-net-drivers@amd.com, "Fei Liu" <feliu@redhat.com>
Subject: Re: [PATCH net] sfc: use budget for TX completions
Date: Wed, 14 Jun 2023 10:27:44 -0700
Message-ID: <20230614102744.71c91f20@kernel.org>
In-Reply-To: <ZIl0OYvze+iTehWX@gmail.com>

On Wed, 14 Jun 2023 09:03:05 +0100 Martin Habets wrote:
> On Mon, Jun 12, 2023 at 04:42:54PM +0200, Íñigo Huguet wrote:
> > When running workloads heavily unbalanced towards TX (high TX, low RX
> > traffic), the sfc driver can hold the CPU for too long. Although
> > in many cases this is not enough to be visible, it can affect
> > performance and system responsiveness.
> >
> > A way to reproduce it is to use a debug kernel and run some parallel
> > netperf TX tests. In some systems, this will lead to this message being
> > logged:
> > kernel:watchdog: BUG: soft lockup - CPU#12 stuck for 22s!
> >
> > The reason is that the sfc driver doesn't account any NAPI budget for the
> > TX completion events work. With high-TX/low-RX traffic, this means the
> > CPU is held for a long time in NAPI poll.
> >
> > The documentation says "drivers can process completions for any number of Tx
> > packets but should only process up to budget number of Rx packets".
> > However, many drivers do limit the amount of TX completions that they
> > process in a single NAPI poll.
>
> I think your work and what other drivers do show that the documentation is
> no longer correct. I haven't checked when that was written, but maybe it
> was years ago when link speeds were lower.
> Clearly for drivers that support higher link speeds this is an issue, so we
> should update the documentation. Not sure what constitutes a high link speed,
> with current CPUs for me it's anything >= 50G.

The documentation is pretty recent. I haven't seen this lockup once
in production or testing. Do multiple queues complete on the same CPU
for SFC or something weird like that?
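
For readers following the thread, the idea being debated (capping the TX completion work done in one NAPI poll) can be sketched in plain C. This is only a toy illustration, not the sfc driver code or the kernel NAPI API: struct toy_txq, toy_tx_poll() and toy_polls_to_drain() are hypothetical names, and the queue is modelled as a simple counter of pending completion events.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TX completion queue: only a count of
 * completion events that have not been processed yet. */
struct toy_txq {
	unsigned int pending;
};

/* Process at most `budget` TX completions in one poll call and return
 * how many were handled. A return value below `budget` means the queue
 * was drained before the budget ran out. */
static int toy_tx_poll(struct toy_txq *q, int budget)
{
	int done = 0;

	assert(q != NULL);

	while (done < budget && q->pending > 0) {
		q->pending--;	/* stands in for freeing one completed TX buffer */
		done++;
	}
	return done;
}

/* Count how many budgeted poll calls are needed to drain the queue.
 * Between two calls the CPU could be handed to other work, which is the
 * point of applying a budget. Returns 0 for a non-positive budget so
 * that the loop below cannot spin forever. */
static int toy_polls_to_drain(struct toy_txq *q, int budget)
{
	int polls = 0;

	if (budget <= 0)
		return 0;

	while (q->pending > 0) {
		toy_tx_poll(q, budget);
		polls++;
	}
	return polls;
}
```

With 150 pending events and a budget of 64, the work is split over three poll calls (64, 64 and 22 events) instead of one unbounded loop.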
Thread overview: 11+ messages
2023-06-12 14:42 [PATCH net] sfc: use budget for TX completions Íñigo Huguet
2023-06-12 16:04 ` Maciej Fijalkowski
2023-06-13 14:42 ` Íñigo Huguet
2023-06-14 8:10 ` Martin Habets
2023-06-14 10:13 ` Íñigo Huguet
2023-06-14 17:31 ` Jakub Kicinski
2023-06-15 8:08 ` Martin Habets
2023-06-14 8:03 ` Martin Habets
2023-06-14 17:27 ` Jakub Kicinski [this message]
2023-06-15 0:36 ` Edward Cree
2023-06-15 4:04 ` Jakub Kicinski