From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Matthias Tafelmeier <matthias.tafelmeier@gmx.net>
Cc: "Dave Taht" <dave@taht.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"Joel Wirāmu Pauling" <joel@aenertia.net>,
"David Ahern" <dsa@cumulusnetworks.com>,
"Tariq Toukan" <tariqt@mellanox.com>,
brouer@redhat.com, "Björn Töpel" <bjorn.topel@intel.com>
Subject: Re: [Bloat] Linux network is damn fast, need more use XDP (Was: DC behaviors today)
Date: Thu, 7 Dec 2017 09:33:43 +0100
Message-ID: <20171207093343.071083ff@redhat.com>
In-Reply-To: <77f6a9fe-6f95-f149-4cec-170d864f1c06@gmx.net>
(Removed bloat-lists to avoid cross ML-posting)
On Mon, 4 Dec 2017 18:19:09 +0100 Matthias Tafelmeier <matthias.tafelmeier@gmx.net> wrote:
> Hello,
> > Scaling up to more CPUs and TCP streams, Tariq[1] and I have shown that the
> > Linux kernel network stack scales to 94Gbit/s (linerate minus overhead).
> > But when the driver's page-recycler fails, we hit bottlenecks in the
> > page allocator, which causes negative scaling down to around 43Gbit/s.
> >
> > [1] http://lkml.kernel.org/r/cef85936-10b2-5d76-9f97-cb03b418fd94@mellanox.com
> >
> > Linux has for a _long_ time been doing 10Gbit/s TCP-stream easily, on
> > a SINGLE CPU. This is mostly thanks to TSO/GRO aggregating packets,
> > but over the last couple of years the network stack has been optimized
> > (driven by UDP workloads), and as a result we can do 10G without TSO/GRO
> > on a single CPU. This is "only" 812Kpps with MTU-size frames.
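(For reference, 812Kpps is just 10Gbit/s divided by the on-wire size of an
MTU frame: 1500 B payload + 38 B Ethernet overhead (header, FCS, preamble,
inter-frame gap) = 1538 B = 12304 bits, and 10^10 bit/s / 12304 bit is
roughly 812,744 pps.)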
>
> I cannot find the reference anymore, but there was once a workshop held
> by you during some netdev where you stated that you were practically
> in rigorous exchange with NIC vendors about having them tremendously
> increase the number of RX/TX rings (queues).
You are mis-quoting me. I have not recommended tremendously increasing
the number of RX/TX rings (queues). Actually, we should likely decrease
the number of RX-rings, per the recommendation of Eric Dumazet[1], to
increase the chance of packet aggregation/bulking during the NAPI-loop,
and use something like CPUMAP[2] to re-distribute the load across CPUs.
[1] https://www.netdevconf.org/2.1/papers/BusyPollingNextGen.pdf
[2] https://git.kernel.org/torvalds/c/452606d6c9cd
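To make the CPUMAP idea concrete, a minimal XDP sketch (not from this mail,
just an illustration using today's libbpf conventions; the fixed target CPU
is an assumed example) could look like this:

/* xdp_redirect_cpu_kern.c - illustrative sketch only.
 * Moves packets off the driver/NAPI CPU onto another CPU via a CPUMAP,
 * so the receiving CPU stays in its tight poll loop and packets get
 * bulked before the remote CPU runs the upper network stack.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_CPUMAP);
	__type(key, __u32);
	__type(value, __u32);	/* per-CPU queue size, filled in by userspace */
	__uint(max_entries, 64);
} cpu_map SEC(".maps");

SEC("xdp")
int xdp_redirect_cpu(struct xdp_md *ctx)
{
	__u32 target_cpu = 2;	/* assumed target CPU, for illustration only */

	/* Userspace must have populated cpu_map[target_cpu] with a queue
	 * size; otherwise the redirect fails and the packet is dropped. */
	return bpf_redirect_map(&cpu_map, target_cpu, 0);
}

char _license[] SEC("license") = "GPL";

Userspace then loads and attaches this with libbpf and writes a queue size
into cpu_map for each CPU it wants packets spread onto; a real policy would
pick the target CPU per-flow rather than hard-coding it.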
You might have heard/seen me talk about increasing the ring queue size,
that is, the frames/pages available per RX-ring queue[3][4]. I generally
don't recommend increasing that too much, as it hurts cache usage. The
real reason it sometimes helps to increase the RX-ring size on
Intel-based NICs is that they intermix page-recycling into their
RX-ring, for which I have now added a counter that records when it fails[5].
[3] http://netoptimizer.blogspot.dk/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
[4] http://netoptimizer.blogspot.dk/2014/06/pktgen-for-network-overload-testing.html
[5] https://git.kernel.org/torvalds/c/86e23494222f3
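For completeness, the RX-ring size discussed here is the value reported by
'ethtool -g <dev>'; reading it programmatically goes through the
ETHTOOL_GRINGPARAM ioctl. A small sketch (the interface name "eth0" is only
an assumed example):

/* ring_size.c - sketch: read the current and maximum RX/TX ring sizes,
 * i.e. the same values "ethtool -g eth0" prints.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
	const char *ifname = "eth0";	/* assumed example interface */
	struct ethtool_ringparam ering = { .cmd = ETHTOOL_GRINGPARAM };
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&ering;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("ETHTOOL_GRINGPARAM");
		close(fd);
		return 1;
	}
	printf("%s RX ring: %u (max %u)  TX ring: %u (max %u)\n",
	       ifname, ering.rx_pending, ering.rx_max_pending,
	       ering.tx_pending, ering.tx_max_pending);
	close(fd);
	return 0;
}

Changing the size uses the same struct with ETHTOOL_SRINGPARAM, which is
what 'ethtool -G' does.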
> Further, you said that there are hardly
> any limits to that number other than FPGA magic/physical HW - up to
> millions being viable was claimed back then. May I ask where this ended up?
> Wouldn't that be key for massive parallelization as well - having a
> queue (producer) and a CPU (consumer) - or vice versa - per flow at the
> extreme? Did this end up in the SMART-NIC thing? The latter is
> rather targeted at XDP, no?
I do have future plans for (wanting drivers to support) dynamically
adding more RX-TX queue pairs. The general idea is to have the NIC HW
filter packets per application into a specific NIC queue number, which
can be mapped directly into an application (and I want a queue pair so
the app can also TX).
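The HW filtering part can already be approximated today with ntuple rules,
i.e. what 'ethtool -N <dev> flow-type tcp4 dst-port ... action <queue>'
does. A rough sketch of the same via the ETHTOOL_SRXCLSRLINS ioctl (the
device name, port, and queue number are assumed example values):

/* steer_to_queue.c - sketch: install an ntuple rule that directs TCP/IPv4
 * traffic for one destination port into a specific RX queue, roughly
 * "ethtool -N eth0 flow-type tcp4 dst-port 6001 action 2".
 * Requires NIC/driver support for ntuple filters; drivers that cannot
 * auto-place rules may need an explicit location instead of RX_CLS_LOC_ANY.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
	const char *ifname = "eth0";	/* assumed example values */
	__u16 dst_port = 6001;
	__u64 rx_queue = 2;

	struct ethtool_rxnfc nfc;
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	memset(&nfc, 0, sizeof(nfc));
	nfc.cmd = ETHTOOL_SRXCLSRLINS;
	nfc.fs.flow_type = TCP_V4_FLOW;
	nfc.fs.h_u.tcp_ip4_spec.pdst = htons(dst_port);
	nfc.fs.m_u.tcp_ip4_spec.pdst = 0xffff;	/* match the full dst port */
	nfc.fs.ring_cookie = rx_queue;		/* the "action <queue>" part */
	nfc.fs.location = RX_CLS_LOC_ANY;	/* let the driver pick a slot */

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&nfc;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("ETHTOOL_SRXCLSRLINS");
		close(fd);
		return 1;
	}
	printf("rule installed at location %u\n", nfc.fs.location);
	close(fd);
	return 0;
}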
I actually imagine that we can do the application steering via
XDP_REDIRECT. And by having the application register user-pages, as in
AF_PACKET V4, we can achieve zero-copy into userspace from XDP. A
subtle trick here is that zero-copy only occurs if the RX-queue number
matches (XDP, operating at driver ring level, could know this), meaning
that the NIC HW filter setup could happen asynchronously (but premapping
the userspace pages still has to happen upfront, before starting the
app/socket).
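None of the zero-copy pieces existed when this was written, so purely as an
illustration of the "RX-queue number must match" rule, here is roughly what
the XDP side could look like with today's building blocks
(ctx->rx_queue_index and an AF_XDP-style socket map, the descendant of the
AF_PACKET V4 idea, are modern stand-ins, not what is proposed above):

/* xdp_app_steer_kern.c - illustrative sketch only.
 * Packets arriving on an RX queue that the application has registered a
 * socket (and its memory pages) for are redirected straight to that
 * socket; everything else falls back to the normal kernel stack.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__type(key, __u32);
	__type(value, __u32);	/* socket fd, inserted by userspace */
	__uint(max_entries, 64);	/* one slot per RX queue */
} app_socks SEC(".maps");

SEC("xdp")
int xdp_app_steer(struct xdp_md *ctx)
{
	__u32 q = ctx->rx_queue_index;

	/* Zero-copy is only possible when the packet already landed in a
	 * queue whose pages the application pre-registered; if no socket
	 * is bound to this queue, the XDP_PASS flag makes the helper fall
	 * back to the regular stack (needs a reasonably recent kernel). */
	return bpf_redirect_map(&app_socks, q, XDP_PASS);
}

char _license[] SEC("license") = "GPL";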
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer