From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Ian Kumlien <ian.kumlien@gmail.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>,
andy@greyhouse.net, vfalico@gmail.com
Subject: Re: Expected performance w/ bonding
Date: Wed, 18 Jan 2023 08:43:01 -0800
Message-ID: <10437.1674060181@famine>
In-Reply-To: <CAA85sZtrSWcZkFf=0P2iQptre0j2c=OCXRHU8Tiqm_Lpb-ttNQ@mail.gmail.com>

Ian Kumlien <ian.kumlien@gmail.com> wrote:
>Hi,
>
>I was doing some tests with some of the bigger AMD machines, both
>using PCIe 4.0 Mellanox ConnectX-5 NICs.
>
>They have 2x100Gbit links to the same switches (running in VLT - i.e. as
>"one"), but iperf3 seems to hit a limit at 27Gbit max
>(with 10 threads in parallel), and generally lands somewhere around 25Gbit - so
>my question is whether there is a limit at ~25Gbit for bonding
>using 802.3ad and layer2+3 hashing.
>
>It's a little bit difficult to do proper measurements since most
>systems are in production - but I'm kind of running out of clues =)
>
>If anyone has any ideas, it would be very interesting to see whether they help.
If by "bigger AMD machines" you mean ROME or similar, then what
you may be seeing is the effects of (a) NUMA, and (b) the AMD CCX cache
architecture (in which a small-ish number of CPUs share an L3 cache).
We've seen similar effects on these systems with similar configurations,
particularly on versions with smaller numbers of CPUs per CCX (as I
recall, one is 4 per CCX).
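
	As a rough illustration - assuming a Linux system that exposes
the cache topology under sysfs - the CCX boundaries can be seen by
grouping CPUs that share an L3; an untested sketch:

#!/usr/bin/env python3
# Sketch: group CPUs by shared L3 cache; on these AMD parts each group
# corresponds to one CCX. Assumes sysfs exposes the L3 as cache index3.
import glob
import os
from collections import defaultdict

l3_groups = defaultdict(set)
for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cache/index3"):
    cpu = int(path.split("/")[5][len("cpu"):])
    with open(os.path.join(path, "shared_cpu_list")) as f:
        shared = f.read().strip()        # e.g. "0-3,64-67"
    l3_groups[shared].add(cpu)

for shared, members in sorted(l3_groups.items()):
    print(f"L3/CCX domain {shared}: {len(members)} CPUs")
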
	For testing purposes, you can pin the iperf tasks to CPUs in the
same CCX as one another, and on the same NUMA node as the network
device. If your bond uses interfaces on separate NUMA nodes, there
may be additional randomness in the results, as data may or may not
cross a NUMA boundary depending on the flow hash. For testing, this can
be worked around by disabling one interface in the bond (i.e., a bond
with just one active interface) and ensuring the iperf tasks are pinned
to the correct NUMA node.
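
	To illustrate the pinning, a minimal, untested sketch (the
interface name and the choice to wrap iperf3 are placeholders of mine)
that reads the device's NUMA node from sysfs and sets the CPU affinity
before exec'ing iperf3:

#!/usr/bin/env python3
# Sketch: pin ourselves to the CPUs of the NUMA node hosting a NIC, then
# exec iperf3 so it inherits the affinity. Interface name is a placeholder.
import os
import sys

iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"

with open(f"/sys/class/net/{iface}/device/numa_node") as f:
    node = int(f.read().strip())
node = max(node, 0)          # -1 means the device reports no NUMA affinity

with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
    spec = f.read().strip()  # e.g. "0-31,64-95"

cpus = set()
for part in spec.split(","):
    lo, _, hi = part.partition("-")
    cpus.update(range(int(lo), int(hi or lo) + 1))

os.sched_setaffinity(0, cpus)
os.execvp("iperf3", ["iperf3"] + sys.argv[2:])
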
There is a mechanism in bonding to do flow -> queue -> interface
assignments (described in Documentation/networking/bonding.rst), but
it's nontrivial, and still needs the processes to be resident on the
same NUMA node (and on the AMD systems, also within the same CCX
domain).
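
	For completeness, an untested sketch of that configuration,
following the example in bonding.rst (the bond/slave names and the
destination address below are placeholders):

#!/usr/bin/env python3
# Sketch: map a slave to a bond queue id via sysfs, then steer one flow
# onto that queue (and so onto that slave) with tc, per bonding.rst.
import subprocess

BOND, SLAVE, QUEUE, DST = "bond0", "eth1", "2", "192.168.1.100"

with open(f"/sys/class/net/{BOND}/bonding/queue_id", "w") as f:
    f.write(f"{SLAVE}:{QUEUE}")

subprocess.run(["tc", "qdisc", "add", "dev", BOND,
                "handle", "1", "root", "multiq"], check=True)
subprocess.run(["tc", "filter", "add", "dev", BOND, "protocol", "ip",
                "parent", "1:", "prio", "1", "u32",
                "match", "ip", "dst", DST,
                "action", "skbedit", "queue_mapping", QUEUE], check=True)
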
-J
---
-Jay Vosburgh, jay.vosburgh@canonical.com