From: Tariq Toukan <ttoukan.linux@gmail.com>
To: Valentin Schneider <vschneid@redhat.com>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: Tariq Toukan <tariqt@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Yury Norov <yury.norov@gmail.com>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Heiko Carstens <hca@linux.ibm.com>,
Tony Luck <tony.luck@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Gal Pressman <gal@nvidia.com>,
Jesse Brandeburg <jesse.brandeburg@intel.com>
Subject: Re: [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
Date: Mon, 24 Oct 2022 14:24:58 +0300
Message-ID: <f250fc62-a4a6-6543-d688-e755729a7291@gmail.com>
In-Reply-To: <20221021121927.2893692-4-vschneid@redhat.com>
On 10/21/2022 3:19 PM, Valentin Schneider wrote:
> From: Tariq Toukan <tariqt@nvidia.com>
>
> In the IRQ affinity hints, replace the binary NUMA preference (local /
> remote) with the improved for_each_numa_hop_mask() API, which takes the
> actual inter-node distances into account, so that remote NUMA nodes at a
> short distance are preferred over farther ones.
>
> This has significant performance implications when using NUMA-aware
> allocated memory (see [1] and its derivatives for an example).
>
> [1]
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
> int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
>
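
For context, [1] boils down to deriving the allocation node from the first
CPU of the IRQ affinity hint, roughly as follows (a simplified kernel-style
fragment, not the exact driver code):

	/* Allocate the channel's memory on the NUMA node of the first CPU
	 * in the IRQ affinity hint, so that with distance-ordered hints
	 * the memory lands close to the CPU servicing the queue. */
	int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
	struct mlx5e_channel *c;

	c = kvzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
	if (!c)
		return -ENOMEM;
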
> Performance tests:
>
> TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
> Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
>
> +-------------------------+-----------+------------------+------------------+
> | | BW (Gbps) | TX side CPU util | RX side CPU util |
> +-------------------------+-----------+------------------+------------------+
> | Baseline | 52.3 | 6.4 % | 17.9 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on TX side only | 52.6 | 5.2 % | 18.5 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on RX side only | 94.9 | 11.9 % | 27.2 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on both sides | 95.1 | 8.4 % | 27.3 % |
> +-------------------------+-----------+------------------+------------------+
>
> The bottleneck on the RX side is released; the test reaches line rate
> (~1.8x speedup), with ~30% less CPU utilization on the TX side.
>
> * CPU util on active cores only.
>
> Setups details (similar for both sides):
>
> NIC: ConnectX-6 Dx dual port, 100 Gbps each.
> Single port used in the tests.
>
> $ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 256
> On-line CPU(s) list: 0-255
> Thread(s) per core: 2
> Core(s) per socket: 64
> Socket(s): 2
> NUMA node(s): 16
> Vendor ID: AuthenticAMD
> CPU family: 25
> Model: 1
> Model name: AMD EPYC 7763 64-Core Processor
> Stepping: 1
> CPU MHz: 2594.804
> BogoMIPS: 4890.73
> Virtualization: AMD-V
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 512K
> L3 cache: 32768K
> NUMA node0 CPU(s): 0-7,128-135
> NUMA node1 CPU(s): 8-15,136-143
> NUMA node2 CPU(s): 16-23,144-151
> NUMA node3 CPU(s): 24-31,152-159
> NUMA node4 CPU(s): 32-39,160-167
> NUMA node5 CPU(s): 40-47,168-175
> NUMA node6 CPU(s): 48-55,176-183
> NUMA node7 CPU(s): 56-63,184-191
> NUMA node8 CPU(s): 64-71,192-199
> NUMA node9 CPU(s): 72-79,200-207
> NUMA node10 CPU(s): 80-87,208-215
> NUMA node11 CPU(s): 88-95,216-223
> NUMA node12 CPU(s): 96-103,224-231
> NUMA node13 CPU(s): 104-111,232-239
> NUMA node14 CPU(s): 112-119,240-247
> NUMA node15 CPU(s): 120-127,248-255
> ..
...
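
For reference, the inter-node distances that drive the new spreading are the
firmware-reported values (typically the ACPI SLIT on x86), exposed in sysfs.
A minimal stand-alone sketch to dump one node's distance row:

	/* Print the distance row of a NUMA node, as reported in
	 * /sys/devices/system/node/node<N>/distance. */
	#include <stdio.h>
	#include <stdlib.h>

	int main(int argc, char **argv)
	{
		int node = argc > 1 ? atoi(argv[1]) : 0;
		char path[64], row[512];
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/distance", node);
		f = fopen(path, "r");
		if (!f) {
			perror(path);
			return 1;
		}
		if (fgets(row, sizeof(row), f))
			printf("node%d distances: %s", node, row);
		fclose(f);
		return 0;
	}
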
>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> [Tweaked API use]
Thanks for your modification.
It looks good to me.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index a0242dc15741c..7acbeb3d51846 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
> static int comp_irqs_request(struct mlx5_core_dev *dev)
> {
> struct mlx5_eq_table *table = dev->priv.eq_table;
> + const struct cpumask *prev = cpu_none_mask;
> + const struct cpumask *mask;
> int ncomp_eqs = table->num_comp_eqs;
> u16 *cpus;
> int ret;
> + int cpu;
> int i;
>
> ncomp_eqs = table->num_comp_eqs;
> @@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
> ret = -ENOMEM;
> goto free_irqs;
> }
> - for (i = 0; i < ncomp_eqs; i++)
> - cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> + i = 0;
> + rcu_read_lock();
> + for_each_numa_hop_mask(mask, dev->priv.numa_node) {
> + for_each_cpu_andnot(cpu, mask, prev) {
> + cpus[i] = cpu;
> + if (++i == ncomp_eqs)
> + goto spread_done;
> + }
> + prev = mask;
> + }
> +spread_done:
> + rcu_read_unlock();
> ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
> kfree(cpus);
> if (ret < 0)
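
A note on the loop above: each successive mask returned by
for_each_numa_hop_mask() is a superset of the previous one, covering all
CPUs within the next hop distance, so for_each_cpu_andnot(cpu, mask, prev)
visits only the CPUs that become reachable at each new hop, and every CPU is
assigned at most once. The rcu_read_lock()/rcu_read_unlock() pair is there
because the iterator walks RCU-protected scheduler topology data (see the
iterator's kerneldoc in patch 2/3 of this series).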