From: Tariq Toukan <ttoukan.linux@gmail.com>
To: Valentin Schneider <vschneid@redhat.com>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: Tariq Toukan <tariqt@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Yury Norov <yury.norov@gmail.com>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Heiko Carstens <hca@linux.ibm.com>,
Tony Luck <tony.luck@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Gal Pressman <gal@nvidia.com>,
Jesse Brandeburg <jesse.brandeburg@intel.com>
Subject: Re: [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
Date: Mon, 24 Oct 2022 14:24:58 +0300
Message-ID: <f250fc62-a4a6-6543-d688-e755729a7291@gmail.com>
In-Reply-To: <20221021121927.2893692-4-vschneid@redhat.com>
On 10/21/2022 3:19 PM, Valentin Schneider wrote:
> From: Tariq Toukan <tariqt@nvidia.com>
>
> In the IRQ affinity hints, replace the binary NUMA preference (local /
> remote) with the improved for_each_numa_hop_mask() API, which takes the
> actual inter-node distances into account, so that remote NUMA nodes at a
> short distance are preferred over farther ones.
>
> This has significant performance implications when using NUMA-aware
> allocated memory (see [1] and its derivatives for an example).
>
> [1]
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
> int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
>
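
For context, [1] boils down to deriving the allocation node from the first
CPU of the IRQ affinity hint, roughly as follows (a simplified kernel-style
fragment, not the exact driver code):

	/* Allocate the channel's memory on the NUMA node of the first CPU
	 * in the IRQ affinity hint, so that with distance-ordered hints
	 * the memory lands close to the CPU servicing the queue. */
	int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
	struct mlx5e_channel *c;

	c = kvzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
	if (!c)
		return -ENOMEM;
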
> Performance tests:
>
> TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
> Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
>
> +-------------------------+-----------+------------------+------------------+
> | | BW (Gbps) | TX side CPU util | RX side CPU util |
> +-------------------------+-----------+------------------+------------------+
> | Baseline | 52.3 | 6.4 % | 17.9 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on TX side only | 52.6 | 5.2 % | 18.5 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on RX side only | 94.9 | 11.9 % | 27.2 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on both sides | 95.1 | 8.4 % | 27.3 % |
> +-------------------------+-----------+------------------+------------------+
>
> The bottleneck on the RX side is released; the test reaches line rate
> (~1.8x speedup), with ~30% less CPU utilization on the TX side.
>
> * CPU util on active cores only.
>
> Setups details (similar for both sides):
>
> NIC: ConnectX-6 Dx dual port, 100 Gbps each.
> Single port used in the tests.
>
> $ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 256
> On-line CPU(s) list: 0-255
> Thread(s) per core: 2
> Core(s) per socket: 64
> Socket(s): 2
> NUMA node(s): 16
> Vendor ID: AuthenticAMD
> CPU family: 25
> Model: 1
> Model name: AMD EPYC 7763 64-Core Processor
> Stepping: 1
> CPU MHz: 2594.804
> BogoMIPS: 4890.73
> Virtualization: AMD-V
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 512K
> L3 cache: 32768K
> NUMA node0 CPU(s): 0-7,128-135
> NUMA node1 CPU(s): 8-15,136-143
> NUMA node2 CPU(s): 16-23,144-151
> NUMA node3 CPU(s): 24-31,152-159
> NUMA node4 CPU(s): 32-39,160-167
> NUMA node5 CPU(s): 40-47,168-175
> NUMA node6 CPU(s): 48-55,176-183
> NUMA node7 CPU(s): 56-63,184-191
> NUMA node8 CPU(s): 64-71,192-199
> NUMA node9 CPU(s): 72-79,200-207
> NUMA node10 CPU(s): 80-87,208-215
> NUMA node11 CPU(s): 88-95,216-223
> NUMA node12 CPU(s): 96-103,224-231
> NUMA node13 CPU(s): 104-111,232-239
> NUMA node14 CPU(s): 112-119,240-247
> NUMA node15 CPU(s): 120-127,248-255
> ..
...
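
For reference, the inter-node distances that drive the new spreading are the
firmware-reported values (typically the ACPI SLIT on x86), exposed in sysfs.
A minimal stand-alone sketch to dump one node's distance row:

	/* Print the distance row of a NUMA node, as reported in
	 * /sys/devices/system/node/node<N>/distance. */
	#include <stdio.h>
	#include <stdlib.h>

	int main(int argc, char **argv)
	{
		int node = argc > 1 ? atoi(argv[1]) : 0;
		char path[64], row[512];
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/distance", node);
		f = fopen(path, "r");
		if (!f) {
			perror(path);
			return 1;
		}
		if (fgets(row, sizeof(row), f))
			printf("node%d distances: %s", node, row);
		fclose(f);
		return 0;
	}
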
>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> [Tweaked API use]
Thanks for your modification.
It looks good to me.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index a0242dc15741c..7acbeb3d51846 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
> static int comp_irqs_request(struct mlx5_core_dev *dev)
> {
> struct mlx5_eq_table *table = dev->priv.eq_table;
> + const struct cpumask *prev = cpu_none_mask;
> + const struct cpumask *mask;
> int ncomp_eqs = table->num_comp_eqs;
> u16 *cpus;
> int ret;
> + int cpu;
> int i;
>
> ncomp_eqs = table->num_comp_eqs;
> @@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
> ret = -ENOMEM;
> goto free_irqs;
> }
> - for (i = 0; i < ncomp_eqs; i++)
> - cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> + i = 0;
> + rcu_read_lock();
> + for_each_numa_hop_mask(mask, dev->priv.numa_node) {
> + for_each_cpu_andnot(cpu, mask, prev) {
> + cpus[i] = cpu;
> + if (++i == ncomp_eqs)
> + goto spread_done;
> + }
> + prev = mask;
> + }
> +spread_done:
> + rcu_read_unlock();
> ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
> kfree(cpus);
> if (ret < 0)
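
A note on the loop above: each successive mask returned by
for_each_numa_hop_mask() is a superset of the previous one, covering all
CPUs within the next hop distance, so for_each_cpu_andnot(cpu, mask, prev)
visits only the CPUs that become reachable at each new hop, and every CPU is
assigned at most once. The rcu_read_lock()/rcu_read_unlock() pair is there
because the iterator walks RCU-protected scheduler topology data (see the
iterator's kerneldoc in patch 2/3 of this series).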