From: Tariq Toukan <ttoukan.linux@gmail.com>
To: Valentin Schneider <vschneid@redhat.com>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: Tariq Toukan <tariqt@nvidia.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Yury Norov <yury.norov@gmail.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mel Gorman <mgorman@suse.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Tony Luck <tony.luck@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Gal Pressman <gal@nvidia.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>
Subject: Re: [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
Date: Mon, 24 Oct 2022 14:24:58 +0300
Message-ID: <f250fc62-a4a6-6543-d688-e755729a7291@gmail.com>
In-Reply-To: <20221021121927.2893692-4-vschneid@redhat.com>



On 10/21/2022 3:19 PM, Valentin Schneider wrote:
> From: Tariq Toukan <tariqt@nvidia.com>
> 
> In the IRQ affinity hints, replace the binary NUMA preference (local /
> remote) with the improved for_each_numa_hop_mask() API that minds the
> actual distances, so that remote NUMAs with short distance are preferred
> over farther ones.
> 
> This has significant performance implications when using NUMA-aware
> allocated memory (follow [1] and derivatives for example).
> 
> [1]
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
>     int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
> 
> Performance tests:
> 
> TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
> Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
> 
> +-------------------------+-----------+------------------+------------------+
> |                         | BW (Gbps) | TX side CPU util | RX side CPU util |
> +-------------------------+-----------+------------------+------------------+
> | Baseline                | 52.3      | 6.4 %            | 17.9 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
> +-------------------------+-----------+------------------+------------------+
> 
> Bottleneck on the RX side is released and line rate is reached (~1.8x speedup).
> ~30% less CPU util on TX.
> 
> * CPU util on active cores only.
> 
> Setup details (similar for both sides):
> 
> NIC: ConnectX6-DX dual port, 100 Gbps each.
> Single port used in the tests.
> 
> $ lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              256
> On-line CPU(s) list: 0-255
> Thread(s) per core:  2
> Core(s) per socket:  64
> Socket(s):           2
> NUMA node(s):        16
> Vendor ID:           AuthenticAMD
> CPU family:          25
> Model:               1
> Model name:          AMD EPYC 7763 64-Core Processor
> Stepping:            1
> CPU MHz:             2594.804
> BogoMIPS:            4890.73
> Virtualization:      AMD-V
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            512K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-7,128-135
> NUMA node1 CPU(s):   8-15,136-143
> NUMA node2 CPU(s):   16-23,144-151
> NUMA node3 CPU(s):   24-31,152-159
> NUMA node4 CPU(s):   32-39,160-167
> NUMA node5 CPU(s):   40-47,168-175
> NUMA node6 CPU(s):   48-55,176-183
> NUMA node7 CPU(s):   56-63,184-191
> NUMA node8 CPU(s):   64-71,192-199
> NUMA node9 CPU(s):   72-79,200-207
> NUMA node10 CPU(s):  80-87,208-215
> NUMA node11 CPU(s):  88-95,216-223
> NUMA node12 CPU(s):  96-103,224-231
> NUMA node13 CPU(s):  104-111,232-239
> NUMA node14 CPU(s):  112-119,240-247
> NUMA node15 CPU(s):  120-127,248-255
> ..
...
> 
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> [Tweaked API use]

Thanks for your modification.
It looks good to me.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index a0242dc15741c..7acbeb3d51846 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
>   static int comp_irqs_request(struct mlx5_core_dev *dev)
>   {
>   	struct mlx5_eq_table *table = dev->priv.eq_table;
> +	const struct cpumask *prev = cpu_none_mask;
> +	const struct cpumask *mask;
>   	int ncomp_eqs = table->num_comp_eqs;
>   	u16 *cpus;
>   	int ret;
> +	int cpu;
>   	int i;
>   
>   	ncomp_eqs = table->num_comp_eqs;
> @@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
>   		ret = -ENOMEM;
>   		goto free_irqs;
>   	}
> -	for (i = 0; i < ncomp_eqs; i++)
> -		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> +	i = 0;
> +	rcu_read_lock();
> +	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
> +		for_each_cpu_andnot(cpu, mask, prev) {
> +			cpus[i] = cpu;
> +			if (++i == ncomp_eqs)
> +				goto spread_done;
> +		}
> +		prev = mask;
> +	}
> +spread_done:
> +	rcu_read_unlock();
>   	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
>   	kfree(cpus);
>   	if (ret < 0)
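
For anyone following the thread, the selection order in the hunk above can
be modelled in userspace. This is only a minimal sketch, not the kernel
implementation: the 4-node distance table, node_cpus[], and the helpers
hop_mask(), next_distance() and spread_cpus() are hypothetical names made
up for illustration, and plain 64-bit words stand in for struct cpumask.
It only mirrors the idea of walking hop masks by increasing distance and
taking the CPUs that were not in the previous mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES 4

/* Hypothetical topology: 4 NUMA nodes, 2 CPUs each, two "near" pairs. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 12, 32, 32 },
	{ 12, 10, 32, 32 },
	{ 32, 32, 10, 12 },
	{ 32, 32, 12, 10 },
};
static const uint64_t node_cpus[NR_NODES] = { 0x03, 0x0c, 0x30, 0xc0 };

/* CPUs of every node whose distance from @node is at most @max_dist. */
static uint64_t hop_mask(int node, int max_dist)
{
	uint64_t mask = 0;

	for (int n = 0; n < NR_NODES; n++)
		if (node_distance[node][n] <= max_dist)
			mask |= node_cpus[n];
	return mask;
}

/* Smallest distance from @node strictly above @dist, or -1 if none. */
static int next_distance(int node, int dist)
{
	int best = -1;

	for (int n = 0; n < NR_NODES; n++) {
		int d = node_distance[node][n];

		if (d > dist && (best < 0 || d < best))
			best = d;
	}
	return best;
}

/* Fill @cpus with up to @count CPU ids, nearest hop first. */
static size_t spread_cpus(int node, uint16_t *cpus, size_t count)
{
	uint64_t prev = 0;	/* plays the role of cpu_none_mask */
	size_t i = 0;

	assert(node >= 0 && node < NR_NODES);

	for (int dist = next_distance(node, -1); dist >= 0 && i < count;
	     dist = next_distance(node, dist)) {
		uint64_t mask = hop_mask(node, dist);
		uint64_t fresh = mask & ~prev;	/* like for_each_cpu_andnot() */

		for (int cpu = 0; cpu < 64 && i < count; cpu++)
			if (fresh & (UINT64_C(1) << cpu))
				cpus[i++] = (uint16_t)cpu;
		prev = mask;
	}
	return i;
}
```

With this made-up table, asking for six CPUs starting from node 2 takes
the two local CPUs first, then the two on the nearest remote node, and
only then moves on to the farther nodes.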
