* [PATCH net-next 0/2] net/mlx4: Mellanox driver update 01-01-2014
@ 2014-01-01 13:05 Amir Vadai
2014-01-01 13:05 ` [PATCH net-next 1/2] net/mlx4_en: Use affinity hint Amir Vadai
2014-01-01 13:05 ` [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues" Amir Vadai
0 siblings, 2 replies; 16+ messages in thread
From: Amir Vadai @ 2014-01-01 13:05 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai
Happy new year,
This driver update contains some fixes to enhance the out-of-the-box (OOB) user
experience of the Mellanox driver.
Yuval added support for setting the IRQ affinity hint via irq_set_affinity_hint().
Jenny reverted commit 90b1ebe ("mlx4: set maximal number of default RSS queues"),
which limited the number of RX rings to a hard-coded 8.
The patchset was applied and tested against commit 21eb218 ("net, sch: fix the
typo in register_qdisc").
Thanks,
Amir
Eugenia Emantayev (1):
net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
Yuval Atias (1):
net/mlx4_en: Use affinity hint
drivers/infiniband/hw/mlx4/main.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_cq.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 60 +++++++++++++++++++++++++-
drivers/net/ethernet/mellanox/mlx4/eq.c | 12 +++++-
drivers/net/ethernet/mellanox/mlx4/main.c | 5 +--
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
include/linux/mlx4/device.h | 2 +-
7 files changed, 78 insertions(+), 8 deletions(-)
--
1.8.3.4
* [PATCH net-next 1/2] net/mlx4_en: Use affinity hint
2014-01-01 13:05 [PATCH net-next 0/2] net/mlx4: Mellanox driver update 01-01-2014 Amir Vadai
@ 2014-01-01 13:05 ` Amir Vadai
2014-01-01 16:34 ` Ben Hutchings
2014-01-02 3:13 ` [PATCH net-next 1/2] " David Miller
2014-01-01 13:05 ` [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues" Amir Vadai
1 sibling, 2 replies; 16+ messages in thread
From: Amir Vadai @ 2014-01-01 13:05 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Yuval Atias
From: Yuval Atias <yuvala@mellanox.com>
The “affinity hint” mechanism is used by the user space
daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
Irqbalancer can use this hint to balance the irqs between the
cpus indicated by the mask.
We wish the HCA to preferentially map the IRQs it uses to numa cores
close to it.
To accomplish this, we use affinity hint: first we map IRQs to “close”
numa cores.
If these are exhausted, the remaining IRQs are mapped to “far” numa
cores.
Signed-off-by: Yuval Atias <yuvala@mellanox.com>
---
drivers/infiniband/hw/mlx4/main.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_cq.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 60 +++++++++++++++++++++++++-
drivers/net/ethernet/mellanox/mlx4/eq.c | 12 +++++-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
include/linux/mlx4/device.h | 2 +-
6 files changed, 76 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 1958c5c..3607a3d 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1486,7 +1486,7 @@ static void mlx4_ib_alloc_eqs(struct mlx4_dev *dev, struct mlx4_ib_dev *ibdev)
i, j, dev->pdev->bus->name);
/* Set IRQ for specific name (per ring) */
if (mlx4_assign_eq(dev, name, NULL,
- &ibdev->eq_table[eq])) {
+ &ibdev->eq_table[eq], NULL)) {
/* Use legacy (same as mlx4_en driver) */
pr_warn("Can't allocate EQ %d; reverting to legacy\n", eq);
ibdev->eq_table[eq] =
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 70e9532..52396d1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -123,7 +123,9 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
cq->ring);
/* Set IRQ for specific name (per ring) */
if (mlx4_assign_eq(mdev->dev, name, rmap,
- &cq->vector)) {
+ &cq->vector,
+ &priv->rx_ring[cq_idx]
+ ->affinity_mask)){
cq->vector = (cq->ring + 1 + priv->port)
% mdev->dev->caps.num_comp_vectors;
mlx4_warn(mdev, "Failed Assigning an EQ to "
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index b68dde0..e6ff67c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1532,6 +1532,45 @@ static void mlx4_en_linkstate(struct work_struct *work)
mutex_unlock(&mdev->state_lock);
}
+static int mlx4_en_rings_affinity_hint(struct mlx4_en_priv *priv ,
+ int *affinity_hint_mapping)
+{
+ int cpu;
+ int cores_arr_index = 0;
+ cpumask_t *numa_cores_mask;
+ cpumask_t *non_numa_cores_mask;
+
+ if (!zalloc_cpumask_var(&non_numa_cores_mask, GFP_KERNEL)) {
+ en_err(priv, "Failed allocating cores mask\n");
+ return -EINVAL;
+ }
+ numa_cores_mask = (cpumask_t *)cpumask_of_node(
+ priv->mdev->dev->numa_node);
+ if (!cpumask_and(numa_cores_mask, cpu_online_mask, numa_cores_mask))
+ en_warn(priv, "Failed to find online cores for local numa\n");
+ cpumask_xor(non_numa_cores_mask, cpu_online_mask, numa_cores_mask);
+
+ for_each_cpu(cpu, numa_cores_mask) {
+ affinity_hint_mapping[cores_arr_index] = cpu;
+ cores_arr_index++;
+ }
+ for_each_cpu(cpu, non_numa_cores_mask) {
+ affinity_hint_mapping[cores_arr_index] = cpu;
+ cores_arr_index++;
+ }
+
+ free_cpumask_var(non_numa_cores_mask);
+
+ return 0;
+}
+
+static int mlx4_en_set_affinity_hint(struct mlx4_en_priv *priv , int ring,
+ int *affinity_hint_mapping)
+{
+ cpumask_set_cpu(affinity_hint_mapping[ring % num_online_cpus()],
+ &priv->rx_ring[ring]->affinity_mask);
+ return 0;
+}
int mlx4_en_start_port(struct net_device *dev)
{
@@ -1545,7 +1584,7 @@ int mlx4_en_start_port(struct net_device *dev)
int i;
int j;
u8 mc_list[16] = {0};
-
+ int *affinity_hint_mapping = NULL;
if (priv->port_up) {
en_dbg(DRV, priv, "start port called while port already up\n");
return 0;
@@ -1557,6 +1596,9 @@ int mlx4_en_start_port(struct net_device *dev)
memset(&priv->ethtool_rules[0], 0,
sizeof(struct ethtool_flow_id) * MAX_NUM_OF_FS_RULES);
+ affinity_hint_mapping = kzalloc(num_online_cpus() * sizeof(int),
+ GFP_KERNEL);
+
/* Calculate Rx buf size */
dev->mtu = min(dev->mtu, priv->max_mtu);
mlx4_en_calc_rx_buf(dev);
@@ -1568,11 +1610,22 @@ int mlx4_en_start_port(struct net_device *dev)
en_err(priv, "Failed to activate RX rings\n");
return err;
}
+
+ err = mlx4_en_rings_affinity_hint(priv, affinity_hint_mapping);
+ if (err) {
+ en_err(priv, "Failed setting affinity hint irq mapping\n");
+ }
+
for (i = 0; i < priv->rx_ring_num; i++) {
cq = priv->rx_cq[i];
mlx4_en_cq_init_lock(cq);
+ err = mlx4_en_set_affinity_hint(priv , i,
+ affinity_hint_mapping);
+ if (err)
+ en_err(priv, "Failed setting affinity hint");
+
err = mlx4_en_activate_cq(priv, cq, i);
if (err) {
en_err(priv, "Failed activating Rx CQ\n");
@@ -1580,6 +1633,7 @@ int mlx4_en_start_port(struct net_device *dev)
}
for (j = 0; j < cq->size; j++)
cq->buf[j].owner_sr_opcode = MLX4_CQE_OWNER_MASK;
+
err = mlx4_en_set_cq_moder(priv, cq);
if (err) {
en_err(priv, "Failed setting cq moderation parameters");
@@ -1703,6 +1757,8 @@ int mlx4_en_start_port(struct net_device *dev)
priv->port_up = true;
netif_tx_start_all_queues(dev);
netif_device_attach(dev);
+ kfree(affinity_hint_mapping);
+ affinity_hint_mapping = NULL;
return 0;
@@ -1721,6 +1777,8 @@ cq_err:
mlx4_en_deactivate_cq(priv, priv->rx_cq[rx_index]);
for (i = 0; i < priv->rx_ring_num; i++)
mlx4_en_deactivate_rx_ring(priv, priv->rx_ring[i]);
+ kfree(affinity_hint_mapping);
+ affinity_hint_mapping = NULL;
return err; /* need to close devices */
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index ae5212f..8c19741 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1312,7 +1312,7 @@ int mlx4_test_interrupts(struct mlx4_dev *dev)
EXPORT_SYMBOL(mlx4_test_interrupts);
int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
- int *vector)
+ int *vector, cpumask_t *cpu_hint_mask)
{
struct mlx4_priv *priv = mlx4_priv(dev);
@@ -1345,6 +1345,16 @@ int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
continue;
/*we dont want to break here*/
}
+ if (cpu_hint_mask) {
+ err = irq_set_affinity_hint(
+ priv->eq_table.eq[vec].irq,
+ cpu_hint_mask);
+ if (err) {
+ mlx4_warn(dev, "Failed setting affinity hint\n");
+ /*we dont want to break here*/
+ }
+ }
+
eq_set_ci(&priv->eq_table.eq[vec], 1);
}
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 766691c..1202ed6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -302,6 +302,7 @@ struct mlx4_en_rx_ring {
unsigned long csum_ok;
unsigned long csum_none;
int hwtstamp_rx_filter;
+ cpumask_t affinity_mask;
};
struct mlx4_en_cq {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index c99ecf6..4aa2122 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1146,7 +1146,7 @@ int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
int mlx4_SYNC_TPT(struct mlx4_dev *dev);
int mlx4_test_interrupts(struct mlx4_dev *dev);
int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
- int *vector);
+ int *vector, cpumask_t *cpu_hint_mask);
void mlx4_release_eq(struct mlx4_dev *dev, int vec);
int mlx4_get_phys_port_id(struct mlx4_dev *dev);
--
1.8.3.4
* [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-01 13:05 [PATCH net-next 0/2] net/mlx4: Mellanox driver update 01-01-2014 Amir Vadai
2014-01-01 13:05 ` [PATCH net-next 1/2] net/mlx4_en: Use affinity hint Amir Vadai
@ 2014-01-01 13:05 ` Amir Vadai
2014-01-01 18:46 ` Yuval Mintz
1 sibling, 1 reply; 16+ messages in thread
From: Amir Vadai @ 2014-01-01 13:05 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Amir Vadai, Eugenia Emantayev, Yuval Mintz,
Ido Shamay
From: Eugenia Emantayev <eugenia@mellanox.com>
This reverts commit 90b1ebe "mlx4: set maximal number of default RSS
queues".
Limiting the number of receive rings to the default (8) reduces the overall
packet rate by 15% in a multiple TCP streams application:
Packet rate
8 rings 967691
16 rings 1142070
Results were obtained using netperf:
S=200 ; ( for i in $(seq 1 $S) ; do ( \
netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \
wait ; done | grep "1 1" | awk '{SUM+=$6} END {print SUM}' )
CC: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 417a595..31ff812 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -41,7 +41,6 @@
#include <linux/slab.h>
#include <linux/io-mapping.h>
#include <linux/delay.h>
-#include <linux/netdevice.h>
#include <linux/kmod.h>
#include <linux/mlx4/device.h>
@@ -1974,8 +1973,8 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
struct mlx4_priv *priv = mlx4_priv(dev);
struct msix_entry *entries;
int nreq = min_t(int, dev->caps.num_ports *
- min_t(int, netif_get_num_default_rss_queues() + 1,
- MAX_MSIX_P_PORT) + MSIX_LEGACY_SZ, MAX_MSIX);
+ min_t(int, num_online_cpus() + 1, MAX_MSIX_P_PORT)
+ + MSIX_LEGACY_SZ, MAX_MSIX);
int err;
int i;
--
1.8.3.4
* Re: [PATCH net-next 1/2] net/mlx4_en: Use affinity hint
2014-01-01 13:05 ` [PATCH net-next 1/2] net/mlx4_en: Use affinity hint Amir Vadai
@ 2014-01-01 16:34 ` Ben Hutchings
2014-01-02 16:33 ` Amir Vadai
2014-01-02 3:13 ` [PATCH net-next 1/2] " David Miller
1 sibling, 1 reply; 16+ messages in thread
From: Ben Hutchings @ 2014-01-01 16:34 UTC (permalink / raw)
To: Amir Vadai; +Cc: David S. Miller, netdev, Or Gerlitz, Yuval Atias
On Wed, 2014-01-01 at 15:05 +0200, Amir Vadai wrote:
> From: Yuval Atias <yuvala@mellanox.com>
>
> The “affinity hint” mechanism is used by the user space
> daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
> Irqbalancer can use this hint to balance the irqs between the
> cpus indicated by the mask.
>
> We wish the HCA to preferentially map the IRQs it uses to numa cores
> close to it.
> To accomplish this, we use affinity hint: first we map IRQs to “close”
> numa cores.
> If these are exhausted, the remaining IRQs are mapped to “far” numa
> cores.
>
> Signed-off-by: Yuval Atias <yuvala@mellanox.com>
[...]
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1532,6 +1532,45 @@ static void mlx4_en_linkstate(struct work_struct *work)
> mutex_unlock(&mdev->state_lock);
> }
>
> +static int mlx4_en_rings_affinity_hint(struct mlx4_en_priv *priv ,
> + int *affinity_hint_mapping)
> +{
> + int cpu;
> + int cores_arr_index = 0;
> + cpumask_t *numa_cores_mask;
> + cpumask_t *non_numa_cores_mask;
The types of these masks must be cpumask_var_t.
> + if (!zalloc_cpumask_var(&non_numa_cores_mask, GFP_KERNEL)) {
> + en_err(priv, "Failed allocating cores mask\n");
> + return -EINVAL;
> + }
> + numa_cores_mask = (cpumask_t *)cpumask_of_node(
> + priv->mdev->dev->numa_node);
> + if (!cpumask_and(numa_cores_mask, cpu_online_mask, numa_cores_mask))
> + en_warn(priv, "Failed to find online cores for local numa\n");
This changes the CPU mask of your device's NUMA node. Don't do that.
You need to copy it to numa_cores_mask and then start changing it.
[...]
> @@ -1557,6 +1596,9 @@ int mlx4_en_start_port(struct net_device *dev)
> memset(&priv->ethtool_rules[0], 0,
> sizeof(struct ethtool_flow_id) * MAX_NUM_OF_FS_RULES);
>
> + affinity_hint_mapping = kzalloc(num_online_cpus() * sizeof(int),
> + GFP_KERNEL);
[...]
What makes you think that the online CPU IDs are all contiguous starting
at 0? The correct array size is nr_cpu_ids, if I remember correctly.
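For illustration, a rough sketch of how all three points could be handled
together, reusing the names from the posted patch (untested, so take it only
as a shape):
static int mlx4_en_rings_affinity_hint(struct mlx4_en_priv *priv,
                                       int *affinity_hint_mapping)
{
        cpumask_var_t numa_cores_mask, non_numa_cores_mask;
        int cores_arr_index = 0;
        int cpu;
        if (!zalloc_cpumask_var(&numa_cores_mask, GFP_KERNEL))
                return -ENOMEM;
        if (!zalloc_cpumask_var(&non_numa_cores_mask, GFP_KERNEL)) {
                free_cpumask_var(numa_cores_mask);
                return -ENOMEM;
        }
        /* work on a private copy, never on the node's own mask */
        cpumask_copy(numa_cores_mask,
                     cpumask_of_node(priv->mdev->dev->numa_node));
        if (!cpumask_and(numa_cores_mask, numa_cores_mask, cpu_online_mask))
                en_warn(priv, "Failed to find online cores for local numa\n");
        cpumask_andnot(non_numa_cores_mask, cpu_online_mask, numa_cores_mask);
        for_each_cpu(cpu, numa_cores_mask)
                affinity_hint_mapping[cores_arr_index++] = cpu;
        for_each_cpu(cpu, non_numa_cores_mask)
                affinity_hint_mapping[cores_arr_index++] = cpu;
        free_cpumask_var(non_numa_cores_mask);
        free_cpumask_var(numa_cores_mask);
        return 0;
}
/* and in mlx4_en_start_port(), size the array by CPU ids, not CPU count: */
affinity_hint_mapping = kcalloc(nr_cpu_ids, sizeof(int), GFP_KERNEL);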
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* RE: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-01 13:05 ` [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues" Amir Vadai
@ 2014-01-01 18:46 ` Yuval Mintz
2014-01-01 21:50 ` Or Gerlitz
0 siblings, 1 reply; 16+ messages in thread
From: Yuval Mintz @ 2014-01-01 18:46 UTC (permalink / raw)
To: Amir Vadai, David S. Miller
Cc: netdev@vger.kernel.org, Or Gerlitz, Eugenia Emantayev, Ido Shamay
> From: Eugenia Emantayev <eugenia@mellanox.com>
>
> This reverts commit 90b1ebe "mlx4: set maximal number of default RSS
> queues".
> Limiting the number of receive rings to default (8) reduces overall
> packet rate by 15% in multiple TCP streams application:
>
> Packet rate
> 8 rings 967691
> 16 rings 1142070
>
> Results were obtained using netperf:
>
> S=200 ; ( for i in $(seq 1 $S) ; do ( \
> netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \
> wait ; done | grep "1 1" | awk '{SUM+=$6} END {print SUM}' )
Seems odd - well, to be more exact, nothing is odd about the results;
the whole notion of the original patch was to set an upper limit on the
number of interrupt vectors multi-queue devices request by default, not
to improve default performance.
If you believe this is a better default (or that some relaxation would be,
e.g., 16 instead of 8), why not set it as the default for ALL multi-queue
networking drivers?
Thanks,
Yuval
* Re: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-01 18:46 ` Yuval Mintz
@ 2014-01-01 21:50 ` Or Gerlitz
2014-01-02 6:04 ` Yuval Mintz
0 siblings, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2014-01-01 21:50 UTC (permalink / raw)
To: Yuval Mintz
Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org, Or Gerlitz,
Eugenia Emantayev, Ido Shamay
On Wed, Jan 1, 2014 at 8:46 PM, Yuval Mintz <yuvalmin@broadcom.com> wrote:
>
> [...] If you believe this is a better default (or some relaxation will be, e.g., 16 instead of 8), why not set it as default for ALL multi-queue networking drivers?
Going back to your original commit 16917b87a ("net-next: Add
netif_get_num_default_rss_queues"), I am still not clear on two points:
1. Why do we want a common default for all MQ devices?
2. Why does this default have to be hard-coded and not derived, e.g., from
the number of cores or a similar attribute of the system?
Can you please elaborate?
Or.
* Re: [PATCH net-next 1/2] net/mlx4_en: Use affinity hint
2014-01-01 13:05 ` [PATCH net-next 1/2] net/mlx4_en: Use affinity hint Amir Vadai
2014-01-01 16:34 ` Ben Hutchings
@ 2014-01-02 3:13 ` David Miller
1 sibling, 0 replies; 16+ messages in thread
From: David Miller @ 2014-01-02 3:13 UTC (permalink / raw)
To: amirv; +Cc: netdev, ogerlitz, yuvala
From: Amir Vadai <amirv@mellanox.com>
Date: Wed, 1 Jan 2014 15:05:36 +0200
> + &priv->rx_ring[cq_idx]
> + ->affinity_mask)){
Splitting up a pointer dereference like this looks awful.
This code block is very deeply indented, which is what causes this
problem in the first place. Abstract it out completely into a helper
function.
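For example, something like this (the helper name below is made up, shown
only to illustrate the shape):
/* hypothetical helper - keeps the call site flat */
static cpumask_t *mlx4_en_rx_affinity_mask(struct mlx4_en_priv *priv,
                                           int cq_idx)
{
        return &priv->rx_ring[cq_idx]->affinity_mask;
}
/* the call site in mlx4_en_activate_cq() then stays on two lines */
if (mlx4_assign_eq(mdev->dev, name, rmap, &cq->vector,
                   mlx4_en_rx_affinity_mask(priv, cq_idx)))
        cq->vector = (cq->ring + 1 + priv->port) %
                     mdev->dev->caps.num_comp_vectors;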
* RE: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-01 21:50 ` Or Gerlitz
@ 2014-01-02 6:04 ` Yuval Mintz
2014-01-02 9:35 ` Or Gerlitz
0 siblings, 1 reply; 16+ messages in thread
From: Yuval Mintz @ 2014-01-02 6:04 UTC (permalink / raw)
To: Or Gerlitz
Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org, Or Gerlitz,
Eugenia Emantayev, Ido Shamay
> > [...] If you believe this is a better default (or some relaxation will be, e.g., 16
> instead of 8), why not set it as default for ALL multi-queue networking
> drivers?
>
> Going back to your original commit 16917b87a "net-next: Add
> netif_get_num_default_rss_queues" I am still not clear why we want
>
> 1. why we want a common default to all MQ devices?
Although networking benefits from multiple interrupt vectors
(enabling more rings, better performance, etc.), bounding this
number only by the number of CPUs is unreasonable, as it strains
system resources; e.g., consider a 40-CPU server - we might wish
to have 40 vectors per device, but that means that connecting
several devices to the same server might cause other functions
to fail to probe, as they will no longer be able to acquire interrupt
vectors of their own.
Since networking has an API allowing the user to manually set the
number of channels, the default is upper-bounded.
> 2. why this default has to be hard coded and not derived e.g from the
> number of cores or alike attribute of the system?
This is not entirely correct; the default number is derived from
the number of online CPUs - it is only upper-bounded by a
hard-coded value.
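For reference, the helper in question is (roughly, quoting from memory, so
treat this as approximate) just:
/* the upper bound is the hard-coded part; the rest tracks online CPUs */
#define DEFAULT_MAX_NUM_RSS_QUEUES      (8)
int netif_get_num_default_rss_queues(void)
{
        return min_t(int, DEFAULT_MAX_NUM_RSS_QUEUES, num_online_cpus());
}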
Cheers,
Yuval
* Re: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-02 6:04 ` Yuval Mintz
@ 2014-01-02 9:35 ` Or Gerlitz
2014-01-02 10:27 ` Yuval Mintz
0 siblings, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2014-01-02 9:35 UTC (permalink / raw)
To: Yuval Mintz, Or Gerlitz
Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
Eugenia Emantayev, Ido Shamay
On 02/01/2014 08:04, Yuval Mintz wrote:
>>> [...] If you believe this is a better default (or some relaxation will be, e.g., 16
>> instead of 8), why not set it as default for ALL multi-queue networking
>> drivers?
>>
>> Going back to your original commit 16917b87a "net-next: Add
>> netif_get_num_default_rss_queues" I am still not clear why we want
>>
>> 1. why we want a common default to all MQ devices?
> Although networking benefits from multiple Interrupt vectors
> (enabling more rings, better performance, etc.), bounding this
> number only to the number of cpus is unreasonable as it strains
> system resources; e.g., consider a 40-cpu server - we might wish
> to have 40 vectors per device, but that means that connecting
> several devices to the same server might cause other functions
> to fail probe as they will no longer be able to acquire interrupt
> vectors of their own.
Modern servers with tens of CPUs typically have thousands of MSI-X
vectors, which means you should easily be able to plug four cards into a
server with 64 cores and consume only 256 out of the 1K-4K vectors out
there. Anyway, let me continue with your approach - how about raising the
default hard limit to 16, or deriving it from the number of cores at the
NUMA node where the card is plugged in?
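E.g., something like this (just a sketch, assuming the device's NUMA node is
available as dev->numa_node, as in patch 1/2):
/* sketch: derive the per-port default from the local NUMA node size */
int local_cpus = cpumask_weight(cpumask_of_node(dev->numa_node));
int def_vectors = min_t(int, local_cpus ? local_cpus : num_online_cpus(),
                        MAX_MSIX_P_PORT);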
Or.
>
> Since networking has an API allowing the user to manually set the
> number of channels, the default is upper-bounded.
>
>> 2. why this default has to be hard coded and not derived e.g from the
>> number of cores or alike attribute of the system?
> This is not entirely correct; The default number is derived from
> the number of online cpus - it's only upper bounded by some
> hard-coded value.
>
> Cheers,
> Yuval
* RE: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-02 9:35 ` Or Gerlitz
@ 2014-01-02 10:27 ` Yuval Mintz
2014-01-15 12:15 ` Ido Shamai
0 siblings, 1 reply; 16+ messages in thread
From: Yuval Mintz @ 2014-01-02 10:27 UTC (permalink / raw)
To: Or Gerlitz, Or Gerlitz
Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
Eugenia Emantayev, Ido Shamay
> >> Going back to your original commit 16917b87a "net-next: Add
> >> netif_get_num_default_rss_queues" I am still not clear why we want
> >>
> >> 1. why we want a common default to all MQ devices?
> > Although networking benefits from multiple Interrupt vectors
> > (enabling more rings, better performance, etc.), bounding this
> > number only to the number of cpus is unreasonable as it strains
> > system resources; e.g., consider a 40-cpu server - we might wish
> > to have 40 vectors per device, but that means that connecting
> > several devices to the same server might cause other functions
> > to fail probe as they will no longer be able to acquire interrupt
> > vectors of their own.
>
> Modern servers which have tens of CPUs typically have thousands of MSI-X
> vectors which means you should be easily able to plug four cards into a
> server with 64 cores which will consume 256 out of the 1-4K vectors out
> there. Anyway, let me continue your approach - how about raising the
> default hard limit to 16 or having it as the number of cores @ the numa
> node where the card is plugged?
I think an additional issue was memory consumption -
additional interrupts --> additional allocated memory (for RX rings).
And I do know the issues were real - we've had complaints about devices
failing to load due to lack of resources (not all servers in the world are
state of the art).
Anyway, I believe 8/16 are simply strict limitations without any true meaning;
to judge what's more important, default `slimness' or default performance,
is beyond me.
Perhaps the NUMA approach will prove beneficial (and will make some sense).
Thanks,
Yuval
* Re: net/mlx4_en: Use affinity hint
2014-01-01 16:34 ` Ben Hutchings
@ 2014-01-02 16:33 ` Amir Vadai
0 siblings, 0 replies; 16+ messages in thread
From: Amir Vadai @ 2014-01-02 16:33 UTC (permalink / raw)
To: Ben Hutchings; +Cc: David S. Miller, netdev, Or Gerlitz, Yuval Atias
On 01/01/14 16:34 +0000, Ben Hutchings wrote:
> On Wed, 2014-01-01 at 15:05 +0200, Amir Vadai wrote:
> > From: Yuval Atias <yuvala@mellanox.com>
> >
> > The “affinity hint” mechanism is used by the user space
> > daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
> > Irqbalancer can use this hint to balance the irqs between the
> > cpus indicated by the mask.
> >
> > We wish the HCA to preferentially map the IRQs it uses to numa cores
> > close to it.
> > To accomplish this, we use affinity hint: first we map IRQs to “close”
> > numa cores.
> > If these are exhausted, the remaining IRQs are mapped to “far” numa
> > cores.
> >
> > Signed-off-by: Yuval Atias <yuvala@mellanox.com>
> [...]
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > @@ -1532,6 +1532,45 @@ static void mlx4_en_linkstate(struct work_struct *work)
> > mutex_unlock(&mdev->state_lock);
> > }
> >
> > +static int mlx4_en_rings_affinity_hint(struct mlx4_en_priv *priv ,
> > + int *affinity_hint_mapping)
> > +{
> > + int cpu;
> > + int cores_arr_index = 0;
> > + cpumask_t *numa_cores_mask;
> > + cpumask_t *non_numa_cores_mask;
>
> The types of these masks must be cpumask_var_t.
>
Will be fixed in V1
> > + if (!zalloc_cpumask_var(&non_numa_cores_mask, GFP_KERNEL)) {
> > + en_err(priv, "Failed allocating cores mask\n");
> > + return -EINVAL;
> > + }
> > + numa_cores_mask = (cpumask_t *)cpumask_of_node(
> > + priv->mdev->dev->numa_node);
> > + if (!cpumask_and(numa_cores_mask, cpu_online_mask, numa_cores_mask))
> > + en_warn(priv, "Failed to find online cores for local numa\n");
>
> This changes the CPU mask of your device's NUMA node. Don't do that.
> You need to copy it to numa_cores_mask and then start changing it.
>
Will change in V1
> [...]
> > @@ -1557,6 +1596,9 @@ int mlx4_en_start_port(struct net_device *dev)
> > memset(&priv->ethtool_rules[0], 0,
> > sizeof(struct ethtool_flow_id) * MAX_NUM_OF_FS_RULES);
> >
> > + affinity_hint_mapping = kzalloc(num_online_cpus() * sizeof(int),
> > + GFP_KERNEL);
> [...]
>
> What makes you think that the online CPU IDs are all contiguous starting
> at 0? The correct array size is nr_cpu_ids, if I remember correctly.
We are changing the design to iterate over the cpumask using cpumask_next()
instead of preparing this array, so this comment will be addressed as well.
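Roughly along these lines (a sketch of the direction only - the helper name
here is made up and the actual V1 code may differ):
/* sketch: walk local-node online cpus first, then the remaining online
 * cpus, using the cpumask iterators instead of a prebuilt array
 */
static int mlx4_en_ring_to_cpu(struct mlx4_en_priv *priv, int ring_idx)
{
        const struct cpumask *local =
                cpumask_of_node(priv->mdev->dev->numa_node);
        int idx = ring_idx % num_online_cpus();
        int cpu;
        for_each_cpu_and(cpu, local, cpu_online_mask) {
                if (idx-- == 0)
                        return cpu;
        }
        for_each_cpu(cpu, cpu_online_mask) {
                if (cpumask_test_cpu(cpu, local))
                        continue;
                if (idx-- == 0)
                        return cpu;
        }
        return cpumask_first(cpu_online_mask);
}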
>
> Ben.
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>
Thanks,
Amir
* Re: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-02 10:27 ` Yuval Mintz
@ 2014-01-15 12:15 ` Ido Shamai
2014-01-15 12:46 ` Sathya Perla
2014-01-15 12:54 ` Yuval Mintz
0 siblings, 2 replies; 16+ messages in thread
From: Ido Shamai @ 2014-01-15 12:15 UTC (permalink / raw)
To: Yuval Mintz, Or Gerlitz, Or Gerlitz
Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
Eugenia Emantayev, Ido Shamay
On 1/2/2014 12:27 PM, Yuval Mintz wrote:
>>>> Going back to your original commit 16917b87a "net-next: Add
>>>> netif_get_num_default_rss_queues" I am still not clear why we want
>>>>
>>>> 1. why we want a common default to all MQ devices?
>>> Although networking benefits from multiple Interrupt vectors
>>> (enabling more rings, better performance, etc.), bounding this
>>> number only to the number of cpus is unreasonable as it strains
>>> system resources; e.g., consider a 40-cpu server - we might wish
>>> to have 40 vectors per device, but that means that connecting
>>> several devices to the same server might cause other functions
>>> to fail probe as they will no longer be able to acquire interrupt
>>> vectors of their own.
>>
>> Modern servers which have tens of CPUs typically have thousands of MSI-X
>> vectors which means you should be easily able to plug four cards into a
>> server with 64 cores which will consume 256 out of the 1-4K vectors out
>> there. Anyway, let me continue your approach - how about raising the
>> default hard limit to 16 or having it as the number of cores @ the numa
>> node where the card is plugged?
>
> I think an additional issue was memory consumption -
> additional interrupts --> additional allocated memory (for Rx rings).
> And I do know the issues were real - we've had complains about devices
> failing to load due to lack of resources (not all servers in the world are
> top of the art).
>
> Anyway, I believe 8/16 are simply strict limitations without any true meaning;
> To judge what's more important, default `slimness' or default performance
> is beyond me.
> Perhaps the numa approach will prove beneficial (and will make some sense).
After reviewing all that was said, I feel there is no need to force this
strict limitation, which has no true meaning, on vendors.
The commit being reverted forces the driver to use at most 8 rings at all
times, with no possibility of changing that at runtime using ethtool, since
the limit is enforced in the PCI driver at module init (restarting the en
driver with a different number of requested rings has no effect).
So this is crucial for performance-oriented applications using mlx4_en.
Going through all the Ethernet vendors, I don't see this limitation enforced
elsewhere, so it has no true meaning (and no fairness).
I think this patch should go in as is.
Ethernet vendors can apply this limitation when they see fit.
Ido
> Thanks,
> Yuval
* RE: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-15 12:15 ` Ido Shamai
@ 2014-01-15 12:46 ` Sathya Perla
2014-01-15 12:49 ` Ido Shamai
2014-01-15 12:54 ` Yuval Mintz
1 sibling, 1 reply; 16+ messages in thread
From: Sathya Perla @ 2014-01-15 12:46 UTC (permalink / raw)
To: Ido Shamai, Yuval Mintz, Or Gerlitz, Or Gerlitz
Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
Eugenia Emantayev, Ido Shamay
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf
> Of Ido Shamai
>
> On 1/2/2014 12:27 PM, Yuval Mintz wrote:
> >>>> Going back to your original commit 16917b87a "net-next: Add
> >>>> netif_get_num_default_rss_queues" I am still not clear why we want
> >>>>
> >>>> 1. why we want a common default to all MQ devices?
> >>> Although networking benefits from multiple Interrupt vectors
> >>> (enabling more rings, better performance, etc.), bounding this
> >>> number only to the number of cpus is unreasonable as it strains
> >>> system resources; e.g., consider a 40-cpu server - we might wish
> >>> to have 40 vectors per device, but that means that connecting
> >>> several devices to the same server might cause other functions
> >>> to fail probe as they will no longer be able to acquire interrupt
> >>> vectors of their own.
> >>
> >> Modern servers which have tens of CPUs typically have thousands of MSI-X
> >> vectors which means you should be easily able to plug four cards into a
> >> server with 64 cores which will consume 256 out of the 1-4K vectors out
> >> there. Anyway, let me continue your approach - how about raising the
> >> default hard limit to 16 or having it as the number of cores @ the numa
> >> node where the card is plugged?
> >
> > I think an additional issue was memory consumption -
> > additional interrupts --> additional allocated memory (for Rx rings).
> > And I do know the issues were real - we've had complains about devices
> > failing to load due to lack of resources (not all servers in the world are
> > top of the art).
> >
> > Anyway, I believe 8/16 are simply strict limitations without any true meaning;
> > To judge what's more important, default `slimness' or default performance
> > is beyond me.
> > Perhaps the numa approach will prove beneficial (and will make some sense).
>
> After reviewing all that was said, I feel there is no need to enforce
> vendors with this strict limitation without any true meaning.
>
> The reverted commit you applied forces the driver to use 8 rings at max
> at all time, without the possibility to change in flight using ethtool,
> as it's enforced on the PCI driver at module init (restarting the en
> driver with different of requested rings will not affect).
> So it's crucial for performance oriented applications using mlx4_en.
The number of RSS/RX rings used by a driver can be increased (up to the
HW-supported value) at runtime using the set-channels ethtool interface.
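For example (the interface name is only a placeholder):
# query current and maximum channel counts
ethtool -l eth2
# request 16 RX channels at runtime
ethtool -L eth2 rx 16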
* Re: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-15 12:46 ` Sathya Perla
@ 2014-01-15 12:49 ` Ido Shamai
0 siblings, 0 replies; 16+ messages in thread
From: Ido Shamai @ 2014-01-15 12:49 UTC (permalink / raw)
To: Sathya Perla, Yuval Mintz, Or Gerlitz, Or Gerlitz
Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
Eugenia Emantayev, Ido Shamay
On 1/15/2014 2:46 PM, Sathya Perla wrote:
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf
>> Of Ido Shamai
>>
>> On 1/2/2014 12:27 PM, Yuval Mintz wrote:
>>>>>> Going back to your original commit 16917b87a "net-next: Add
>>>>>> netif_get_num_default_rss_queues" I am still not clear why we want
>>>>>>
>>>>>> 1. why we want a common default to all MQ devices?
>>>>> Although networking benefits from multiple Interrupt vectors
>>>>> (enabling more rings, better performance, etc.), bounding this
>>>>> number only to the number of cpus is unreasonable as it strains
>>>>> system resources; e.g., consider a 40-cpu server - we might wish
>>>>> to have 40 vectors per device, but that means that connecting
>>>>> several devices to the same server might cause other functions
>>>>> to fail probe as they will no longer be able to acquire interrupt
>>>>> vectors of their own.
>>>>
>>>> Modern servers which have tens of CPUs typically have thousands of MSI-X
>>>> vectors which means you should be easily able to plug four cards into a
>>>> server with 64 cores which will consume 256 out of the 1-4K vectors out
>>>> there. Anyway, let me continue your approach - how about raising the
>>>> default hard limit to 16 or having it as the number of cores @ the numa
>>>> node where the card is plugged?
>>>
>>> I think an additional issue was memory consumption -
>>> additional interrupts --> additional allocated memory (for Rx rings).
>>> And I do know the issues were real - we've had complains about devices
>>> failing to load due to lack of resources (not all servers in the world are
>>> top of the art).
>>>
>>> Anyway, I believe 8/16 are simply strict limitations without any true meaning;
>>> To judge what's more important, default `slimness' or default performance
>>> is beyond me.
>>> Perhaps the numa approach will prove beneficial (and will make some sense).
>>
>> After reviewing all that was said, I feel there is no need to enforce
>> vendors with this strict limitation without any true meaning.
>>
>> The reverted commit you applied forces the driver to use 8 rings at max
>> at all time, without the possibility to change in flight using ethtool,
>> as it's enforced on the PCI driver at module init (restarting the en
>> driver with different of requested rings will not affect).
>> So it's crucial for performance oriented applications using mlx4_en.
>
> The number of RSS/RX rings used by a driver can be increased (up to the HW supported value)
> at runtime using set-channels ethtool interface.
Not in this case - see my comment above: the limit is enforced in the PCI
driver at module init.
In our case the set-channels interface cannot raise the ring count beyond
this limitation, only up to it.
* RE: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-15 12:15 ` Ido Shamai
2014-01-15 12:46 ` Sathya Perla
@ 2014-01-15 12:54 ` Yuval Mintz
2014-01-15 13:15 ` Ido Shamai
1 sibling, 1 reply; 16+ messages in thread
From: Yuval Mintz @ 2014-01-15 12:54 UTC (permalink / raw)
To: Ido Shamai, Or Gerlitz, Or Gerlitz
Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
Eugenia Emantayev, Ido Shamay
> > Anyway, I believe 8/16 are simply strict limitations without any true
> meaning;
> > To judge what's more important, default `slimness' or default performance
> > is beyond me.
> > Perhaps the numa approach will prove beneficial (and will make some
> sense).
>
> After reviewing all that was said, I feel there is no need to enforce
> vendors with this strict limitation without any true meaning.
>
> The reverted commit you applied forces the driver to use 8 rings at max
> at all time, without the possibility to change in flight using ethtool,
> as it's enforced on the PCI driver at module init (restarting the en
> driver with different of requested rings will not affect).
> So it's crucial for performance oriented applications using mlx4_en.
The rationale is to prevent misuse of resources by default, be they IRQ
vectors or memory for rings.
Notice this is just the default - you can always re-request interrupts if the
user explicitly asks for a large number of rings; although, if the driver is
incapable of that (e.g., due to HW limitations), perhaps you should allocate a
larger number of IRQ vectors during init and only limit your default number
of rings (that's assuming IRQ vectors really are inexpensive on all machines).
> Going through all Ethernet vendors I don't see this limitation enforced,
> so this limitation has no true meaning (no fairness).
> I think this patch should go in as is.
> Ethernet vendors should use it this limitation when they desire.
That might be true, but two wrongs don't make a right.
I believe that either this limitation should be mandatory, or the
functionality should not be included in the kernel as common code and each
driver should do as it pleases.
* Re: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
2014-01-15 12:54 ` Yuval Mintz
@ 2014-01-15 13:15 ` Ido Shamai
0 siblings, 0 replies; 16+ messages in thread
From: Ido Shamai @ 2014-01-15 13:15 UTC (permalink / raw)
To: Yuval Mintz, Or Gerlitz, Or Gerlitz
Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
Eugenia Emantayev, Ido Shamay
On 1/15/2014 2:54 PM, Yuval Mintz wrote:
>>> Anyway, I believe 8/16 are simply strict limitations without any true
>> meaning;
>>> To judge what's more important, default `slimness' or default performance
>>> is beyond me.
>>> Perhaps the numa approach will prove beneficial (and will make some
>> sense).
>>
>> After reviewing all that was said, I feel there is no need to enforce
>> vendors with this strict limitation without any true meaning.
>>
>> The reverted commit you applied forces the driver to use 8 rings at max
>> at all time, without the possibility to change in flight using ethtool,
>> as it's enforced on the PCI driver at module init (restarting the en
>> driver with different of requested rings will not affect).
>> So it's crucial for performance oriented applications using mlx4_en.
>
> The rational is to prevent default misusage of resources, be them irq vectors
> memories for rings.
> Notice this is just the default - You can always re-request interrupts if the
> user explicitly requests a large number of rings;
> Although, if the driver is incapable of that (e.g., hw limitations), perhaps you
> should allocate a larger number of irq vectors during init and use a limitation
> on your default number of rings
> (that's assuming that irq vectors are really inexpensive on all machines).
I fully agree with you on that.
We will send a new patch that limits only the default number of rings to
this value, which can then be changed using the set-channels ethtool
interface. The number of IRQ vectors will be (at most) the number of CPUs.
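I.e., something of this shape (only a sketch under those assumptions - field
and macro names are taken from the existing driver, the real patch may differ):
/* mlx4 core: still request up to one vector per online CPU (per port) */
nreq = min_t(int, dev->caps.num_ports *
             min_t(int, num_online_cpus() + 1, MAX_MSIX_P_PORT)
             + MSIX_LEGACY_SZ, MAX_MSIX);
/* mlx4_en: only the *default* ring count is capped; ethtool -L can
 * later raise it up to the number of completion vectors
 */
prof->rx_ring_num = min_t(int, netif_get_num_default_rss_queues(),
                          MAX_RX_RINGS);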
Yuval, thanks for clarifying.
>> Going through all Ethernet vendors I don't see this limitation enforced,
>> so this limitation has no true meaning (no fairness).
>> I think this patch should go in as is.
>> Ethernet vendors should use it this limitation when they desire.
>
> Might be true, but two wrongs don't make a right.
> I believe that either this limitation should be mandatory, or the functionality
> Should Not be included in the kernel as communal code and each driver
> should do as it pleases.
I agree, but this is a different discussion, which I hope will be held
sometime in the near future.