Netdev List

* [PATCH net-next 00/10] mlx4 misc improvements
From: Tariq Toukan @ 2016-11-27 15:51 UTC
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tariq Toukan

Hi Dave,

This patchset contains several improvements, cleanups, and bug fixes
from the team to the mlx4 Eth and core drivers.

Series generated against net-next commit:
e5f12b3f5ebb Merge branch 'mlxsw-trap-groups-and-policers'

Thanks,
Tariq.

Alaa Hleihel (1):
  net/mlx4_core: Get num_tc using netdev_get_num_tc

Daniel Jurgens (1):
  net/mlx4_core: Set EQ affinity hint to local NUMA CPUs

Eran Ben Elisha (1):
  net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap

Erez Shitrit (2):
  net/mlx4_core: Make each VF manage its own mac table
  net/mlx4_en: Add new FDB entry only if there is space in the mac table

Jack Morgenstein (1):
  net/mlx4_core: Fix racy CQ (Completion Queue) free

Matan Barak (2):
  net/mlx4_core: Add resource alloc/dealloc debugging
  net/mlx4: Change number of max MSIXs from 64 to 1024

Tariq Toukan (1):
  net/mlx4: Replace ENOSYS with better fitting error codes

Yishai Hadas (1):
  net/mlx4_core: Device revision support

 drivers/net/ethernet/mellanox/mlx4/cq.c            |  38 ++--
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c    |   4 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |   6 +
 drivers/net/ethernet/mellanox/mlx4/en_tx.c         |   2 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c            |  17 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          | 253 ++++++++++++---------
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   1 +
 drivers/net/ethernet/mellanox/mlx4/port.c          |  98 +++++++-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  57 ++++-
 include/linux/mlx4/device.h                        |   4 +-
 10 files changed, 342 insertions(+), 138 deletions(-)

-- 
1.8.3.1


* [PATCH net-next 10/10] net/mlx4_core: Fix racy CQ (Completion Queue) free
From: Tariq Toukan @ 2016-11-27 15:51 UTC
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Jack Morgenstein, Matan Barak,
	Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

In functions mlx4_cq_completion() and mlx4_cq_event(), the
radix_tree_lookup requires an rcu_read_lock.
This is mandatory: if another core frees the CQ, it could
run the radix_tree_node_rcu_free() call_rcu() callback while
the node is still being used by the radix tree lookup function.

Additionally, in function mlx4_cq_event(), since we are adding
the rcu lock around the radix-tree lookup, we no longer need to take
the spinlock. Also, the synchronize_irq() call for the async event
eliminates the need for incrementing the cq reference count in
mlx4_cq_event().

Other changes:
1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock:
   we no longer take this spinlock in the interrupt context.
   The spinlock here, therefore, simply protects against different
   threads simultaneously invoking mlx4_cq_free() for different CQs.

2. In function mlx4_cq_free(), we move the radix tree delete to before
   the synchronize_irq() calls. This guarantees that we will not
   access this cq during any subsequent interrupts, and therefore can
   safely free the CQ after the synchronize_irq calls. The rcu_read_lock
   in the interrupt handlers only needs to protect against corrupting the
   radix tree; the interrupt handlers may access the cq outside the
   rcu_read_lock due to the synchronize_irq calls which protect against
   premature freeing of the cq.

3. In function mlx4_cq_event(), we change the mlx4_warn message to mlx4_dbg.

4. We leave the cq reference count mechanism in place, because it is
   still needed for the cq completion tasklet mechanism.
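
The teardown ordering described in item 2 (unpublish, wait for in-flight
handlers, then free) can be sketched in userspace C. This is only an
illustrative model, not the driver code: the demo_* names are hypothetical,
a flat array of atomic pointers stands in for the RCU-protected radix tree,
and an in-flight counter stands in for synchronize_irq().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_NUM_SLOTS 8	/* hypothetical table size, power of two */

struct demo_cq {
	unsigned int cqn;
	unsigned int arm_sn;
};

/* Stand-in for the radix tree: one atomic pointer per slot. */
static _Atomic(struct demo_cq *) demo_table[DEMO_NUM_SLOTS];
/* Stand-in for "an interrupt handler is currently running". */
static atomic_int demo_handlers_in_flight;

/* Publish a CQ so that handlers can find it. */
static void demo_cq_publish(struct demo_cq *cq)
{
	assert(cq != NULL);
	atomic_store(&demo_table[cq->cqn & (DEMO_NUM_SLOTS - 1)], cq);
}

/* Handler side: returns 1 if the CQ was found and handled, 0 otherwise. */
static int demo_cq_completion(unsigned int cqn)
{
	struct demo_cq *cq;
	int handled = 0;

	/* Mark the handler as running before looking anything up. */
	atomic_fetch_add(&demo_handlers_in_flight, 1);
	cq = atomic_load(&demo_table[cqn & (DEMO_NUM_SLOTS - 1)]);
	if (cq) {
		/* Safe: the freeing side waits for this handler to finish. */
		++cq->arm_sn;
		handled = 1;
	}
	atomic_fetch_sub(&demo_handlers_in_flight, 1);
	return handled;
}

/* Teardown side: unpublish first, then wait, then free. */
static void demo_cq_free(struct demo_cq *cq)
{
	/* 1. Remove the entry so that later handlers cannot find it. */
	atomic_store(&demo_table[cq->cqn & (DEMO_NUM_SLOTS - 1)],
		     (struct demo_cq *)NULL);
	/* 2. Wait for handlers that may already hold the pointer. */
	while (atomic_load(&demo_handlers_in_flight) != 0)
		;
	/* 3. No handler can reach the CQ any more. */
	free(cq);
}
```

If the wait in step 2 came before the removal in step 1, a handler starting
between the two steps could still find the CQ and touch it after the free.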

Fixes: 6d90aa5cf17b ("net/mlx4_core: Make sure there are no pending async events when freeing CQ")
Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/cq.c | 38 +++++++++++++++++----------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index a849da92f857..6b8635378f1f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -101,13 +101,19 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
 {
 	struct mlx4_cq *cq;
 
+	rcu_read_lock();
 	cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree,
 			       cqn & (dev->caps.num_cqs - 1));
+	rcu_read_unlock();
+
 	if (!cq) {
 		mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn);
 		return;
 	}
 
+	/* Accessing the CQ outside of rcu_read_lock is safe, because
+	 * the CQ is freed only after interrupt handling is completed.
+	 */
 	++cq->arm_sn;
 
 	cq->comp(cq);
@@ -118,23 +124,19 @@ void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type)
 	struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
 	struct mlx4_cq *cq;
 
-	spin_lock(&cq_table->lock);
-
+	rcu_read_lock();
 	cq = radix_tree_lookup(&cq_table->tree, cqn & (dev->caps.num_cqs - 1));
-	if (cq)
-		atomic_inc(&cq->refcount);
-
-	spin_unlock(&cq_table->lock);
+	rcu_read_unlock();
 
 	if (!cq) {
-		mlx4_warn(dev, "Async event for bogus CQ %08x\n", cqn);
+		mlx4_dbg(dev, "Async event for bogus CQ %08x\n", cqn);
 		return;
 	}
 
+	/* Accessing the CQ outside of rcu_read_lock is safe, because
+	 * the CQ is freed only after interrupt handling is completed.
+	 */
 	cq->event(cq, event_type);
-
-	if (atomic_dec_and_test(&cq->refcount))
-		complete(&cq->free);
 }
 
 static int mlx4_SW2HW_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox,
@@ -301,9 +303,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent,
 	if (err)
 		return err;
 
-	spin_lock_irq(&cq_table->lock);
+	spin_lock(&cq_table->lock);
 	err = radix_tree_insert(&cq_table->tree, cq->cqn, cq);
-	spin_unlock_irq(&cq_table->lock);
+	spin_unlock(&cq_table->lock);
 	if (err)
 		goto err_icm;
 
@@ -349,9 +351,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent,
 	return 0;
 
 err_radix:
-	spin_lock_irq(&cq_table->lock);
+	spin_lock(&cq_table->lock);
 	radix_tree_delete(&cq_table->tree, cq->cqn);
-	spin_unlock_irq(&cq_table->lock);
+	spin_unlock(&cq_table->lock);
 
 err_icm:
 	mlx4_cq_free_icm(dev, cq->cqn);
@@ -370,15 +372,15 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq)
 	if (err)
 		mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn);
 
+	spin_lock(&cq_table->lock);
+	radix_tree_delete(&cq_table->tree, cq->cqn);
+	spin_unlock(&cq_table->lock);
+
 	synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq);
 	if (priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq !=
 	    priv->eq_table.eq[MLX4_EQ_ASYNC].irq)
 		synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq);
 
-	spin_lock_irq(&cq_table->lock);
-	radix_tree_delete(&cq_table->tree, cq->cqn);
-	spin_unlock_irq(&cq_table->lock);
-
 	if (atomic_dec_and_test(&cq->refcount))
 		complete(&cq->free);
 	wait_for_completion(&cq->free);
-- 
1.8.3.1


* [PATCH net-next 05/10] net/mlx4_core: Set EQ affinity hint to local NUMA CPUs
From: Tariq Toukan @ 2016-11-27 15:51 UTC
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Daniel Jurgens, Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

From: Daniel Jurgens <danielj@mellanox.com>

Use CPUs on the local (closest) NUMA node when setting the EQ affinity hints.
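
As a rough userspace model of the selection this relies on: the i-th CPU is
taken from the local node first, and remote CPUs are used only once the local
ones run out. The demo_local_spread() helper and the cpu_node[] layout are
hypothetical and only approximate what cpumask_local_spread() does in the
kernel.

```c
#include <assert.h>

/*
 * Return the i-th CPU, counting CPUs on 'node' first and all other
 * CPUs afterwards. cpu_node[cpu] gives the NUMA node of each CPU.
 */
static int demo_local_spread(unsigned int i, int node,
			     const int *cpu_node, unsigned int ncpus)
{
	unsigned int cpu;

	assert(ncpus > 0);
	i %= ncpus;	/* wrap around so that a CPU is always found */

	/* First pass: CPUs on the requested node. */
	for (cpu = 0; cpu < ncpus; cpu++)
		if (cpu_node[cpu] == node && i-- == 0)
			return (int)cpu;

	/* Second pass: the remaining, remote CPUs. */
	for (cpu = 0; cpu < ncpus; cpu++)
		if (cpu_node[cpu] != node && i-- == 0)
			return (int)cpu;

	return -1;	/* not reached, since i < ncpus */
}
```

With this ordering, the first EQs of a device on node 1 land on node 1 CPUs,
and only the EQs beyond the local CPU count spill over to other nodes.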

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 41cd05f37b20..4a9497e9778d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2825,7 +2825,8 @@ static int mlx4_init_affinity_hint(struct mlx4_dev *dev, int port, int eqn)
 	if (!zalloc_cpumask_var(&eq->affinity_mask, GFP_KERNEL))
 		return -ENOMEM;
 
-	cpumask_set_cpu(requested_cpu, eq->affinity_mask);
+	cpumask_set_cpu(cpumask_local_spread(requested_cpu, dev->numa_node),
+			eq->affinity_mask);
 
 	return 0;
 }
-- 
1.8.3.1


* [PATCH net-next 03/10] net/mlx4_en: Add new FDB entry only if there is space in the mac table
From: Tariq Toukan @ 2016-11-27 15:51 UTC
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Erez Shitrit, Gal Pressman, Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

From: Erez Shitrit <erezsh@mellanox.com>

Before adding a new MAC to the FDB (Forwarding Database),
make sure there is space for it. Each port has 128
MACs that are shared between the hypervisor and the VFs.
If there is no space, return an error.
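
The check can be modelled in a few lines of userspace C. This is a sketch
only: the demo_* names are hypothetical, the table size of 128 comes from the
description above, and the quota value of 2 is an assumed placeholder rather
than the driver's actual MLX4_VF_MAC_QUOTA definition.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PORT_TOTAL_MACS	128	/* per-port MAC table size */
#define DEMO_VF_MAC_QUOTA	2	/* assumed value, for illustration */

struct demo_port {
	int used_macs;
};

/* Number of MAC table entries still unused on this port. */
static int demo_port_free_macs(const struct demo_port *port)
{
	assert(port->used_macs >= 0 &&
	       port->used_macs <= DEMO_PORT_TOTAL_MACS);
	return DEMO_PORT_TOTAL_MACS - port->used_macs;
}

/* Mirrors the shape of the new helper: enough free entries left? */
static bool demo_is_available_mac(const struct demo_port *port)
{
	return demo_port_free_macs(port) >= DEMO_VF_MAC_QUOTA;
}

/* Returns 0 on success, -1 when the table is considered full. */
static int demo_add_fdb_entry(struct demo_port *port)
{
	if (!demo_is_available_mac(port))
		return -1;
	port->used_macs++;
	return 0;
}
```

Note that the comparison is against the quota, not against zero, so in this
model the last few entries are never handed out through this path.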

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 6 ++++++
 drivers/net/ethernet/mellanox/mlx4/port.c      | 8 ++++++++
 include/linux/mlx4/device.h                    | 1 +
 3 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 9018bb1b2e12..60c3b2da8714 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1123,6 +1123,12 @@ static void mlx4_en_do_uc_filter(struct mlx4_en_priv *priv,
 			}
 			mac = mlx4_mac_to_u64(ha->addr);
 			memcpy(entry->mac, ha->addr, ETH_ALEN);
+
+			if (!mlx4_is_available_mac(mdev->dev, priv->port)) {
+				mlx4_warn(mdev, "Cannot add mac:%pM, no free macs.\n", ha->addr);
+				break;
+			}
+
 			err = mlx4_register_mac(mdev->dev, priv->port, mac);
 			if (err < 0) {
 				en_err(priv, "Failed registering MAC %pM on port %d: %d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index 86cb58690845..ccc4670f92b5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -2104,3 +2104,11 @@ int mlx4_get_port_free_macs(struct mlx4_dev *mdev, int port)
 		mlx4_get_port_total_macs(mdev, port));
 }
 EXPORT_SYMBOL(mlx4_get_port_free_macs);
+
+bool mlx4_is_available_mac(struct mlx4_dev *mdev, int port)
+{
+	int free_macs = mlx4_get_port_free_macs(mdev, port);
+
+	return free_macs >= MLX4_VF_MAC_QUOTA;
+}
+EXPORT_SYMBOL(mlx4_is_available_mac);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 4220fe8fe094..1dcd6ac3b1f3 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1494,6 +1494,7 @@ struct mlx4_slaves_pport mlx4_phys_to_slaves_pport_actv(
 
 int mlx4_config_vxlan_port(struct mlx4_dev *dev, __be16 udp_port);
 int mlx4_get_port_free_macs(struct mlx4_dev *mdev, int port);
+bool mlx4_is_available_mac(struct mlx4_dev *mdev, int port);
 int mlx4_disable_rx_port_check(struct mlx4_dev *dev, bool dis);
 int mlx4_config_roce_v2_port(struct mlx4_dev *dev, u16 udp_port);
 int mlx4_virt2phy_port_map(struct mlx4_dev *dev, u32 port1, u32 port2);
-- 
1.8.3.1


* [PATCH net-next 09/10] net/mlx4: Change number of max MSIXs from 64 to 1024
From: Tariq Toukan @ 2016-11-27 15:51 UTC
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Matan Barak, Moshe Shemesh, Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

From: Matan Barak <matanb@mellanox.com>

Increase the maximum number of MSI-X vectors in order to achieve
better performance on machines with a high number of CPUs.

Fixes: 0b7ca5a928e2 ("mlx4: Changing interrupt scheme")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 include/linux/mlx4/device.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 1dcd6ac3b1f3..44863e3c0981 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -47,7 +47,7 @@
 #define DEFAULT_UAR_PAGE_SHIFT  12

 #define MAX_MSIX_P_PORT		17
-#define MAX_MSIX		64
+#define MAX_MSIX		1024
 #define MIN_MSIX_P_PORT		5
 #define MLX4_IS_LEGACY_EQ_MODE(dev_cap) ((dev_cap).num_comp_vectors < \
 					 (dev_cap).num_ports * MIN_MSIX_P_PORT)
-- 
1.8.3.1


* [PATCH net-next 08/10] net/mlx4_core: Get num_tc using netdev_get_num_tc
From: Tariq Toukan @ 2016-11-27 15:51 UTC
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Alaa Hleihel, Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

From: Alaa Hleihel <alaa@mellanox.com>

Avoid reading num_tc directly from struct net_device; use
the helper function netdev_get_num_tc() instead.

Fixes: bc6a4744b827 ("net/mlx4_en: num cores tx rings for every UP")
Fixes: f5b6345ba8da ("net/mlx4_en: User prio mapping gets corrupted when changing number of channels")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 2 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 7e6d0425394d..9848302ace84 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1791,7 +1791,7 @@ static int mlx4_en_set_channels(struct net_device *dev,
 	netif_set_real_num_tx_queues(dev, priv->tx_ring_num[TX]);
 	netif_set_real_num_rx_queues(dev, priv->rx_ring_num);
 
-	if (dev->num_tc)
+	if (netdev_get_num_tc(dev))
 		mlx4_en_setup_tc(dev, MLX4_EN_NUM_UP);
 
 	en_warn(priv, "Using %d TX rings\n", priv->tx_ring_num[TX]);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 4b597dca5c52..abe1c5994b73 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -710,7 +710,7 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb,
 	u16 rings_p_up = priv->num_tx_rings_p_up;
 	u8 up = 0;
 
-	if (dev->num_tc)
+	if (netdev_get_num_tc(dev))
 		return skb_tx_hash(dev, skb);
 
 	if (skb_vlan_tag_present(skb))
-- 
1.8.3.1


* [PATCH net-next 06/10] net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap
From: Tariq Toukan @ 2016-11-27 15:51 UTC
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tal Alon, Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

From: Eran Ben Elisha <eranbe@mellanox.com>

In order to avoid temporary large structs on the stack,
allocate them dynamically.
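
The general pattern is sketched below in userspace C, with calloc()/free()
standing in for kzalloc()/kfree(). The demo_* structs and the demo_query()
helper are hypothetical; only the shape (allocate up front, single exit label
that frees everything) reflects the change.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical large structs that should not live on the stack. */
struct demo_hca_param {
	unsigned char raw[4096];
	unsigned int global_caps;
};

struct demo_dev_cap {
	unsigned char raw[4096];
};

/* Hypothetical query: fills in the caps value it is given. */
static int demo_query(struct demo_hca_param *param, unsigned int caps)
{
	assert(param != NULL);
	param->global_caps = caps;
	return 0;
}

static int demo_slave_cap(unsigned int caps_reported)
{
	struct demo_hca_param *hca_param;
	struct demo_dev_cap *dev_cap;
	int err;

	/* Allocate everything first; check all results at once. */
	hca_param = calloc(1, sizeof(*hca_param));
	dev_cap = calloc(1, sizeof(*dev_cap));
	if (!hca_param || !dev_cap) {
		err = -ENOMEM;
		goto free_mem;
	}

	err = demo_query(hca_param, caps_reported);
	if (err)
		goto free_mem;

	/* Unknown capability bits are rejected, as in the patch. */
	if (hca_param->global_caps) {
		err = -EINVAL;
		goto free_mem;
	}

free_mem:
	/* free(NULL) is a no-op, so partial allocation is handled too. */
	free(hca_param);
	free(dev_cap);
	return err;
}
```

The cost of this style is that every early "return err" has to become
"err = ...; goto free_mem;", which is what most of the diff below consists of.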

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tal Alon <talal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 244 +++++++++++++++++-------------
 1 file changed, 142 insertions(+), 102 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 4a9497e9778d..65502df9fd96 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -799,40 +799,117 @@ static void slave_adjust_steering_mode(struct mlx4_dev *dev,
 		 mlx4_steering_mode_str(dev->caps.steering_mode));
 }
 
+static void mlx4_slave_destroy_special_qp_cap(struct mlx4_dev *dev)
+{
+	kfree(dev->caps.qp0_qkey);
+	kfree(dev->caps.qp0_tunnel);
+	kfree(dev->caps.qp0_proxy);
+	kfree(dev->caps.qp1_tunnel);
+	kfree(dev->caps.qp1_proxy);
+	dev->caps.qp0_qkey = NULL;
+	dev->caps.qp0_tunnel = NULL;
+	dev->caps.qp0_proxy = NULL;
+	dev->caps.qp1_tunnel = NULL;
+	dev->caps.qp1_proxy = NULL;
+}
+
+static int mlx4_slave_special_qp_cap(struct mlx4_dev *dev)
+{
+	struct mlx4_func_cap *func_cap = NULL;
+	int i, err = 0;
+
+	func_cap = kzalloc(sizeof(*func_cap), GFP_KERNEL);
+	dev->caps.qp0_qkey = kcalloc(dev->caps.num_ports,
+				     sizeof(u32), GFP_KERNEL);
+	dev->caps.qp0_tunnel = kcalloc(dev->caps.num_ports,
+				       sizeof(u32), GFP_KERNEL);
+	dev->caps.qp0_proxy = kcalloc(dev->caps.num_ports,
+				      sizeof(u32), GFP_KERNEL);
+	dev->caps.qp1_tunnel = kcalloc(dev->caps.num_ports,
+				       sizeof(u32), GFP_KERNEL);
+	dev->caps.qp1_proxy = kcalloc(dev->caps.num_ports,
+				      sizeof(u32), GFP_KERNEL);
+
+	if (!dev->caps.qp0_tunnel || !dev->caps.qp0_proxy ||
+	    !dev->caps.qp1_tunnel || !dev->caps.qp1_proxy ||
+	    !dev->caps.qp0_qkey || !func_cap) {
+		mlx4_err(dev, "Failed to allocate memory for special qps cap\n");
+		err = -ENOMEM;
+		goto err_mem;
+	}
+
+	for (i = 1; i <= dev->caps.num_ports; ++i) {
+		err = mlx4_QUERY_FUNC_CAP(dev, i, func_cap);
+		if (err) {
+			mlx4_err(dev, "QUERY_FUNC_CAP port command failed for port %d, aborting (%d)\n",
+				 i, err);
+			goto err_mem;
+		}
+		dev->caps.qp0_qkey[i - 1] = func_cap->qp0_qkey;
+		dev->caps.qp0_tunnel[i - 1] = func_cap->qp0_tunnel_qpn;
+		dev->caps.qp0_proxy[i - 1] = func_cap->qp0_proxy_qpn;
+		dev->caps.qp1_tunnel[i - 1] = func_cap->qp1_tunnel_qpn;
+		dev->caps.qp1_proxy[i - 1] = func_cap->qp1_proxy_qpn;
+		dev->caps.port_mask[i] = dev->caps.port_type[i];
+		dev->caps.phys_port_id[i] = func_cap->phys_port_id;
+		err = mlx4_get_slave_pkey_gid_tbl_len(dev, i,
+				    &dev->caps.gid_table_len[i],
+				    &dev->caps.pkey_table_len[i]);
+		if (err) {
+			mlx4_err(dev, "QUERY_PORT command failed for port %d, aborting (%d)\n",
+				 i, err);
+			goto err_mem;
+		}
+	}
+
+err_mem:
+	if (err)
+		mlx4_slave_destroy_special_qp_cap(dev);
+	kfree(func_cap);
+	return err;
+}
+
 static int mlx4_slave_cap(struct mlx4_dev *dev)
 {
 	int			   err;
 	u32			   page_size;
-	struct mlx4_dev_cap	   dev_cap;
-	struct mlx4_func_cap	   func_cap;
-	struct mlx4_init_hca_param hca_param;
-	u8			   i;
+	struct mlx4_dev_cap	   *dev_cap = NULL;
+	struct mlx4_func_cap	   *func_cap = NULL;
+	struct mlx4_init_hca_param *hca_param = NULL;
+
+	hca_param = kzalloc(sizeof(*hca_param), GFP_KERNEL);
+	func_cap = kzalloc(sizeof(*func_cap), GFP_KERNEL);
+	dev_cap = kzalloc(sizeof(*dev_cap), GFP_KERNEL);
+	if (!hca_param || !func_cap || !dev_cap) {
+		mlx4_err(dev, "Failed to allocate memory for slave_cap\n");
+		err = -ENOMEM;
+		goto free_mem;
+	}
 
-	memset(&hca_param, 0, sizeof(hca_param));
-	err = mlx4_QUERY_HCA(dev, &hca_param);
+	err = mlx4_QUERY_HCA(dev, hca_param);
 	if (err) {
 		mlx4_err(dev, "QUERY_HCA command failed, aborting\n");
-		return err;
+		goto free_mem;
 	}
 
 	/* fail if the hca has an unknown global capability
 	 * at this time global_caps should be always zeroed
 	 */
-	if (hca_param.global_caps) {
+	if (hca_param->global_caps) {
 		mlx4_err(dev, "Unknown hca global capabilities\n");
-		return -EINVAL;
+		err = -EINVAL;
+		goto free_mem;
 	}
 
-	mlx4_log_num_mgm_entry_size = hca_param.log_mc_entry_sz;
+	mlx4_log_num_mgm_entry_size = hca_param->log_mc_entry_sz;
 
-	dev->caps.hca_core_clock = hca_param.hca_core_clock;
+	dev->caps.hca_core_clock = hca_param->hca_core_clock;
 
-	memset(&dev_cap, 0, sizeof(dev_cap));
-	dev->caps.max_qp_dest_rdma = 1 << hca_param.log_rd_per_qp;
-	err = mlx4_dev_cap(dev, &dev_cap);
+	dev->caps.max_qp_dest_rdma = 1 << hca_param->log_rd_per_qp;
+	err = mlx4_dev_cap(dev, dev_cap);
 	if (err) {
 		mlx4_err(dev, "QUERY_DEV_CAP command failed, aborting\n");
-		return err;
+		goto free_mem;
 	}
 
 	err = mlx4_QUERY_FW(dev);
@@ -844,21 +921,23 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 	if (page_size > PAGE_SIZE) {
 		mlx4_err(dev, "HCA minimum page size of %d bigger than kernel PAGE_SIZE of %ld, aborting\n",
 			 page_size, PAGE_SIZE);
-		return -ENODEV;
+		err = -ENODEV;
+		goto free_mem;
 	}
 
 	/* Set uar_page_shift for VF */
-	dev->uar_page_shift = hca_param.uar_page_sz + 12;
+	dev->uar_page_shift = hca_param->uar_page_sz + 12;
 
 	/* Make sure the master uar page size is valid */
 	if (dev->uar_page_shift > PAGE_SHIFT) {
 		mlx4_err(dev,
 			 "Invalid configuration: uar page size is larger than system page size\n");
-		return  -ENODEV;
+		err = -ENODEV;
+		goto free_mem;
 	}
 
 	/* Set reserved_uars based on the uar_page_shift */
-	mlx4_set_num_reserved_uars(dev, &dev_cap);
+	mlx4_set_num_reserved_uars(dev, dev_cap);
 
 	/* Although uar page size in FW differs from system page size,
 	 * upper software layers (mlx4_ib, mlx4_en and part of mlx4_core)
@@ -866,34 +945,35 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 	 */
 	dev->caps.uar_page_size = PAGE_SIZE;
 
-	memset(&func_cap, 0, sizeof(func_cap));
-	err = mlx4_QUERY_FUNC_CAP(dev, 0, &func_cap);
+	err = mlx4_QUERY_FUNC_CAP(dev, 0, func_cap);
 	if (err) {
 		mlx4_err(dev, "QUERY_FUNC_CAP general command failed, aborting (%d)\n",
 			 err);
-		return err;
+		goto free_mem;
 	}
 
-	if ((func_cap.pf_context_behaviour | PF_CONTEXT_BEHAVIOUR_MASK) !=
+	if ((func_cap->pf_context_behaviour | PF_CONTEXT_BEHAVIOUR_MASK) !=
 	    PF_CONTEXT_BEHAVIOUR_MASK) {
 		mlx4_err(dev, "Unknown pf context behaviour %x known flags %x\n",
-			 func_cap.pf_context_behaviour, PF_CONTEXT_BEHAVIOUR_MASK);
-		return -EINVAL;
-	}
-
-	dev->caps.num_ports		= func_cap.num_ports;
-	dev->quotas.qp			= func_cap.qp_quota;
-	dev->quotas.srq			= func_cap.srq_quota;
-	dev->quotas.cq			= func_cap.cq_quota;
-	dev->quotas.mpt			= func_cap.mpt_quota;
-	dev->quotas.mtt			= func_cap.mtt_quota;
-	dev->caps.num_qps		= 1 << hca_param.log_num_qps;
-	dev->caps.num_srqs		= 1 << hca_param.log_num_srqs;
-	dev->caps.num_cqs		= 1 << hca_param.log_num_cqs;
-	dev->caps.num_mpts		= 1 << hca_param.log_mpt_sz;
-	dev->caps.num_eqs		= func_cap.max_eq;
-	dev->caps.reserved_eqs		= func_cap.reserved_eq;
-	dev->caps.reserved_lkey		= func_cap.reserved_lkey;
+			 func_cap->pf_context_behaviour,
+			 PF_CONTEXT_BEHAVIOUR_MASK);
+		err = -EINVAL;
+		goto free_mem;
+	}
+
+	dev->caps.num_ports		= func_cap->num_ports;
+	dev->quotas.qp			= func_cap->qp_quota;
+	dev->quotas.srq			= func_cap->srq_quota;
+	dev->quotas.cq			= func_cap->cq_quota;
+	dev->quotas.mpt			= func_cap->mpt_quota;
+	dev->quotas.mtt			= func_cap->mtt_quota;
+	dev->caps.num_qps		= 1 << hca_param->log_num_qps;
+	dev->caps.num_srqs		= 1 << hca_param->log_num_srqs;
+	dev->caps.num_cqs		= 1 << hca_param->log_num_cqs;
+	dev->caps.num_mpts		= 1 << hca_param->log_mpt_sz;
+	dev->caps.num_eqs		= func_cap->max_eq;
+	dev->caps.reserved_eqs		= func_cap->reserved_eq;
+	dev->caps.reserved_lkey		= func_cap->reserved_lkey;
 	dev->caps.num_pds               = MLX4_NUM_PDS;
 	dev->caps.num_mgms              = 0;
 	dev->caps.num_amgms             = 0;
@@ -906,38 +986,10 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 
 	mlx4_replace_zero_macs(dev);
 
-	dev->caps.qp0_qkey = kcalloc(dev->caps.num_ports, sizeof(u32), GFP_KERNEL);
-	dev->caps.qp0_tunnel = kcalloc(dev->caps.num_ports, sizeof (u32), GFP_KERNEL);
-	dev->caps.qp0_proxy = kcalloc(dev->caps.num_ports, sizeof (u32), GFP_KERNEL);
-	dev->caps.qp1_tunnel = kcalloc(dev->caps.num_ports, sizeof (u32), GFP_KERNEL);
-	dev->caps.qp1_proxy = kcalloc(dev->caps.num_ports, sizeof (u32), GFP_KERNEL);
-
-	if (!dev->caps.qp0_tunnel || !dev->caps.qp0_proxy ||
-	    !dev->caps.qp1_tunnel || !dev->caps.qp1_proxy ||
-	    !dev->caps.qp0_qkey) {
-		err = -ENOMEM;
-		goto err_mem;
-	}
-
-	for (i = 1; i <= dev->caps.num_ports; ++i) {
-		err = mlx4_QUERY_FUNC_CAP(dev, i, &func_cap);
-		if (err) {
-			mlx4_err(dev, "QUERY_FUNC_CAP port command failed for port %d, aborting (%d)\n",
-				 i, err);
-			goto err_mem;
-		}
-		dev->caps.qp0_qkey[i - 1] = func_cap.qp0_qkey;
-		dev->caps.qp0_tunnel[i - 1] = func_cap.qp0_tunnel_qpn;
-		dev->caps.qp0_proxy[i - 1] = func_cap.qp0_proxy_qpn;
-		dev->caps.qp1_tunnel[i - 1] = func_cap.qp1_tunnel_qpn;
-		dev->caps.qp1_proxy[i - 1] = func_cap.qp1_proxy_qpn;
-		dev->caps.port_mask[i] = dev->caps.port_type[i];
-		dev->caps.phys_port_id[i] = func_cap.phys_port_id;
-		err = mlx4_get_slave_pkey_gid_tbl_len(dev, i,
-						      &dev->caps.gid_table_len[i],
-						      &dev->caps.pkey_table_len[i]);
-		if (err)
-			goto err_mem;
+	err = mlx4_slave_special_qp_cap(dev);
+	if (err) {
+		mlx4_err(dev, "Set special QP caps failed. aborting\n");
+		goto free_mem;
 	}
 
 	if (dev->caps.uar_page_size * (dev->caps.num_uars -
@@ -952,7 +1004,7 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 		goto err_mem;
 	}
 
-	if (hca_param.dev_cap_enabled & MLX4_DEV_CAP_64B_EQE_ENABLED) {
+	if (hca_param->dev_cap_enabled & MLX4_DEV_CAP_64B_EQE_ENABLED) {
 		dev->caps.eqe_size   = 64;
 		dev->caps.eqe_factor = 1;
 	} else {
@@ -960,20 +1012,20 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 		dev->caps.eqe_factor = 0;
 	}
 
-	if (hca_param.dev_cap_enabled & MLX4_DEV_CAP_64B_CQE_ENABLED) {
+	if (hca_param->dev_cap_enabled & MLX4_DEV_CAP_64B_CQE_ENABLED) {
 		dev->caps.cqe_size   = 64;
 		dev->caps.userspace_caps |= MLX4_USER_DEV_CAP_LARGE_CQE;
 	} else {
 		dev->caps.cqe_size   = 32;
 	}
 
-	if (hca_param.dev_cap_enabled & MLX4_DEV_CAP_EQE_STRIDE_ENABLED) {
-		dev->caps.eqe_size = hca_param.eqe_size;
+	if (hca_param->dev_cap_enabled & MLX4_DEV_CAP_EQE_STRIDE_ENABLED) {
+		dev->caps.eqe_size = hca_param->eqe_size;
 		dev->caps.eqe_factor = 0;
 	}
 
-	if (hca_param.dev_cap_enabled & MLX4_DEV_CAP_CQE_STRIDE_ENABLED) {
-		dev->caps.cqe_size = hca_param.cqe_size;
+	if (hca_param->dev_cap_enabled & MLX4_DEV_CAP_CQE_STRIDE_ENABLED) {
+		dev->caps.cqe_size = hca_param->cqe_size;
 		/* User still need to know when CQE > 32B */
 		dev->caps.userspace_caps |= MLX4_USER_DEV_CAP_LARGE_CQE;
 	}
@@ -981,31 +1033,24 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 	dev->caps.flags2 &= ~MLX4_DEV_CAP_FLAG2_TS;
 	mlx4_warn(dev, "Timestamping is not supported in slave mode\n");
 
-	slave_adjust_steering_mode(dev, &dev_cap, &hca_param);
+	slave_adjust_steering_mode(dev, dev_cap, hca_param);
 	mlx4_dbg(dev, "RSS support for IP fragments is %s\n",
-		 hca_param.rss_ip_frags ? "on" : "off");
+		 hca_param->rss_ip_frags ? "on" : "off");
 
-	if (func_cap.extra_flags & MLX4_QUERY_FUNC_FLAGS_BF_RES_QP &&
+	if (func_cap->extra_flags & MLX4_QUERY_FUNC_FLAGS_BF_RES_QP &&
 	    dev->caps.bf_reg_size)
 		dev->caps.alloc_res_qp_mask |= MLX4_RESERVE_ETH_BF_QP;
 
-	if (func_cap.extra_flags & MLX4_QUERY_FUNC_FLAGS_A0_RES_QP)
+	if (func_cap->extra_flags & MLX4_QUERY_FUNC_FLAGS_A0_RES_QP)
 		dev->caps.alloc_res_qp_mask |= MLX4_RESERVE_A0_QP;
 
-	return 0;
-
 err_mem:
-	kfree(dev->caps.qp0_qkey);
-	kfree(dev->caps.qp0_tunnel);
-	kfree(dev->caps.qp0_proxy);
-	kfree(dev->caps.qp1_tunnel);
-	kfree(dev->caps.qp1_proxy);
-	dev->caps.qp0_qkey = NULL;
-	dev->caps.qp0_tunnel = NULL;
-	dev->caps.qp0_proxy = NULL;
-	dev->caps.qp1_tunnel = NULL;
-	dev->caps.qp1_proxy = NULL;
-
+	if (err)
+		mlx4_slave_destroy_special_qp_cap(dev);
+free_mem:
+	kfree(hca_param);
+	kfree(func_cap);
+	kfree(dev_cap);
 	return err;
 }
 
@@ -2381,13 +2426,8 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 	unmap_internal_clock(dev);
 	unmap_bf_area(dev);
 
-	if (mlx4_is_slave(dev)) {
-		kfree(dev->caps.qp0_qkey);
-		kfree(dev->caps.qp0_tunnel);
-		kfree(dev->caps.qp0_proxy);
-		kfree(dev->caps.qp1_tunnel);
-		kfree(dev->caps.qp1_proxy);
-	}
+	if (mlx4_is_slave(dev))
+		mlx4_slave_destroy_special_qp_cap(dev);
 
 err_close:
 	if (mlx4_is_slave(dev))
-- 
1.8.3.1


* [PATCH net-next 07/10] net/mlx4_core: Add resource alloc/dealloc debugging
From: Tariq Toukan @ 2016-11-27 15:51 UTC
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Matan Barak, Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

From: Matan Barak <matanb@mellanox.com>

In order to aid debugging of functions that take a resource but
don't put it, record the name of the last function that successfully
grabbed the resource.
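
The mechanism is a wrapper macro that passes __func__ to the real function, so
the caller's name is captured without touching any call site. A minimal
userspace sketch follows, with hypothetical demo_* names and fprintf() in
place of mlx4_warn():

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

enum demo_state { DEMO_RES_FREE, DEMO_RES_BUSY };

struct demo_res {
	enum demo_state state;
	const char *func_name;	/* last function that took the resource */
};

/* Real implementation: receives the caller's name explicitly. */
static int demo_get_res_named(struct demo_res *r, const char *func_name)
{
	assert(r != NULL);
	if (r->state == DEMO_RES_BUSY) {
		fprintf(stderr,
			"%s trying to get resource, but it's already taken by %s\n",
			func_name, r->func_name);
		return -EBUSY;
	}
	r->state = DEMO_RES_BUSY;
	r->func_name = func_name;
	return 0;
}

/* Call sites keep the short form; __func__ expands in the caller. */
#define demo_get_res(r) demo_get_res_named((r), __func__)

static void demo_put_res(struct demo_res *r)
{
	r->state = DEMO_RES_FREE;
	r->func_name = "";
}

/* Takes the resource and forgets to put it back. */
static int demo_leaky_user(struct demo_res *r)
{
	return demo_get_res(r);
}

/* Well-behaved user: fails with -EBUSY while the leak is outstanding. */
static int demo_second_user(struct demo_res *r)
{
	int err = demo_get_res(r);

	if (!err)
		demo_put_res(r);
	return err;
}
```

Calling demo_leaky_user() followed by demo_second_user() prints a message
naming demo_leaky_user as the holder, which is the information the patch adds
to the -EBUSY path.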

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  | 49 ++++++++++++++++++++--
 1 file changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 05ba270c4862..f59efe59ce58 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -77,6 +77,7 @@ struct res_common {
 	int			from_state;
 	int			to_state;
 	int			removing;
+	const char		*func_name;
 };
 
 enum {
@@ -837,6 +838,36 @@ static int mpt_mask(struct mlx4_dev *dev)
 	return dev->caps.num_mpts - 1;
 }
 
+static const char *mlx4_resource_type_to_str(enum mlx4_resource t)
+{
+	switch (t) {
+	case RES_QP:
+		return "QP";
+	case RES_CQ:
+		return "CQ";
+	case RES_SRQ:
+		return "SRQ";
+	case RES_XRCD:
+		return "XRCD";
+	case RES_MPT:
+		return "MPT";
+	case RES_MTT:
+		return "MTT";
+	case RES_MAC:
+		return "MAC";
+	case RES_VLAN:
+		return "VLAN";
+	case RES_COUNTER:
+		return "COUNTER";
+	case RES_FS_RULE:
+		return "FS_RULE";
+	case RES_EQ:
+		return "EQ";
+	default:
+		return "INVALID RESOURCE";
+	}
+}
+
 static void *find_res(struct mlx4_dev *dev, u64 res_id,
 		      enum mlx4_resource type)
 {
@@ -846,9 +877,9 @@ static void *find_res(struct mlx4_dev *dev, u64 res_id,
 				  res_id);
 }
 
-static int get_res(struct mlx4_dev *dev, int slave, u64 res_id,
-		   enum mlx4_resource type,
-		   void *res)
+static int _get_res(struct mlx4_dev *dev, int slave, u64 res_id,
+		    enum mlx4_resource type,
+		    void *res, const char *func_name)
 {
 	struct res_common *r;
 	int err = 0;
@@ -861,6 +892,10 @@ static int get_res(struct mlx4_dev *dev, int slave, u64 res_id,
 	}
 
 	if (r->state == RES_ANY_BUSY) {
+		mlx4_warn(dev,
+			  "%s(%d) trying to get resource %llx of type %s, but it's already taken by %s\n",
+			  func_name, slave, res_id, mlx4_resource_type_to_str(type),
+			  r->func_name);
 		err = -EBUSY;
 		goto exit;
 	}
@@ -872,6 +907,7 @@ static int get_res(struct mlx4_dev *dev, int slave, u64 res_id,
 
 	r->from_state = r->state;
 	r->state = RES_ANY_BUSY;
+	r->func_name = func_name;
 
 	if (res)
 		*((struct res_common **)res) = r;
@@ -881,6 +917,9 @@ static int get_res(struct mlx4_dev *dev, int slave, u64 res_id,
 	return err;
 }
 
+#define get_res(dev, slave, res_id, type, res) \
+	_get_res((dev), (slave), (res_id), (type), (res), __func__)
+
 int mlx4_get_slave_from_resource_id(struct mlx4_dev *dev,
 				    enum mlx4_resource type,
 				    u64 res_id, int *slave)
@@ -911,8 +950,10 @@ static void put_res(struct mlx4_dev *dev, int slave, u64 res_id,
 
 	spin_lock_irq(mlx4_tlock(dev));
 	r = find_res(dev, res_id, type);
-	if (r)
+	if (r) {
 		r->state = r->from_state;
+		r->func_name = "";
+	}
 	spin_unlock_irq(mlx4_tlock(dev));
 }
 
-- 
1.8.3.1


* [PATCH net-next 01/10] net/mlx4_core: Make each VF manage its own mac table
From: Tariq Toukan @ 2016-11-27 15:51 UTC
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Erez Shitrit, Gal Pressman, Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

From: Erez Shitrit <erezsh@mellanox.com>

Each VF can claim up to the maximum number of MACs in the
MAC table (128) on a first-come, first-served basis.
The VF learns the total number of free MACs from
the PF.
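
The QUERY_PORT wrapper change below reports the free count to the VF as a
log2 value in the low nibble of a one-byte field, keeping the VLAN nibble
untouched. A userspace sketch of that packing, with hypothetical demo_* names
and a local floor-log2 helper standing in for the kernel's ilog2():

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)) for v > 0; stand-in for the kernel's ilog2(). */
static unsigned int demo_ilog2(unsigned int v)
{
	unsigned int log = 0;

	assert(v > 0);
	while (v >>= 1)
		log++;
	return log;
}

/*
 * Keep the high nibble (VLAN table size, log2) as reported and
 * replace the low nibble with log2 of the free MAC count.
 */
static uint8_t demo_set_mac_quota(uint8_t field, unsigned int free_macs)
{
	uint8_t vlan = field >> 4;

	field = (uint8_t)(vlan << 4);
	field |= (uint8_t)(demo_ilog2(free_macs) & 0x0f);
	return field;
}
```

Because the value is a floored log2, the VF sees the largest power of two not
exceeding the free count: 100 free entries are advertised as 64. The helper is
undefined for a free count of zero, so a caller in this model would need to
handle that case separately.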

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c            | 13 ++++
 drivers/net/ethernet/mellanox/mlx4/main.c          |  4 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |  1 +
 drivers/net/ethernet/mellanox/mlx4/port.c          | 90 +++++++++++++++++++++-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  6 +-
 include/linux/mlx4/device.h                        |  1 +
 6 files changed, 110 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 84bab9f0732e..b03b473a7b07 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1415,6 +1415,9 @@ int mlx4_QUERY_PORT_wrapper(struct mlx4_dev *dev, int slave,
 			   MLX4_CMD_NATIVE);
 
 	if (!err && dev->caps.function != slave) {
+		u8 field;
+		u8 vlan;
+
 		def_mac = priv->mfunc.master.vf_oper[slave].vport[vhcr->in_modifier].state.mac;
 		MLX4_PUT(outbox->buf, def_mac, QUERY_PORT_MAC_OFFSET);
 
@@ -1455,6 +1458,16 @@ int mlx4_QUERY_PORT_wrapper(struct mlx4_dev *dev, int slave,
 		short_field = dev->caps.pkey_table_len[vhcr->in_modifier];
 		MLX4_PUT(outbox->buf, short_field,
 			 QUERY_PORT_CUR_MAX_PKEY_OFFSET);
+
+		/* Change the mac table size for the VF */
+		MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET);
+		/* keep the origin vlan of the VF */
+		vlan = field >> 4;
+		/* set the field with the prev vlan and the mac defined quota */
+		field = vlan << 4;
+		field |= ilog2(mlx4_get_port_free_macs(dev,
+						       priv->port->port + 1));
+		MLX4_PUT(outbox->buf, field, QUERY_PORT_MAX_MACVLAN_OFFSET);
 	}
 out:
 	return err;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 6f4e67bc3538..7cd1fb566f5a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2937,12 +2937,14 @@ static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
 	info->dev = dev;
 	info->port = port;
 	if (!mlx4_is_slave(dev)) {
-		mlx4_init_mac_table(dev, &info->mac_table);
 		mlx4_init_vlan_table(dev, &info->vlan_table);
 		mlx4_init_roce_gid_table(dev, &info->gid_table);
 		info->base_qpn = mlx4_get_base_qpn(dev, port);
 	}
 
+	/* let the vf manage its own mac table state */
+	mlx4_init_mac_table(dev, &info->mac_table);
+
 	sprintf(info->dev_name, "mlx4_port%d", port);
 	info->port_attr.attr.name = info->dev_name;
 	if (mlx4_is_mfunc(dev))
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 88ee7d8a5923..d953d6eb7d9e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -741,6 +741,7 @@ struct mlx4_catas_err {
 
 #define MLX4_MAX_MAC_NUM	128
 #define MLX4_MAC_TABLE_SIZE	(MLX4_MAX_MAC_NUM << 3)
+#define MLX4_VF_MAC_QUOTA	2
 
 struct mlx4_mac_table {
 	__be64			entries[MLX4_MAX_MAC_NUM];
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index b656dd5772e5..86cb58690845 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -39,6 +39,7 @@
 
 #include "mlx4.h"
 #include "mlx4_stats.h"
+#include "fw.h"
 
 #define MLX4_MAC_VALID		(1ull << 63)
 
@@ -54,6 +55,34 @@
 #define MLX4_IGNORE_FCS_MASK			0x1
 #define MLX4_TC_MAX_NUMBER			8
 
+static void mlx4_inc_port_macs(struct mlx4_dev *mdev, int port)
+{
+	struct mlx4_port_info *info = &mlx4_priv(mdev)->port[port];
+
+	mutex_lock(&info->mac_table.mutex);
+	info->mac_table.total++;
+	mutex_unlock(&info->mac_table.mutex);
+	mlx4_info(mdev, "%s added mac for port: %d, now: %d\n",
+		  __func__, port, info->mac_table.total);
+}
+
+static void mlx4_dec_port_macs(struct mlx4_dev *mdev, int port)
+{
+	struct mlx4_port_info *info = &mlx4_priv(mdev)->port[port];
+
+	if (!info->mac_table.total) {
+		mlx4_warn(mdev, "No current macs for port: %d\n", port);
+		return;
+	}
+
+	mutex_lock(&info->mac_table.mutex);
+	info->mac_table.total--;
+	mutex_unlock(&info->mac_table.mutex);
+
+	mlx4_info(mdev, "%s removed mac, port: %d, now: %d\n",
+		  __func__, port, info->mac_table.total);
+}
+
 void mlx4_init_mac_table(struct mlx4_dev *dev, struct mlx4_mac_table *table)
 {
 	int i;
@@ -340,6 +369,8 @@ int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac)
 	int err = -EINVAL;
 
 	if (mlx4_is_mfunc(dev)) {
+		u32 p_l;
+
 		if (!(dev->flags & MLX4_FLAG_OLD_REG_MAC)) {
 			err = mlx4_cmd_imm(dev, mac, &out_param,
 					   ((u32) port) << 8 | (u32) RES_MAC,
@@ -358,8 +389,13 @@ int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac)
 		if (err)
 			return err;
 
-		return get_param_l(&out_param);
+		p_l = get_param_l(&out_param);
+		/* update vf table, the master updated via __register_mac */
+		if (p_l && mlx4_is_slave(dev))
+			mlx4_inc_port_macs(dev, port);
+		return p_l;
 	}
+
 	return __mlx4_register_mac(dev, port, mac);
 }
 EXPORT_SYMBOL_GPL(mlx4_register_mac);
@@ -459,6 +495,11 @@ void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, u64 mac)
 					    RES_OP_RESERVE_AND_MAP, MLX4_CMD_FREE_RES,
 					    MLX4_CMD_TIME_CLASS_A, MLX4_CMD_WRAPPED);
 		}
+
+		/* update vf mac table */
+		if (mlx4_is_slave(dev))
+			mlx4_dec_port_macs(dev, port);
+
 		return;
 	}
 	__mlx4_unregister_mac(dev, port, mac);
@@ -2016,3 +2057,50 @@ int mlx4_max_tc(struct mlx4_dev *dev)
 	return num_tc;
 }
 EXPORT_SYMBOL(mlx4_max_tc);
+
+static int mlx4_get_port_reserved_mac_num(struct mlx4_dev *mdev, int port)
+{
+	struct mlx4_priv *priv = mlx4_priv(mdev);
+	struct resource_allocator *res_alloc;
+	int reserved;
+
+	if (mlx4_is_slave(mdev))
+		return 0;
+
+	res_alloc = &priv->mfunc.master.res_tracker.res_alloc[RES_MAC];
+
+	reserved = (port > 0) ? res_alloc->res_port_rsvd[port - 1] :
+		res_alloc->res_reserved;
+
+	return reserved;
+}
+
+static int mlx4_get_port_max_macs(struct mlx4_dev *mdev, int port)
+{
+	struct mlx4_port_info *info = &mlx4_priv(mdev)->port[port];
+
+	/* The maximum value should consider the reserved macs for the vfs */
+	return info->mac_table.max - mlx4_get_port_reserved_mac_num(mdev, port);
+}
+
+static int mlx4_get_port_total_macs(struct mlx4_dev *mdev, int port)
+{
+	struct mlx4_port_info *info = &mlx4_priv(mdev)->port[port];
+
+	return info->mac_table.total;
+}
+
+int mlx4_get_port_free_macs(struct mlx4_dev *mdev, int port)
+{
+	/* slave will get the free macs (log2) from its master */
+	if (mlx4_is_slave(mdev)) {
+		struct mlx4_port_cap port_cap;
+
+		mlx4_QUERY_PORT(mdev, port, &port_cap);
+		return (1 << port_cap.log_max_macs);
+	}
+
+	return (mlx4_get_port_max_macs(mdev, port) -
+		mlx4_get_port_total_macs(mdev, port));
+}
+EXPORT_SYMBOL(mlx4_get_port_free_macs);
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index c548beaaf910..ba7b70630d5d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -576,14 +576,14 @@ int mlx4_init_resource_tracker(struct mlx4_dev *dev)
 					}
 					res_alloc->quota[t] =
 						MLX4_MAX_MAC_NUM -
-						2 * max_vfs_pport;
-					res_alloc->guaranteed[t] = 2;
+						MLX4_VF_MAC_QUOTA * max_vfs_pport;
+					res_alloc->guaranteed[t] = MLX4_VF_MAC_QUOTA;
 					for (j = 0; j < MLX4_MAX_PORTS; j++)
 						res_alloc->res_port_free[j] =
 							MLX4_MAX_MAC_NUM;
 				} else {
 					res_alloc->quota[t] = MLX4_MAX_MAC_NUM;
-					res_alloc->guaranteed[t] = 2;
+					res_alloc->guaranteed[t] = MLX4_VF_MAC_QUOTA;
 				}
 				break;
 			case RES_VLAN:
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 3be7abd6e722..4220fe8fe094 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1493,6 +1493,7 @@ struct mlx4_slaves_pport mlx4_phys_to_slaves_pport_actv(
 int mlx4_get_base_gid_ix(struct mlx4_dev *dev, int slave, int port);
 
 int mlx4_config_vxlan_port(struct mlx4_dev *dev, __be16 udp_port);
+int mlx4_get_port_free_macs(struct mlx4_dev *mdev, int port);
 int mlx4_disable_rx_port_check(struct mlx4_dev *dev, bool dis);
 int mlx4_config_roce_v2_port(struct mlx4_dev *dev, u16 udp_port);
 int mlx4_virt2phy_port_map(struct mlx4_dev *dev, u32 port1, u32 port2);
-- 
1.8.3.1

* [PATCH net-next 04/10] net/mlx4_core: Device revision support
From: Tariq Toukan @ 2016-11-27 15:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Yishai Hadas, Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

From: Yishai Hadas <yishaih@mellanox.com>

The device revision field returned by the NodeInfo MAD is incorrect
on ConnectX3 devices.

This patch is the driver-side handling that complements a FW fix added in
version 2.11.1172. The INIT_HCA bit at offset 0x0C.12 is set to 1 so that
FW reports the correct device revision.

Older FW versions are not affected by turning on that bit, so
no capability bit is needed.
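
A small worked sketch of the byte this patch writes. It assumes the byte being
written holds bits 15..8 of the big-endian dword at 0x0C, so that dword bit 12
lands on bit 4 of that byte; this mapping is inferred from the diff, not
stated in the commit message. DEVICE_REV_BIT, floor_log2() and
init_hca_cacheline_byte() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEVICE_REV_BIT	(1u << 4)	/* hypothetical name for the patch's (1 << 4) */

/* Hypothetical stand-in for the kernel's ilog2(): floor(log2(n)), n > 0. */
static unsigned int floor_log2(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Cache line size encoded as (log2(size) - 4) in bits 7..5, plus bit 4. */
static uint8_t init_hca_cacheline_byte(unsigned int cache_line_size)
{
	/* The "- 4" in the patch implies sizes below 16 bytes are not encodable. */
	assert(cache_line_size >= 16);
	return (uint8_t)(((floor_log2(cache_line_size) - 4) << 5) |
			 DEVICE_REV_BIT);
}
```

For a 64-byte cache line this gives (2 << 5) | 0x10 = 0x50; before the patch
the same byte would have been 0x40.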

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index c2ba16e23169..53031cabf138 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1888,7 +1888,7 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param)
 	*((u8 *) mailbox->buf + INIT_HCA_VERSION_OFFSET) = INIT_HCA_VERSION;

 	*((u8 *) mailbox->buf + INIT_HCA_CACHELINE_SZ_OFFSET) =
-		(ilog2(cache_line_size()) - 4) << 5;
+		((ilog2(cache_line_size()) - 4) << 5) | (1 << 4);

 #if defined(__LITTLE_ENDIAN)
 	*(inbox + INIT_HCA_FLAGS_OFFSET / 4) &= ~cpu_to_be32(1 << 1);
-- 
1.8.3.1

* [PATCH net-next 02/10] net/mlx4: Replace ENOSYS with better fitting error codes
From: Tariq Toukan @ 2016-11-27 15:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tariq Toukan
In-Reply-To: <1480261877-19720-1-git-send-email-tariqt@mellanox.com>

Conform to the following warning:
WARNING: ENOSYS means 'invalid syscall nr' and nothing else.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c       | 2 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c               | 2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c             | 6 +++---
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 487a58f9c192..7e6d0425394d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1983,7 +1983,7 @@ static int mlx4_en_get_module_info(struct net_device *dev,
 		modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
 		break;
 	default:
-		return -ENOSYS;
+		return -EINVAL;
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index b03b473a7b07..c2ba16e23169 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -672,7 +672,7 @@ int mlx4_QUERY_FUNC_CAP(struct mlx4_dev *dev, u8 gen_or_port,
 	MLX4_GET(field, outbox, QUERY_FUNC_CAP_PHYS_PORT_OFFSET);
 	func_cap->physical_port = field;
 	if (func_cap->physical_port != gen_or_port) {
-		err = -ENOSYS;
+		err = -EINVAL;
 		goto out;
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 7cd1fb566f5a..41cd05f37b20 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -820,7 +820,7 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 	 */
 	if (hca_param.global_caps) {
 		mlx4_err(dev, "Unknown hca global capabilities\n");
-		return -ENOSYS;
+		return -EINVAL;
 	}
 
 	mlx4_log_num_mgm_entry_size = hca_param.log_mc_entry_sz;
@@ -878,7 +878,7 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 	    PF_CONTEXT_BEHAVIOUR_MASK) {
 		mlx4_err(dev, "Unknown pf context behaviour %x known flags %x\n",
 			 func_cap.pf_context_behaviour, PF_CONTEXT_BEHAVIOUR_MASK);
-		return -ENOSYS;
+		return -EINVAL;
 	}
 
 	dev->caps.num_ports		= func_cap.num_ports;
@@ -3476,7 +3476,7 @@ static int mlx4_load_one(struct pci_dev *pdev, int pci_dev_data,
 	mlx4_enable_msi_x(dev);
 	if ((mlx4_is_mfunc(dev)) &&
 	    !(dev->flags & MLX4_FLAG_MSI_X)) {
-		err = -ENOSYS;
+		err = -ENOTSUPP;
 		mlx4_err(dev, "INTx is not supported in multi-function mode, aborting\n");
 		goto err_free_eq;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index ba7b70630d5d..05ba270c4862 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -1396,7 +1396,7 @@ static int remove_ok(struct res_common *res, enum mlx4_resource type, int extra)
 	case RES_MTT:
 		return remove_mtt_ok((struct res_mtt *)res, extra);
 	case RES_MAC:
-		return -ENOSYS;
+		return -ENOTSUPP;
 	case RES_EQ:
 		return remove_eq_ok((struct res_eq *)res);
 	case RES_COUNTER:
-- 
1.8.3.1

* Re: [PATCH net-next 09/11] qede: Better utilize the qede_[rt]x_queue
From: kbuild test robot @ 2016-11-27 15:51 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: kbuild-all, davem, netdev, Yuval Mintz
In-Reply-To: <1480258273-24973-10-git-send-email-Yuval.Mintz@cavium.com>

Hi Yuval,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Yuval-Mintz/qed-Add-XDP-support/20161127-225956
config: tile-allmodconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=tile 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/qlogic/qede/qede_main.c: In function 'qede_alloc_mem_rxq':
>> drivers/net/ethernet/qlogic/qede/qede_main.c:2960:3: warning: large integer implicitly truncated to unsigned type [-Woverflow]

vim +2960 drivers/net/ethernet/qlogic/qede/qede_main.c

55482edc Manish Chopra 2016-03-04  2944  err:
55482edc Manish Chopra 2016-03-04  2945  	qede_free_sge_mem(edev, rxq);
55482edc Manish Chopra 2016-03-04  2946  	edev->gro_disable = 1;
55482edc Manish Chopra 2016-03-04  2947  	return -ENOMEM;
55482edc Manish Chopra 2016-03-04  2948  }
55482edc Manish Chopra 2016-03-04  2949  
2950219d Yuval Mintz   2015-10-26  2950  /* This function allocates all memory needed per Rx queue */
1a635e48 Yuval Mintz   2016-08-15  2951  static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
2950219d Yuval Mintz   2015-10-26  2952  {
f86af2df Manish Chopra 2016-04-20  2953  	int i, rc, size;
2950219d Yuval Mintz   2015-10-26  2954  
2950219d Yuval Mintz   2015-10-26  2955  	rxq->num_rx_buffers = edev->q_num_rx_buffers;
2950219d Yuval Mintz   2015-10-26  2956  
1a635e48 Yuval Mintz   2016-08-15  2957  	rxq->rx_buf_size = NET_IP_ALIGN + ETH_OVERHEAD + edev->ndev->mtu;
1a635e48 Yuval Mintz   2016-08-15  2958  
fc48b7a6 Yuval Mintz   2016-02-15  2959  	if (rxq->rx_buf_size > PAGE_SIZE)
fc48b7a6 Yuval Mintz   2016-02-15 @2960  		rxq->rx_buf_size = PAGE_SIZE;
fc48b7a6 Yuval Mintz   2016-02-15  2961  
fc48b7a6 Yuval Mintz   2016-02-15  2962  	/* Segment size to spilt a page in multiple equal parts */
fc48b7a6 Yuval Mintz   2016-02-15  2963  	rxq->rx_buf_seg_size = roundup_pow_of_two(rxq->rx_buf_size);
2950219d Yuval Mintz   2015-10-26  2964  
2950219d Yuval Mintz   2015-10-26  2965  	/* Allocate the parallel driver ring for Rx buffers */
fc48b7a6 Yuval Mintz   2016-02-15  2966  	size = sizeof(*rxq->sw_rx_ring) * RX_RING_SIZE;
2950219d Yuval Mintz   2015-10-26  2967  	rxq->sw_rx_ring = kzalloc(size, GFP_KERNEL);
2950219d Yuval Mintz   2015-10-26  2968  	if (!rxq->sw_rx_ring) {

:::::: The code at line 2960 was first introduced by commit
:::::: fc48b7a6148af974b49db145812a8b060324a503 qed/qede: use 8.7.3.0 FW.

:::::: TO: Yuval Mintz <Yuval.Mintz@qlogic.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 46485 bytes --]

* [PATCH net] net/sched: act_pedit: limit negative offset
From: Amir Vadai @ 2016-11-27 15:58 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jamal Hadi Salim, Or Gerlitz, Hadar Har-Zion, Jiri Pirko,
	Amir Vadai

Do not allow setting a negative offset that goes below the skb head.
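
The kind of bounds check described here can be sketched in user space as
follows. This is not the code from the patch: struct buf_view is a
hypothetical, simplified stand-in for the head/data/end pointers of an skb,
and the check compares integer offsets instead of comparing a computed pointer
against the head.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the few sk_buff fields involved. */
struct buf_view {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the current packet data */
	unsigned char *end;	/* one past the end of the allocated buffer */
};

/* True if the len bytes at data + off lie entirely inside [head, end). */
static bool offset_in_bounds(const struct buf_view *b, long off, size_t len)
{
	long headroom = b->data - b->head;
	long room = b->end - b->data;

	if (off < -headroom)
		return false;	/* would point below head */
	return off <= room && (size_t)(room - off) >= len;
}
```

Checking the offset before forming the pointer avoids ever holding a pointer
outside the buffer, which C leaves undefined.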

Signed-off-by: Amir Vadai <amir@vadai.me>
---
Hi Dave,

Please pull to -stable branches.

Thanks,
Amir

 net/sched/act_pedit.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index b54d56d4959b..e79e8a88f2d2 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -154,8 +154,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 			}
 
 			ptr = skb_header_pointer(skb, off + offset, 4, &_data);
-			if (!ptr)
+			if ((unsigned char *)ptr < skb->head) {
+				pr_info("tc filter pedit offset out of bounds\n");
 				goto bad;
+			}
+
 			/* just do it, baby */
 			*ptr = ((*ptr & tkey->mask) ^ tkey->val);
 			if (ptr == &_data)
-- 
2.10.2

* Re: [PATCH net-next 4/5] net/socket: add helpers for recvmmsg
From: Paolo Abeni @ 2016-11-27 16:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Eric Dumazet, Jesper Dangaard Brouer,
	Hannes Frederic Sowa, Sabrina Dubroca
In-Reply-To: <1480113022.8455.580.camel@edumazet-glaptop3.roam.corp.google.com>

Hi Eric,

On Fri, 2016-11-25 at 14:30 -0800, Eric Dumazet wrote:
> On Fri, 2016-11-25 at 16:39 +0100, Paolo Abeni wrote:
> > _skb_try_recv_datagram_batch dequeues multiple skb's from the
> > socket's receive queue, and runs the bulk_destructor callback under
> > the receive queue lock.
> 
> ...
> 
> > +	last = (struct sk_buff *)queue;
> > +	first = (struct sk_buff *)queue->next;
> > +	skb_queue_walk(queue, skb) {
> > +		last = skb;
> > +		totalsize += skb->truesize;
> > +		if (++datagrams == batch)
> > +			break;
> > +	}
> 
> This is absolutely not good.
> 
> Walking through a list, bringing 2 cache lines per skb, is not the
> proper way to deal with bulking.
> 
> And I do not see where 'batch' value coming from user space is capped ?
> 
> Is it really vlen argument coming from recvmmsg() system call ???
> 
> This code runs with BH masked, so you do not want to give user a way to
> make you loop there 1000 times 
> 
> Bulking is nice, only if you do not compromise with system stability and
> latency requirements from other users/applications.

Thank you for reviewing this.

You are right, the cacheline miss while accessing skb->truesize has a
measurable performance impact, and the max burst length comes in from
recvmmsg().

We can easily cap the burst to some max value (e.g. 64 or less) and we
can pass the skb list and burst length to the bulk destructor without
accessing skb truesize beforehand. If the burst is short, say 8 skbs or
fewer, the bulk destructor walks the list again and releases the memory;
otherwise it defers the release until __skb_try_recv_datagram_batch()
completes: we walk the list without the lock held and then reacquire the
lock to release all the memory.
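
A minimal user-space sketch of the capping idea above, assuming a plain singly
linked list instead of an sk_buff queue. struct pkt, MAX_BATCH, INLINE_RELEASE,
dequeue_burst() and release_inline() are hypothetical names; the values 64 and
8 are the examples given above, not tested tunables.

```c
#include <stddef.h>

#define MAX_BATCH	64	/* example cap on the burst length */
#define INLINE_RELEASE	8	/* example threshold for a "short" burst */

struct pkt {
	struct pkt *next;
};

/*
 * Detach up to min(requested, MAX_BATCH) packets from the front of *queue.
 * The detached list is returned through *burst; the return value is its
 * length. Only the next pointers are touched, never the payload.
 */
static unsigned int dequeue_burst(struct pkt **queue, unsigned int requested,
				  struct pkt **burst)
{
	unsigned int cap = requested < MAX_BATCH ? requested : MAX_BATCH;
	unsigned int n = 0;
	struct pkt *last = NULL, *p = *queue;

	*burst = NULL;
	while (p && n < cap) {
		last = p;
		p = p->next;
		n++;
	}
	if (last) {
		*burst = *queue;
		*queue = last->next;
		last->next = NULL;
	}
	return n;
}

/* Short bursts are released right away, longer ones are deferred. */
static int release_inline(unsigned int burst_len)
{
	return burst_len <= INLINE_RELEASE;
}
```

With this shape a caller asking for 1000 datagrams never loops more than 64
times while the lock is held.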

Thank you again,

Paolo

* Re: Large performance regression with 6in4 tunnel (sit)
From: Eli Cooper @ 2016-11-27 16:22 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: netdev
In-Reply-To: <20161127130229.4c88cc2c@canb.auug.org.au>

Hi Stephen,


On 2016/11/27 10:02, Stephen Rothwell wrote:
> Hi Eli,
>
> On Sun, 27 Nov 2016 11:54:41 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>> On Fri, 25 Nov 2016 14:05:04 +0800 Eli Cooper <elicooper@gmx.com> wrote:
>>> I think this is similar to the bug I fixed in commit ae148b085876
>>> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
>>>
>>> I can reproduce a similar problem by applying xfrm to sit traffic.
>>> TSO/GSO packets are dropped when IPSec is enabled, and IPv6 throughput
>>> drops to 10s of Kbps. I am not sure if this is the same issue you
>>> experienced, but I wrote a patch that fixed at least the issue I had.
>>>
>>> Could you test the patch I sent to the mailing list just now?  
>> Thanks for the patch!
>>
>> It's a bit tricky to test since the problem only occurs in a production
>> machine (I tried reproducing in a VM, but the problem did not occur),
That's probably because the ethernet NIC in your VM does not support
segmentation offloading. You could, however, try reproducing it on
another (real) machine with the same driver.
>> but I will try to just rebuild the sit module and see if I can insert
>> the modified one.
> OK, I tried your patch and unfortunately, it doesn't seem to have
> worked ... I still get the large packets dropped and resent smaller.
>
It's a shame ... In my case, large packets are dropped only when xfrm is
in effect (therefore another output path is taken), and probably that's
not your case. Well, on the plus side, at least you reminded me that sit
device also needs to update skb's protocol.

Thanks,
Eli

* Re: Crash due to mutex genl_lock called from RCU context
From: Eric Dumazet @ 2016-11-27 16:23 UTC (permalink / raw)
  To: Cong Wang
  Cc: Subash Abhinov Kasiviswanathan, Thomas Graf,
	Linux Kernel Network Developers, Herbert Xu
In-Reply-To: <CAM_iQpUnz2kkjOFuk3fKKEYDh54b9WHk1dvH098mGtrGTPjZFQ@mail.gmail.com>

On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote:
> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Are you telling me inet_release() is called when we close() the first
> > file descriptor ?
> >
> > fd1 = socket()
> > fd2 = dup(fd1);
> > close(fd2) -> release() ???
> 
> Sorry, I didn't express myself clearly, I meant your change,
> if we exclude the SOCK_RCU_FREE part, basically reverts this commit:
> 
> commit 3f660d66dfbc13ea4b61d3865851b348444c24b4
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Thu May 3 03:17:14 2007 -0700
> 
>     [NETLINK]: Kill CB only when socket is unused
> 
> IOW, ->release() is called when the last sock fd ref is gone, but ->destructor()
> is called when the last sock ref is gone. They are very different.

Hmm...
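
For reference, the fd1/fd2 sequence quoted above can be tried from user space
with a sketch like the one below. It only shows that the socket stays usable
through fd1 after close(fd2), i.e. that closing a dup'ed descriptor does not
drop the last reference to the open file; it says nothing about netlink
internals. socket_survives_dup_close() is a hypothetical helper name, and an
AF_UNIX datagram socket is used only to keep the example self-contained.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if fd1 still refers to a live socket after close(dup(fd1)),
 * 0 if it does not, and -1 if the setup itself failed.
 */
static int socket_survives_dup_close(void)
{
	int fd1, fd2, type = 0, alive;
	socklen_t len = sizeof(type);

	fd1 = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (fd1 < 0)
		return -1;

	fd2 = dup(fd1);
	if (fd2 < 0) {
		close(fd1);
		return -1;
	}

	close(fd2);	/* drops one reference to the open file, not the last */

	alive = getsockopt(fd1, SOL_SOCKET, SO_TYPE, &type, &len) == 0 &&
		type == SOCK_DGRAM;

	close(fd1);	/* last descriptor gone: the socket can now be released */
	return alive;
}
```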


> I am confused, what Subash reported is a kernel warning which can
> surely be fixed by removing genl lock (if it is correct, I need to double
> check), so why for net-next?

Because Subash pointed to a buggy commit.

We want to fix all issues brought by this commit, not only the immediate
mutex problem.

I have no idea if we can safely remove the mutex from genl_lock_done():

The genl_lock() is not only protecting the socket itself, it might
protect global data as well, or protect some kind of lock ordering among
multiple mutexes.

Have you checked all genl users, down to linux-4.0, the point where commit
21e4902aea80ef35a was added?

Herbert, Thomas, your help would be appreciated, thanks.

* [PATCH] wireless: ath: ath9k: constify ath_bus_ops structure
From: Bhumika Goyal @ 2016-11-27 17:03 UTC (permalink / raw)
  To: julia.lawall, ath9k-devel, kvalo, linux-wireless, ath9k-devel,
	netdev, linux-kernel
  Cc: Bhumika Goyal

Declare the structure ath_bus_ops as const as it is only passed as an
argument to the function ath9k_init_device. This argument is of type
const struct ath_bus_ops *, so ath_bus_ops structures with this property
can be declared as const.
Done using Coccinelle:
@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct ath_bus_ops i@p = {...};

@ok1@
identifier r1.i;
position p;
expression e1,e2;
@@
ath9k_init_device(e1,e2,&i@p)

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
static
+const
struct ath_bus_ops i={...};

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct ath_bus_ops i;

File size before:
   text	   data	    bss	    dec	    hex	filename
   1295	    232	      0	   1527	    5f7	ath/ath9k/ahb.o

File size after:
   text	   data	    bss	    dec	    hex	filename
   1359	    176	      0	   1535	    5ff	ath/ath9k/ahb.o
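
The pattern being constified can be sketched in isolation as below. All names
(struct bus_ops, demo_bus_ops, init_device, demo_eeprom_read) are hypothetical
and much smaller than the real ath9k structures; the point is only that an ops
table which is passed solely through a pointer-to-const parameter can itself
be declared const.

```c
#include <stdbool.h>

enum bus_type { BUS_AHB, BUS_PCI };

/* Hypothetical, reduced ops table: one tag plus one callback. */
struct bus_ops {
	enum bus_type type;
	bool (*eeprom_read)(unsigned int off, unsigned short *data);
};

static bool demo_eeprom_read(unsigned int off, unsigned short *data)
{
	*data = (unsigned short)(off * 2);
	return true;
}

/* const: nothing writes to the table after initialization. */
static const struct bus_ops demo_bus_ops = {
	.type = BUS_AHB,
	.eeprom_read = demo_eeprom_read,
};

/* The consumer only ever takes a pointer to const. */
static bool init_device(const struct bus_ops *ops, unsigned short *out)
{
	return ops->eeprom_read(21, out);
}
```

The size figures above look consistent with this: data shrinks while text
grows, which is what one would expect if the table moves to a read-only
section that the size tool counts as text.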

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
---
 drivers/net/wireless/ath/ath9k/ahb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/ahb.c b/drivers/net/wireless/ath/ath9k/ahb.c
index bea6186..2bd982c 100644
--- a/drivers/net/wireless/ath/ath9k/ahb.c
+++ b/drivers/net/wireless/ath/ath9k/ahb.c
@@ -62,7 +62,7 @@ static bool ath_ahb_eeprom_read(struct ath_common *common, u32 off, u16 *data)
 	return false;
 }
 
-static struct ath_bus_ops ath_ahb_bus_ops  = {
+static const struct ath_bus_ops ath_ahb_bus_ops  = {
 	.ath_bus_type = ATH_AHB,
 	.read_cachesize = ath_ahb_read_cachesize,
 	.eeprom_read = ath_ahb_eeprom_read,
-- 
1.9.1

* [PATCH net 1/2] Revert "net/mlx4_en: Avoid unregister_netdev at shutdown flow"
From: Tariq Toukan @ 2016-11-27 17:20 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Sebastian Ott, Steve Wise, Tariq Toukan,
	Jiri Pirko
In-Reply-To: <1480267252-26146-1-git-send-email-tariqt@mellanox.com>

This reverts commit 9d76931180557270796f9631e2c79b9c7bb3c9fb.

Using unregister_netdev in the shutdown flow prevents calls to
the netdev's ndos and accesses to its freed resources.

This fixes crashes like the following:
 Call Trace:
  [<ffffffff81587a6e>] dev_get_phys_port_id+0x1e/0x30
  [<ffffffff815a36ce>] rtnl_fill_ifinfo+0x4be/0xff0
  [<ffffffff815a53f3>] rtmsg_ifinfo_build_skb+0x73/0xe0
  [<ffffffff815a5476>] rtmsg_ifinfo.part.27+0x16/0x50
  [<ffffffff815a54c8>] rtmsg_ifinfo+0x18/0x20
  [<ffffffff8158a6c6>] netdev_state_change+0x46/0x50
  [<ffffffff815a5e78>] linkwatch_do_dev+0x38/0x50
  [<ffffffff815a6165>] __linkwatch_run_queue+0xf5/0x170
  [<ffffffff815a6205>] linkwatch_event+0x25/0x30
  [<ffffffff81099a82>] process_one_work+0x152/0x400
  [<ffffffff8109a325>] worker_thread+0x125/0x4b0
  [<ffffffff8109a200>] ? rescuer_thread+0x350/0x350
  [<ffffffff8109fc6a>] kthread+0xca/0xe0
  [<ffffffff8109fba0>] ? kthread_park+0x60/0x60
  [<ffffffff816a1285>] ret_from_fork+0x25/0x30

Fixes: 9d7693118055 ("net/mlx4_en: Avoid unregister_netdev at shutdown flow")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 17 ++---------------
 drivers/net/ethernet/mellanox/mlx4/main.c      |  5 +----
 include/linux/mlx4/device.h                    |  1 -
 3 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index a60f635da78b..fb8bb027b69c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2079,13 +2079,6 @@ static int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
 	return -ENOMEM;
 }
 
-static void mlx4_en_shutdown(struct net_device *dev)
-{
-	rtnl_lock();
-	netif_device_detach(dev);
-	mlx4_en_close(dev);
-	rtnl_unlock();
-}
 
 static int mlx4_en_copy_priv(struct mlx4_en_priv *dst,
 			     struct mlx4_en_priv *src,
@@ -2162,8 +2155,6 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_en_dev *mdev = priv->mdev;
-	bool shutdown = mdev->dev->persist->interface_state &
-					    MLX4_INTERFACE_STATE_SHUTDOWN;
 
 	en_dbg(DRV, priv, "Destroying netdev on port:%d\n", priv->port);
 
@@ -2171,10 +2162,7 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
 	if (priv->registered) {
 		devlink_port_type_clear(mlx4_get_devlink_port(mdev->dev,
 							      priv->port));
-		if (shutdown)
-			mlx4_en_shutdown(dev);
-		else
-			unregister_netdev(dev);
+		unregister_netdev(dev);
 	}
 
 	if (priv->allocated)
@@ -2203,8 +2191,7 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
 	kfree(priv->tx_ring);
 	kfree(priv->tx_cq);
 
-	if (!shutdown)
-		free_netdev(dev);
+	free_netdev(dev);
 }
 
 static int mlx4_en_change_mtu(struct net_device *dev, int new_mtu)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 6f4e67bc3538..75d07fa9d0b1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -4147,11 +4147,8 @@ static void mlx4_shutdown(struct pci_dev *pdev)
 
 	mlx4_info(persist->dev, "mlx4_shutdown was called\n");
 	mutex_lock(&persist->interface_state_mutex);
-	if (persist->interface_state & MLX4_INTERFACE_STATE_UP) {
-		/* Notify mlx4 clients that the kernel is being shut down */
-		persist->interface_state |= MLX4_INTERFACE_STATE_SHUTDOWN;
+	if (persist->interface_state & MLX4_INTERFACE_STATE_UP)
 		mlx4_unload_one(pdev);
-	}
 	mutex_unlock(&persist->interface_state_mutex);
 }
 
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 3be7abd6e722..c9f379689dd0 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -476,7 +476,6 @@ enum {
 enum {
 	MLX4_INTERFACE_STATE_UP		= 1 << 0,
 	MLX4_INTERFACE_STATE_DELETION	= 1 << 1,
-	MLX4_INTERFACE_STATE_SHUTDOWN	= 1 << 2,
 };
 
 #define MSTR_SM_CHANGE_MASK (MLX4_EQ_PORT_INFO_MSTR_SM_SL_CHANGE_MASK | \
-- 
1.8.3.1

* [PATCH net 0/2] mlx4 bug fixes for 4.9
From: Tariq Toukan @ 2016-11-27 17:20 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Sebastian Ott, Steve Wise, Tariq Toukan

Hi Dave,

This patchset includes 2 bug fixes:
* In patch 1 we revert the commit that avoids invoking unregister_netdev
in the shutdown flow, as it leaves the netdev present during the flow,
where it can be accessed unsafely by ndo operations.
* Patch 2 is a simple fix for uninitialized fields in a rule structure.

Series generated against net commit:
6998cc6ec237 tipc: resolve connection flow control compatibility problem

Thanks,
Tariq.

Jack Morgenstein (1):
  net/mlx4: Fix uninitialized fields in rule when adding promiscuous
    mode to device managed flow steering

Tariq Toukan (1):
  Revert "net/mlx4_en: Avoid unregister_netdev at shutdown flow"

 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 17 ++---------------
 drivers/net/ethernet/mellanox/mlx4/main.c      |  5 +----
 drivers/net/ethernet/mellanox/mlx4/mcg.c       |  7 ++++++-
 include/linux/mlx4/device.h                    |  1 -
 4 files changed, 9 insertions(+), 21 deletions(-)

-- 
1.8.3.1

* [PATCH net 2/2] net/mlx4: Fix uninitialized fields in rule when adding promiscuous mode to device managed flow steering
From: Tariq Toukan @ 2016-11-27 17:20 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Sebastian Ott, Steve Wise,
	Jack Morgenstein, Tariq Toukan
In-Reply-To: <1480267252-26146-1-git-send-email-tariqt@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

In procedure mlx4_flow_steer_promisc_add(), several fields
were left uninitialized in the rule structure.
Correctly initialize these fields.
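
The fix relies on a property of C initializers: when a struct is initialized
with a brace-enclosed list, every member not named in the list is
zero-initialized, whereas a plain declaration of an automatic variable leaves
all members indeterminate. A minimal sketch, with a hypothetical struct
demo_rule and enum values that do not mirror the real mlx4_net_trans_rule
layout:

```c
/* Hypothetical values, chosen non-zero so the explicit setting is visible. */
enum demo_queue_mode { DEMO_Q_FIFO = 1, DEMO_Q_LIFO };

/* Hypothetical, reduced rule structure. */
struct demo_rule {
	enum demo_queue_mode queue_mode;
	int exclusive;
	int allow_loopback;
	int priority;
	unsigned int qpn;
	unsigned char port;
};

static struct demo_rule make_promisc_rule(unsigned int qpn, unsigned char port)
{
	struct demo_rule rule = {
		.queue_mode = DEMO_Q_FIFO,
		.exclusive = 0,
		.allow_loopback = 1,
	};	/* members not named here (priority, qpn, port) start at zero */

	rule.qpn = qpn;
	rule.port = port;
	return rule;
}
```

Any field the function does not set later, such as priority in this sketch,
is then guaranteed to read as zero instead of stack garbage.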

Fixes: 592e49dda812 ("net/mlx4: Implement promiscuous mode with device managed flow-steering")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mcg.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index 94b891c118c1..1a670b681555 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -1457,7 +1457,12 @@ int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 int mlx4_flow_steer_promisc_add(struct mlx4_dev *dev, u8 port,
 				u32 qpn, enum mlx4_net_trans_promisc_mode mode)
 {
-	struct mlx4_net_trans_rule rule;
+	struct mlx4_net_trans_rule rule = {
+		.queue_mode = MLX4_NET_TRANS_Q_FIFO,
+		.exclusive = 0,
+		.allow_loopback = 1,
+	};
+
 	u64 *regid_p;
 
 	switch (mode) {
-- 
1.8.3.1

^ permalink raw reply related

* RE: [PATCH net-next 09/11] qede: Better utilize the qede_[rt]x_queue
From: Mintz, Yuval @ 2016-11-27 16:15 UTC (permalink / raw)
  To: kbuild test robot, davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <201611272330.xRUonBtv%fengguang.wu@intel.com>

> Hi Yuval,
> 
> [auto build test WARNING on net-next/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Yuval-Mintz/qed-Add-XDP-support/20161127-225956
> config: tile-allmodconfig (attached as .config)
> compiler: tilegx-linux-gcc (GCC) 4.6.2
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=tile
> 
> All warnings (new ones prefixed by >>):
> 
>    drivers/net/ethernet/qlogic/qede/qede_main.c: In function 'qede_alloc_mem_rxq':
> >> drivers/net/ethernet/qlogic/qede/qede_main.c:2960:3: warning: large integer implicitly truncated to unsigned type [-Woverflow]
> 
> vim +2960 drivers/net/ethernet/qlogic/qede/qede_main.c
> 
> 55482edc Manish Chopra 2016-03-04  2944  err:
> 55482edc Manish Chopra 2016-03-04  2945  	qede_free_sge_mem(edev, rxq);
> 55482edc Manish Chopra 2016-03-04  2946  	edev->gro_disable = 1;
> 55482edc Manish Chopra 2016-03-04  2947  	return -ENOMEM;
> 55482edc Manish Chopra 2016-03-04  2948  }
> 55482edc Manish Chopra 2016-03-04  2949
> 2950219d Yuval Mintz   2015-10-26  2950  /* This function allocates all memory needed per Rx queue */
> 1a635e48 Yuval Mintz   2016-08-15  2951  static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
> 2950219d Yuval Mintz   2015-10-26  2952  {
> f86af2df Manish Chopra 2016-04-20  2953  	int i, rc, size;
> 2950219d Yuval Mintz   2015-10-26  2954
> 2950219d Yuval Mintz   2015-10-26  2955  	rxq->num_rx_buffers = edev->q_num_rx_buffers;
> 2950219d Yuval Mintz   2015-10-26  2956
> 1a635e48 Yuval Mintz   2016-08-15  2957  	rxq->rx_buf_size = NET_IP_ALIGN + ETH_OVERHEAD + edev->ndev->mtu;
> 1a635e48 Yuval Mintz   2016-08-15  2958
> fc48b7a6 Yuval Mintz   2016-02-15  2959  	if (rxq->rx_buf_size > PAGE_SIZE)
> fc48b7a6 Yuval Mintz   2016-02-15 @2960  		rxq->rx_buf_size = PAGE_SIZE;

I'd say this is a false positive, given that the MTU can't be that large.
That said, patch #10 is going to hit the same warning when it sets
rx_buf_seg_size [also a u16] to PAGE_SIZE to make sure there's a single
packet per page.

While I can surely address that, I wonder whether this is an interesting
scenario at the moment; i.e., using XDP with 64 KB pages is going to be
very costly from a memory perspective.
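
For reference, a small standalone sketch of the truncation the compiler is warning about. It assumes, as the note above implies, that the buffer-size field is 16 bits wide; the demo_* functions and their parameters are hypothetical and not taken from the driver.

```c
#include <stdint.h>

/* Storing a value into a 16-bit field keeps only the low 16 bits
 * (conversion to an unsigned type is modulo 65536).
 */
uint16_t demo_store_u16(uint32_t value)
{
	return (uint16_t)value;
}

/* Mimics "if (size > PAGE_SIZE) size = PAGE_SIZE;" with a 16-bit size.
 * With a 64 KB page the condition can never be true, because a 16-bit
 * value is at most 65535, so the clamp is dead code on such configs.
 */
uint16_t demo_clamp_to_page(uint16_t size, uint32_t page_size)
{
	if (size > page_size)
		size = (uint16_t)page_size;

	return size;
}
```

With a 4 KB page the clamp behaves as intended, while a 64 KB page size itself does not fit in the field: demo_store_u16(65536) yields 0. That matches the "false positive" reading for the existing check, and also shows why assigning PAGE_SIZE unconditionally to a u16 would be a real problem with 64 KB pages.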

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver
From: Markus Böhme @ 2016-11-27 17:59 UTC (permalink / raw)
  To: Lino Sanfilippo, davem, charrer, liodot, gregkh, andrew
  Cc: devel, netdev, linux-kernel
In-Reply-To: <1480162850-8014-2-git-send-email-LinoSanfilippo@gmx.de>

Hello Lino,

just some things barely worth mentioning:

On 11/26/2016 01:20 PM, Lino Sanfilippo wrote:
> Add driver for Alacritech gigabit ethernet cards with SLIC (session-layer
> interface control) technology. The driver provides basic support without
> SLIC for the following devices:
> 
> - Mojave cards (single port PCI Gigabit) both copper and fiber
> - Oasis cards (single and dual port PCI-x Gigabit) copper and fiber
> - Kalahari cards (dual and quad port PCI-e Gigabit) copper and fiber
> 
> Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> ---
>  drivers/net/ethernet/Kconfig              |    1 +
>  drivers/net/ethernet/Makefile             |    1 +
>  drivers/net/ethernet/alacritech/Kconfig   |   28 +
>  drivers/net/ethernet/alacritech/Makefile  |    4 +
>  drivers/net/ethernet/alacritech/slic.h    |  576 +++++++++
>  drivers/net/ethernet/alacritech/slicoss.c | 1867 +++++++++++++++++++++++++++++
>  6 files changed, 2477 insertions(+)
>  create mode 100644 drivers/net/ethernet/alacritech/Kconfig
>  create mode 100644 drivers/net/ethernet/alacritech/Makefile
>  create mode 100644 drivers/net/ethernet/alacritech/slic.h
>  create mode 100644 drivers/net/ethernet/alacritech/slicoss.c
> 

[...]

> diff --git a/drivers/net/ethernet/alacritech/slic.h b/drivers/net/ethernet/alacritech/slic.h
> new file mode 100644
> index 0000000..c62d46b
> --- /dev/null
> +++ b/drivers/net/ethernet/alacritech/slic.h
> @@ -0,0 +1,576 @@
> +
> +#ifndef _SLIC_H
> +#define _SLIC_H


I found a bunch of unused #defines in slic.h. I cannot judge if they are
worth keeping:

	SLIC_VRHSTATB_LONGE
	SLIC_VRHSTATB_PREA
	SLIC_ISR_IO
	SLIC_ISR_PING_MASK
	SLIC_GIG_SPEED_MASK
	SLIC_GMCR_RESET
	SLIC_XCR_RESET
	SLIC_XCR_XMTEN
	SLIC_XCR_PAUSEEN
	SLIC_XCR_LOADRNG
	SLIC_REG_DBAR
	SLIC_REG_PING
	SLIC_REG_DUMP_CMD
	SLIC_REG_DUMP_DATA
	SLIC_REG_WRHOSTID
	SLIC_REG_LOW_POWER
	SLIC_REG_RESET_IFACE
	SLIC_REG_ADDR_UPPER
	SLIC_REG_HBAR64
	SLIC_REG_DBAR64
	SLIC_REG_CBAR64
	SLIC_REG_RBAR64
	SLIC_REG_WRVLANID
	SLIC_REG_READ_XF_INFO
	SLIC_REG_WRITE_XF_INFO
	SLIC_REG_TICKS_PER_SEC

These device IDs are not used, either, but maybe it's good to keep them
for documentation purposes:

	PCI_SUBDEVICE_ID_ALACRITECH_1000X1_2
	PCI_SUBDEVICE_ID_ALACRITECH_SES1001T
	PCI_SUBDEVICE_ID_ALACRITECH_SEN2002XT
	PCI_SUBDEVICE_ID_ALACRITECH_SEN2001XT
	PCI_SUBDEVICE_ID_ALACRITECH_SEN2104ET
	PCI_SUBDEVICE_ID_ALACRITECH_SEN2102ET

[...]

> +
> +/* SLIC EEPROM structure for Oasis */
> +struct slic_mojave_eeprom {

The comment should read "for Mojave".

[...]

> +struct slic_device {
> +	struct pci_dev *pdev;
> +	struct net_device *netdev;
> +	void __iomem *regs;
> +	/* upper address setting lock */
> +	spinlock_t upper_lock;
> +	struct slic_shmem shmem;
> +	struct napi_struct napi;
> +	struct slic_rx_queue rxq;
> +	struct slic_tx_queue txq;
> +	struct slic_stat_queue stq;
> +	struct slic_stats stats;
> +	struct slic_upr_list upr_list;
> +	/* link configuration lock */
> +	spinlock_t link_lock;
> +	bool promisc;
> +	bool autoneg;
> +	int speed;
> +	int duplex;

Maybe make speed and duplex unsigned? They are assigned and compared
against unsigned values in slicoss.c, so this would get rid of some
(benign, because of the range of the values) -Wsign-compare warnings in
slic_configure_link_locked. However, in one comparison there, SPEED_UNKNOWN
would need to be cast to unsigned to prevent another warning from popping up.
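
A minimal standalone sketch of the cast mentioned above. DEMO_SPEED_UNKNOWN is a local stand-in that assumes SPEED_UNKNOWN is defined as -1, and demo_speed_is_unknown() is a hypothetical helper, not a function from the driver.

```c
/* Local stand-in, assuming SPEED_UNKNOWN is -1 as in the ethtool header. */
#define DEMO_SPEED_UNKNOWN (-1)

/* With an unsigned speed, comparing directly against -1 mixes signedness
 * and may be flagged by -Wsign-compare. The explicit cast silences the
 * warning without changing the result: -1 converts to UINT_MAX either way.
 */
int demo_speed_is_unknown(unsigned int speed)
{
	return speed == (unsigned int)DEMO_SPEED_UNKNOWN;
}
```

A speed that was assigned from DEMO_SPEED_UNKNOWN compares equal after the cast, and ordinary values such as 1000 do not.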

[...]

> +#endif /* _SLIC_H */
> diff --git a/drivers/net/ethernet/alacritech/slicoss.c b/drivers/net/ethernet/alacritech/slicoss.c
> new file mode 100644
> index 0000000..8cd862a
> --- /dev/null
> +++ b/drivers/net/ethernet/alacritech/slicoss.c
> @@ -0,0 +1,1867 @@

[...]

> +
> +static const struct pci_device_id slic_id_tbl[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_ALACRITECH,
> +		     PCI_DEVICE_ID_ALACRITECH_MOAVE) },

I missed this in slic.h, but is this a typo and "MOAVE" should be
"MOJAVE"? There are a couple similar #defines in slic.h.

[...]

> +static void slic_refill_rx_queue(struct slic_device *sdev, gfp_t gfp)
> +{
> +	const unsigned int ALIGN_MASK = SLIC_RX_BUFF_ALIGN - 1;
> +	unsigned int maplen = SLIC_RX_BUFF_SIZE;
> +	struct slic_rx_queue *rxq = &sdev->rxq;
> +	struct net_device *dev = sdev->netdev;
> +	struct slic_rx_buffer *buff;
> +	struct slic_rx_desc *desc;
> +	unsigned int misalign;
> +	unsigned int offset;
> +	struct sk_buff *skb;
> +	dma_addr_t paddr;
> +
> +	while (slic_get_free_rx_descs(rxq) > SLIC_MAX_REQ_RX_DESCS) {
> +		skb = alloc_skb(maplen + ALIGN_MASK, gfp);
> +		if (!skb)
> +			break;
> +
> +		paddr = dma_map_single(&sdev->pdev->dev, skb->data, maplen,
> +				       DMA_FROM_DEVICE);
> +		if (dma_mapping_error(&sdev->pdev->dev, paddr)) {
> +			netdev_err(dev, "mapping rx packet failed\n");
> +			/* drop skb */
> +			dev_kfree_skb_any(skb);
> +			break;
> +		}
> +		/* ensure head buffer descriptors are 256 byte aligned */
> +		offset = 0;
> +		misalign = paddr & ALIGN_MASK;
> +		if (misalign) {
> +			offset = SLIC_RX_BUFF_ALIGN - misalign;
> +			skb_reserve(skb, offset);
> +		}
> +		/* the HW expects dma chunks for descriptor + frame data */
> +		desc = (struct slic_rx_desc *)skb->data;
> +		memset(desc, 0, sizeof(*desc));
> +
> +		buff = &rxq->rxbuffs[rxq->put_idx];
> +		buff->skb = skb;
> +		dma_unmap_addr_set(buff, map_addr, paddr);
> +		dma_unmap_len_set(buff, map_len, maplen);
> +		buff->addr_offset = offset;
> +		/* head buffer descriptors are placed immediately before skb */
> +		slic_write(sdev, SLIC_REG_HBAR, lower_32_bits(paddr) +
> +						offset);

This fits nicely on one line. :-)

[...]

> +static int slic_init_tx_queue(struct slic_device *sdev)
> +{
> +	struct slic_tx_queue *txq = &sdev->txq;
> +	struct slic_tx_buffer *buff;
> +	struct slic_tx_desc *desc;
> +	int err;
> +	int i;

You could make i unsigned...

> +
> +	txq->len = SLIC_NUM_TX_DESCS;
> +	txq->put_idx = 0;
> +	txq->done_idx = 0;
> +
> +	txq->txbuffs = kcalloc(txq->len, sizeof(*buff), GFP_KERNEL);
> +	if (!txq->txbuffs)
> +		return -ENOMEM;
> +
> +	txq->dma_pool = dma_pool_create("slic_pool", &sdev->pdev->dev,
> +					sizeof(*desc), SLIC_TX_DESC_ALIGN,
> +					4096);
> +	if (!txq->dma_pool) {
> +		err = -ENOMEM;
> +		netdev_err(sdev->netdev, "failed to create dma pool\n");
> +		goto free_buffs;
> +	}
> +
> +	for (i = 0; i < txq->len; i++) {

...to fix a signed/unsigned comparison warning here, but...

> +		buff = &txq->txbuffs[i];
> +		desc = dma_pool_zalloc(txq->dma_pool, GFP_KERNEL,
> +				       &buff->desc_paddr);
> +		if (!desc) {
> +			netdev_err(sdev->netdev,
> +				   "failed to alloc pool chunk (%i)\n", i);
> +			err = -ENOMEM;
> +			goto free_descs;
> +		}
> +
> +		desc->hnd = cpu_to_le32((u32)(i + 1));
> +		desc->cmd = SLIC_CMD_XMT_REQ;
> +		desc->flags = 0;
> +		desc->type = cpu_to_le32(SLIC_CMD_TYPE_DUMB);
> +		buff->desc = desc;
> +	}
> +
> +	return 0;
> +
> +free_descs:
> +	while (i--) {

...this would require reworking this logic to prevent an endless loop,
so probably not worth bothering, considering that txq->len is well
within the positive signed range.
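
To make the count-down concern concrete, here is a standalone sketch (not driver code; the demo_* helpers are hypothetical). As far as I can tell, the post-decrement form "while (i--)" still terminates when i is unsigned, because the condition tests the value before the decrement; the form that would never terminate is a test like "i >= 0", which is always true for an unsigned index.

```c
/* Counts the iterations of the unwind idiom "while (i--)" for unsigned i.
 * The post-decrement yields the old value, so the loop stops once that
 * value is 0. i then wraps to UINT_MAX, but it is not used again.
 */
unsigned int demo_unwind_count(unsigned int i)
{
	unsigned int freed = 0;

	while (i--)
		freed++;	/* stands in for freeing entry i */

	return freed;
}

/* A "for (i = n - 1; i >= 0; i--)" loop would spin forever with an
 * unsigned i. Testing "i > 0" and indexing with i - 1 keeps it bounded.
 */
unsigned int demo_unwind_count_for(unsigned int n)
{
	unsigned int freed = 0;
	unsigned int i;

	for (i = n; i > 0; i--)
		freed++;	/* stands in for freeing entry i - 1 */

	return freed;
}
```

Both helpers visit exactly n entries, including the n == 0 case where nothing was allocated before the failure.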

> +		buff = &txq->txbuffs[i];
> +		dma_pool_free(txq->dma_pool, buff->desc, buff->desc_paddr);
> +	}
> +	dma_pool_destroy(txq->dma_pool);
> +
> +free_buffs:
> +	kfree(txq->txbuffs);
> +
> +	return err;
> +}
> +
> +static void slic_free_tx_queue(struct slic_device *sdev)
> +{
> +	struct slic_tx_queue *txq = &sdev->txq;
> +	struct slic_tx_buffer *buff;
> +	int i;

Make i unsigned? One warning less, almost no work invested.

> +
> +	for (i = 0; i < txq->len; i++) {
> +		buff = &txq->txbuffs[i];
> +		dma_pool_free(txq->dma_pool, buff->desc, buff->desc_paddr);
> +		if (!buff->skb)
> +			continue;
> +
> +		dma_unmap_single(&sdev->pdev->dev,
> +				 dma_unmap_addr(buff, map_addr),
> +				 dma_unmap_len(buff, map_len), DMA_TO_DEVICE);
> +		consume_skb(buff->skb);
> +	}
> +	dma_pool_destroy(txq->dma_pool);
> +
> +	kfree(txq->txbuffs);
> +}
> +

[...]

> +static void slic_free_rx_queue(struct slic_device *sdev)
> +{
> +	struct slic_rx_queue *rxq = &sdev->rxq;
> +	struct slic_rx_buffer *buff;
> +	int i;

Unsigned?

> +
> +	/* free rx buffers */
> +	for (i = 0; i < rxq->len; i++) {
> +		buff = &rxq->rxbuffs[i];
> +
> +		if (!buff->skb)
> +			continue;
> +
> +		dma_unmap_single(&sdev->pdev->dev,
> +				 dma_unmap_addr(buff, map_addr),
> +				 dma_unmap_len(buff, map_len),
> +				 DMA_FROM_DEVICE);
> +		consume_skb(buff->skb);
> +	}
> +	kfree(rxq->rxbuffs);
> +}

[...]

> +static int slic_load_firmware(struct slic_device *sdev)
> +{
> +	u32 sectstart[SLIC_FIRMWARE_MAX_SECTIONS];
> +	u32 sectsize[SLIC_FIRMWARE_MAX_SECTIONS];
> +	const struct firmware *fw;
> +	unsigned int datalen;
> +	const char *file;
> +	int code_start;
> +	u32 numsects;
> +	int idx = 0;
> +	u32 sect;
> +	u32 instr;
> +	u32 addr;
> +	u32 base;
> +	int err;
> +	int i;

Make i unsigned?

> +
> +	file = (sdev->model == SLIC_MODEL_OASIS) ?  SLIC_FIRMWARE_OASIS :
> +						    SLIC_FIRMWARE_MOAVE;
> +	err = request_firmware(&fw, file, &sdev->pdev->dev);
> +	if (err) {
> +		dev_err(&sdev->pdev->dev, "failed to load firmware %s\n", file);
> +		return err;
> +	}
> +	/* Do an initial sanity check concerning firmware size now. A further
> +	 * check follows below.
> +	 */
> +	if (fw->size < SLIC_FIRMWARE_MIN_SIZE) {
> +		dev_err(&sdev->pdev->dev,
> +			"invalid firmware size %zu (min is %u)\n", fw->size,
> +			SLIC_FIRMWARE_MIN_SIZE);
> +		err = -EINVAL;
> +		goto release;
> +	}
> +
> +	numsects = slic_read_dword_from_firmware(fw, &idx);
> +	if (numsects == 0 || numsects > SLIC_FIRMWARE_MAX_SECTIONS) {
> +		dev_err(&sdev->pdev->dev,
> +			"invalid number of sections in firmware: %u", numsects);
> +		err = -EINVAL;
> +		goto release;
> +	}
> +
> +	datalen = numsects * 8 + 4;
> +	for (i = 0; i < numsects; i++) {
> +		sectsize[i] = slic_read_dword_from_firmware(fw, &idx);
> +		datalen += sectsize[i];
> +	}
> +
> +	/* do another sanity check against firmware size */
> +	if (datalen > fw->size) {
> +		dev_err(&sdev->pdev->dev,
> +			"invalid firmware size %zu (expected >= %u)\n",
> +			fw->size, datalen);
> +		err = -EINVAL;
> +		goto release;
> +	}
> +	/* get sections */
> +	for (i = 0; i < numsects; i++)
> +		sectstart[i] = slic_read_dword_from_firmware(fw, &idx);
> +
> +	code_start = idx;
> +	instr = slic_read_dword_from_firmware(fw, &idx);
> +
> +	for (sect = 0; sect < numsects; sect++) {
> +		unsigned int ssize = sectsize[sect] >> 3;
> +
> +		base = sectstart[sect];
> +
> +		for (addr = 0; addr < ssize; addr++) {
> +			/* write out instruction address */
> +			slic_write(sdev, SLIC_REG_WCS, base + addr);
> +			/* write out instruction to low addr */
> +			slic_write(sdev, SLIC_REG_WCS, instr);
> +			instr = slic_read_dword_from_firmware(fw, &idx);
> +			/* write out instruction to high addr */
> +			slic_write(sdev, SLIC_REG_WCS, instr);
> +			instr = slic_read_dword_from_firmware(fw, &idx);
> +		}
> +	}
> +
> +	idx = code_start;
> +
> +	for (sect = 0; sect < numsects; sect++) {
> +		unsigned int ssize = sectsize[sect] >> 3;
> +
> +		instr = slic_read_dword_from_firmware(fw, &idx);
> +		base = sectstart[sect];
> +		if (base < 0x8000)
> +			continue;
> +
> +		for (addr = 0; addr < ssize; addr++) {
> +			/* write out instruction address */
> +			slic_write(sdev, SLIC_REG_WCS,
> +				   SLIC_WCS_COMPARE | (base + addr));
> +			/* write out instruction to low addr */
> +			slic_write(sdev, SLIC_REG_WCS, instr);
> +			instr = slic_read_dword_from_firmware(fw, &idx);
> +			/* write out instruction to high addr */
> +			slic_write(sdev, SLIC_REG_WCS, instr);
> +			instr = slic_read_dword_from_firmware(fw, &idx);
> +		}
> +	}
> +	slic_flush_write(sdev);
> +	mdelay(10);
> +	/* everything OK, kick off the card */
> +	slic_write(sdev, SLIC_REG_WCS, SLIC_WCS_START);
> +	slic_flush_write(sdev);
> +	/* wait long enough for ucode to init card and reach the mainloop */
> +	mdelay(20);
> +release:
> +	release_firmware(fw);
> +
> +	return err;
> +}

[...]

> +static int slic_init_iface(struct slic_device *sdev)
> +{
> +	struct slic_shmem *sm = &sdev->shmem;
> +	int err;
> +
> +	sdev->upr_list.pending = false;
> +
> +	err = slic_init_shmem(sdev);
> +	if (err) {
> +		netdev_err(sdev->netdev, "failed to load firmware\n");

Wrong error message.

> +		return err;
> +	}

[...]

> +static netdev_tx_t slic_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct slic_device *sdev = netdev_priv(dev);
> +	struct slic_tx_queue *txq = &sdev->txq;
> +	struct slic_tx_buffer *buff;
> +	struct slic_tx_desc *desc;
> +	dma_addr_t paddr;
> +	u32 cbar_val;
> +	u32 maplen;
> +
> +	if (unlikely(slic_get_free_tx_descs(txq) < SLIC_MAX_REQ_TX_DESCS)) {
> +		netdev_err(dev, "BUG! not enought tx LEs left: %u\n",

"Enough"?

> +			   slic_get_free_tx_descs(txq));
> +		return NETDEV_TX_BUSY;
> +	}

[...]

> +static int slic_read_eeprom(struct slic_device *sdev)
> +{
> +	unsigned int devfn = PCI_FUNC(sdev->pdev->devfn);
> +	struct slic_shmem *sm = &sdev->shmem;
> +	struct slic_shmem_data *sm_data = sm->shmem_data;
> +	const unsigned int MAX_LOOPS = 5000;

Another benign -Wsign-compare warning can be fixed by either dropping
the unsigned here or making i below unsigned, too.

> +	unsigned int codesize;
> +	unsigned char *eeprom;
> +	struct slic_upr *upr;
> +	dma_addr_t paddr;
> +	int err = 0;
> +	u8 *mac[2];
> +	int i = 0;
> +
> +	eeprom = dma_zalloc_coherent(&sdev->pdev->dev, SLIC_EEPROM_SIZE,
> +				     &paddr, GFP_KERNEL);
> +	if (!eeprom)
> +		return -ENOMEM;
> +
> +	slic_write(sdev, SLIC_REG_ICR, SLIC_ICR_INT_OFF);
> +	/* setup ISP temporarily */
> +	slic_write(sdev, SLIC_REG_ISP, lower_32_bits(sm->isr_paddr));
> +
> +	err = slic_new_upr(sdev, SLIC_UPR_CONFIG, paddr);
> +	if (!err) {
> +		for (i = 0; i < MAX_LOOPS; i++) {
> +			if (le32_to_cpu(sm_data->isr) & SLIC_ISR_UPC)
> +				break;
> +			mdelay(1);
> +		}
> +		if (i == MAX_LOOPS) {
> +			dev_err(&sdev->pdev->dev,
> +				"timed out while waiting for eeprom data\n");
> +			err = -ETIMEDOUT;
> +		}
> +		upr = slic_dequeue_upr(sdev);
> +		kfree(upr);
> +	}
> +
> +	slic_write(sdev, SLIC_REG_ISP, 0);
> +	slic_write(sdev, SLIC_REG_ISR, 0);
> +	slic_flush_write(sdev);
> +
> +	if (err)
> +		goto free_eeprom;
> +
> +	if (sdev->model == SLIC_MODEL_OASIS) {
> +		struct slic_oasis_eeprom *oee;
> +
> +		oee = (struct slic_oasis_eeprom *)eeprom;
> +		mac[0] = oee->mac;
> +		mac[1] = oee->mac2;
> +		codesize = le16_to_cpu(oee->eeprom_code_size);
> +	} else {
> +		struct slic_mojave_eeprom *mee;
> +
> +		mee = (struct slic_mojave_eeprom *)eeprom;
> +		mac[0] = mee->mac;
> +		mac[1] = mee->mac2;
> +		codesize = le16_to_cpu(mee->eeprom_code_size);
> +	}
> +
> +	if (!slic_eeprom_valid(eeprom, codesize)) {
> +		dev_err(&sdev->pdev->dev, "invalid checksum in eeprom\n");
> +		err = -EINVAL;
> +		goto free_eeprom;
> +	}
> +	/* set mac address */
> +	ether_addr_copy(sdev->netdev->dev_addr, mac[devfn]);
> +free_eeprom:
> +	dma_free_coherent(&sdev->pdev->dev, SLIC_EEPROM_SIZE, eeprom, paddr);
> +
> +	return err;
> +}

[...]

> +static int slic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> +{

[...]

> +	err = register_netdev(dev);
> +	if (err) {
> +		dev_err(&pdev->dev, "failed to register net device: %i\n",
> +			err);

Could be on one line.

Regards,
Markus

^ permalink raw reply

* Re: [PATCH iproute2 0/3] update ifstat for new stats
From: Roopa Prabhu @ 2016-11-27 18:00 UTC (permalink / raw)
  To: Nogah Frankel
  Cc: netdev, eladr, yotamg, jiri, idosch, ogerlitz,
	Nikolay Aleksandrov
In-Reply-To: <1479996760-61271-1-git-send-email-nogahf@mellanox.com>

(resending ...failed to send it to the list earlier)

On 11/24/16, 6:12 AM, Nogah Frankel wrote:
> Previously, stats were fetched via RTM_GETLINK, which returns 32-bit
> statistics and supports only one type of stats.
> Lately, a new method to get stats was added: RTM_GETSTATS. It supports
> choosing the stats type, and the basic stats were changed from 32-bit
> to 64-bit.
>
> This patchset changes ifstat to use the new method, adds the ability to
> choose an extended type of statistics, and adds the extended type of SW
> stats for packets that hit the CPU.
>
>

(please cc me on the GETSTATS patches)

This looks similar to the one I had submitted here: https://www.spinics.net/lists/netdev/msg375546.html

There are a few issues with this approach (unless your patch series has already addressed them).
The new ifstat fails on older kernels. Moving to 64-bit stats also invalidates the existing ifstat history file.
If you follow the discussion on my patch, there is a way to stay compatible: create a new history
file for the 64-bit stats.

I had started work on fixing these limitations, but after rethinking all the other new stats in one place
in the context of the new stats API, I think it is better to extend ip link. This work is also in progress.
Here is how we think it should look (also CCing Nikolay):

ip link stats /* similar to ip -s link for completeness */
ip link xstats [igmp|lacp]  /* depending on link-type */
ip link afstats [inet|inet6|mpls] /* depending on link-family */
ip link offloadstats [cpu|..]

possible future global non-link stats with 'ip stats [tcp]' and so on.

^ permalink raw reply

* Re: [PATCH net-next 0/4] Documentation: net: phy: Improve documentation
From: Florian Fainelli @ 2016-11-27 18:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet
In-Reply-To: <20161127060133.10357-1-f.fainelli@gmail.com>

David,

On 26/11/2016 at 22:01, Florian Fainelli wrote:
> Hi all,
> 
> This patch series addresses discussions and feedback that were recently received
> on the mailing list in the areas of flow control/pause frames and the interpretation
> of phy_interface_t, and finally adds some links to useful standards documents.

I will improve patch 3 a bit since it contains some minor mistakes and
could use some more clarification. Stay tuned, thanks!
-- 
Florian

^ permalink raw reply

* [PATCH net-next v2 0/4] Documentation: net: phy: Improve documentation
From: Florian Fainelli @ 2016-11-27 18:44 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet, Florian Fainelli

Hi all,

This patch series addresses discussions and feedback that were recently received
on the mailing list in the areas of flow control/pause frames and the interpretation
of phy_interface_t, and finally adds some links to useful standards documents.

Changes in v2:

- clarify a few things in the RGMII section, add a paragraph about common issues
  with RGMII delay mismatches

Florian Fainelli (4):
  Documentation: net: phy: remove description of function pointers
  Documentation: net: phy: Add a paragraph about pause frames/flow
    control
  Documentation: net: phy: Add blurb about RGMII
  Documentation: net: phy: Add links to several standards documents

 Documentation/networking/phy.txt | 139 +++++++++++++++++++++++++++++----------
 1 file changed, 104 insertions(+), 35 deletions(-)

-- 
2.9.3

^ permalink raw reply
