Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v1 0/5] can: add SAE J1939 protocol
From: Marc Kleine-Budde @ 2018-10-30 14:54 UTC (permalink / raw)
  To: Oleksij Rempel, dev.kurt, wg; +Cc: netdev, kernel, linux-can
In-Reply-To: <20181008094836.14080-1-o.rempel@pengutronix.de>


[-- Attachment #1.1: Type: text/plain, Size: 696 bytes --]

On 10/08/2018 11:48 AM, Oleksij Rempel wrote:
> This series adds SAE J1939 support to the current kernel v4.19-rc6.
> 
> This stack has long history, starting back in 27 Apr 2011, if not
> earlier:
> https://lists.openwall.net/netdev/2011/04/27/45
> 
> After major rework and testing it is a time to send it mainline.

I've removed some trailing newlines and added the stack to
linux-can-next/j1939.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Eric Dumazet @ 2018-10-30 14:16 UTC (permalink / raw)
  To: Paweł Staszewski, Dimitris Michailidis
  Cc: Cong Wang, Linux Kernel Network Developers
In-Reply-To: <b0a36d1f-5518-392c-d731-1ec66ec48f92@itcare.pl>



On 10/30/2018 01:09 AM, Paweł Staszewski wrote:
> 
> 
> W dniu 30.10.2018 o 08:29, Eric Dumazet pisze:
>>
>> On 10/29/2018 11:09 PM, Dimitris Michailidis wrote:
>>
>>> Indeed this is a bug. I would expect it to produce frequent errors
>>> though as many odd-length
>>> packets would trigger it. Do you have RXFCS? Regardless, how
>>> frequently do you see the problem?
>>>
>> Old kernels (before 88078d98d1bb) were simply resetting ip_summed to CHECKSUM_NONE
>>
>> And before your fix (commit d55bef5059dd057bd), mlx5 bug was canceling the bug you fixed.
>>
>> So we now need to also fix mlx5.
>>
>> And of course use skb_header_pointer() in mlx5e_get_fcs() as I mentioned earlier,
>> plus __get_unaligned_cpu32() as you hinted.
>>
>>
>>
>>
> 
> No RXFCS
> 
> And this trace is rly frequently like once per 3/4 seconds
> like below:
> [28965.776864] vlan1490: hw csum failure

Might be vlan related.

Can you first check this :

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 94224c22ecc310a87b6715051e335446f29bec03..6f4bfebf0d9a3ae7567062abb3ea6532b3aaf3d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -789,13 +789,8 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
                skb->ip_summed = CHECKSUM_COMPLETE;
                skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
                if (network_depth > ETH_HLEN)
-                       /* CQE csum is calculated from the IP header and does
-                        * not cover VLAN headers (if present). This will add
-                        * the checksum manually.
-                        */
-                       skb->csum = csum_partial(skb->data + ETH_HLEN,
-                                                network_depth - ETH_HLEN,
-                                                skb->csum);
+                       /* Temporary debugging */
+                       skb->ip_summed = CHECKSUM_NONE;
                if (unlikely(netdev->features & NETIF_F_RXFCS))
                        skb->csum = csum_add(skb->csum,
                                             (__force __wsum)mlx5e_get_fcs(skb));

^ permalink raw reply related

* [Patch V5 net 05/11] net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring()
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

It is not necessary to reset the queue in the hns3_uninit_all_ring(),
since the queue is stopped in the down operation, and will be reset
in the up operation. And the judgment of the HCLGE_STATE_RST_HANDLING
flag in the hclge_reset_tqp() is not correct, because we need to reset
tqp during pf reset, otherwise it may cause queue not being reset to
working state problem.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
V4: Fixes comments from Sergei Shtylyov
V3: Fixes comments from Sergei Shtylyov
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c         | 3 ---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index b767ff9..bf71c23 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -3250,9 +3250,6 @@ int hns3_uninit_all_ring(struct hns3_nic_priv *priv)
 	int i;
 
 	for (i = 0; i < h->kinfo.num_tqps; i++) {
-		if (h->ae_algo->ops->reset_queue)
-			h->ae_algo->ops->reset_queue(h, i);
-
 		hns3_fini_ring(priv->ring_data[i].ring);
 		hns3_fini_ring(priv->ring_data[i + h->kinfo.num_tqps].ring);
 	}
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 2a63147..4dd0506 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -6116,9 +6116,6 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 	u16 queue_gid;
 	int ret;
 
-	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
-		return;
-
 	queue_gid = hclge_covert_handle_qid_global(handle, queue_id);
 
 	ret = hclge_tqp_enable(hdev, queue_id, 0, false);
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 08/11] net: hns3: fix incorrect return value/type of some functions
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

There are some functions that, when they fail to send the command,
need to return the corresponding error value to its caller.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Fixes: 681ec3999b3d ("net: hns3: fix for vlan table lost problem when resetting")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
V2: Fixes the compilation error reported by kbuild test robot
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  6 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 80 +++++++++++++++-------
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h    |  2 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 34 ++++-----
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  2 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 14 ++--
 6 files changed, 85 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index e82e4ca..055b406 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -316,8 +316,8 @@ struct hnae3_ae_ops {
 	int (*set_loopback)(struct hnae3_handle *handle,
 			    enum hnae3_loop loop_mode, bool en);
 
-	void (*set_promisc_mode)(struct hnae3_handle *handle, bool en_uc_pmc,
-				 bool en_mc_pmc);
+	int (*set_promisc_mode)(struct hnae3_handle *handle, bool en_uc_pmc,
+				bool en_mc_pmc);
 	int (*set_mtu)(struct hnae3_handle *handle, int new_mtu);
 
 	void (*get_pauseparam)(struct hnae3_handle *handle,
@@ -391,7 +391,7 @@ struct hnae3_ae_ops {
 				      int vector_num,
 				      struct hnae3_ring_chain_node *vr_chain);
 
-	void (*reset_queue)(struct hnae3_handle *handle, u16 queue_id);
+	int (*reset_queue)(struct hnae3_handle *handle, u16 queue_id);
 	u32 (*get_fw_version)(struct hnae3_handle *handle);
 	void (*get_mdix_mode)(struct hnae3_handle *handle,
 			      u8 *tp_mdix_ctrl, u8 *tp_mdix);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index bf71c23..3f96aa3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -509,16 +509,18 @@ static void hns3_nic_set_rx_mode(struct net_device *netdev)
 	h->netdev_flags = new_flags;
 }
 
-void hns3_update_promisc_mode(struct net_device *netdev, u8 promisc_flags)
+int hns3_update_promisc_mode(struct net_device *netdev, u8 promisc_flags)
 {
 	struct hns3_nic_priv *priv = netdev_priv(netdev);
 	struct hnae3_handle *h = priv->ae_handle;
 
 	if (h->ae_algo->ops->set_promisc_mode) {
-		h->ae_algo->ops->set_promisc_mode(h,
-						  promisc_flags & HNAE3_UPE,
-						  promisc_flags & HNAE3_MPE);
+		return h->ae_algo->ops->set_promisc_mode(h,
+						promisc_flags & HNAE3_UPE,
+						promisc_flags & HNAE3_MPE);
 	}
+
+	return 0;
 }
 
 void hns3_enable_vlan_filter(struct net_device *netdev, bool enable)
@@ -1494,18 +1496,22 @@ static int hns3_vlan_rx_kill_vid(struct net_device *netdev,
 	return ret;
 }
 
-static void hns3_restore_vlan(struct net_device *netdev)
+static int hns3_restore_vlan(struct net_device *netdev)
 {
 	struct hns3_nic_priv *priv = netdev_priv(netdev);
+	int ret = 0;
 	u16 vid;
-	int ret;
 
 	for_each_set_bit(vid, priv->active_vlans, VLAN_N_VID) {
 		ret = hns3_vlan_rx_add_vid(netdev, htons(ETH_P_8021Q), vid);
-		if (ret)
-			netdev_warn(netdev, "Restore vlan: %d filter, ret:%d\n",
-				    vid, ret);
+		if (ret) {
+			netdev_err(netdev, "Restore vlan: %d filter, ret:%d\n",
+				   vid, ret);
+			return ret;
+		}
 	}
+
+	return ret;
 }
 
 static int hns3_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
@@ -3257,11 +3263,12 @@ int hns3_uninit_all_ring(struct hns3_nic_priv *priv)
 }
 
 /* Set mac addr if it is configured. or leave it to the AE driver */
-static void hns3_init_mac_addr(struct net_device *netdev, bool init)
+static int hns3_init_mac_addr(struct net_device *netdev, bool init)
 {
 	struct hns3_nic_priv *priv = netdev_priv(netdev);
 	struct hnae3_handle *h = priv->ae_handle;
 	u8 mac_addr_temp[ETH_ALEN];
+	int ret = 0;
 
 	if (h->ae_algo->ops->get_mac_addr && init) {
 		h->ae_algo->ops->get_mac_addr(h, mac_addr_temp);
@@ -3276,8 +3283,9 @@ static void hns3_init_mac_addr(struct net_device *netdev, bool init)
 	}
 
 	if (h->ae_algo->ops->set_mac_addr)
-		h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr, true);
+		ret = h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr, true);
 
+	return ret;
 }
 
 static int hns3_restore_fd_rules(struct net_device *netdev)
@@ -3490,20 +3498,29 @@ static int hns3_client_setup_tc(struct hnae3_handle *handle, u8 tc)
 	return ret;
 }
 
-static void hns3_recover_hw_addr(struct net_device *ndev)
+static int hns3_recover_hw_addr(struct net_device *ndev)
 {
 	struct netdev_hw_addr_list *list;
 	struct netdev_hw_addr *ha, *tmp;
+	int ret = 0;
 
 	/* go through and sync uc_addr entries to the device */
 	list = &ndev->uc;
-	list_for_each_entry_safe(ha, tmp, &list->list, list)
-		hns3_nic_uc_sync(ndev, ha->addr);
+	list_for_each_entry_safe(ha, tmp, &list->list, list) {
+		ret = hns3_nic_uc_sync(ndev, ha->addr);
+		if (ret)
+			return ret;
+	}
 
 	/* go through and sync mc_addr entries to the device */
 	list = &ndev->mc;
-	list_for_each_entry_safe(ha, tmp, &list->list, list)
-		hns3_nic_mc_sync(ndev, ha->addr);
+	list_for_each_entry_safe(ha, tmp, &list->list, list) {
+		ret = hns3_nic_mc_sync(ndev, ha->addr);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
 }
 
 static void hns3_remove_hw_addr(struct net_device *netdev)
@@ -3630,7 +3647,10 @@ int hns3_nic_reset_all_ring(struct hnae3_handle *h)
 	int ret;
 
 	for (i = 0; i < h->kinfo.num_tqps; i++) {
-		h->ae_algo->ops->reset_queue(h, i);
+		ret = h->ae_algo->ops->reset_queue(h, i);
+		if (ret)
+			return ret;
+
 		hns3_init_ring_hw(priv->ring_data[i].ring);
 
 		/* We need to clear tx ring here because self test will
@@ -3722,18 +3742,30 @@ static int hns3_reset_notify_init_enet(struct hnae3_handle *handle)
 	bool vlan_filter_enable;
 	int ret;
 
-	hns3_init_mac_addr(netdev, false);
-	hns3_recover_hw_addr(netdev);
-	hns3_update_promisc_mode(netdev, handle->netdev_flags);
+	ret = hns3_init_mac_addr(netdev, false);
+	if (ret)
+		return ret;
+
+	ret = hns3_recover_hw_addr(netdev);
+	if (ret)
+		return ret;
+
+	ret = hns3_update_promisc_mode(netdev, handle->netdev_flags);
+	if (ret)
+		return ret;
+
 	vlan_filter_enable = netdev->flags & IFF_PROMISC ? false : true;
 	hns3_enable_vlan_filter(netdev, vlan_filter_enable);
 
-
 	/* Hardware table is only clear when pf resets */
-	if (!(handle->flags & HNAE3_SUPPORT_VF))
-		hns3_restore_vlan(netdev);
+	if (!(handle->flags & HNAE3_SUPPORT_VF)) {
+		ret = hns3_restore_vlan(netdev);
+		return ret;
+	}
 
-	hns3_restore_fd_rules(netdev);
+	ret = hns3_restore_fd_rules(netdev);
+	if (ret)
+		return ret;
 
 	/* Carrier off reporting is important to ethtool even BEFORE open */
 	netif_carrier_off(netdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 71cfca1..d3636d0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -640,7 +640,7 @@ void hns3_set_vector_coalesce_rl(struct hns3_enet_tqp_vector *tqp_vector,
 				 u32 rl_value);
 
 void hns3_enable_vlan_filter(struct net_device *netdev, bool enable);
-void hns3_update_promisc_mode(struct net_device *netdev, u8 promisc_flags);
+int hns3_update_promisc_mode(struct net_device *netdev, u8 promisc_flags);
 
 #ifdef CONFIG_HNS3_DCB
 void hns3_dcbnl_setup(struct hnae3_handle *handle);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 4dd0506..f3212c9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3314,8 +3314,8 @@ void hclge_promisc_param_init(struct hclge_promisc_param *param, bool en_uc,
 	param->vf_id = vport_id;
 }
 
-static void hclge_set_promisc_mode(struct hnae3_handle *handle, bool en_uc_pmc,
-				   bool en_mc_pmc)
+static int hclge_set_promisc_mode(struct hnae3_handle *handle, bool en_uc_pmc,
+				  bool en_mc_pmc)
 {
 	struct hclge_vport *vport = hclge_get_vport(handle);
 	struct hclge_dev *hdev = vport->back;
@@ -3323,7 +3323,7 @@ static void hclge_set_promisc_mode(struct hnae3_handle *handle, bool en_uc_pmc,
 
 	hclge_promisc_param_init(&param, en_uc_pmc, en_mc_pmc, true,
 				 vport->vport_id);
-	hclge_cmd_set_promisc_mode(hdev, &param);
+	return hclge_cmd_set_promisc_mode(hdev, &param);
 }
 
 static int hclge_get_fd_mode(struct hclge_dev *hdev, u8 *fd_mode)
@@ -6107,28 +6107,28 @@ static u16 hclge_covert_handle_qid_global(struct hnae3_handle *handle,
 	return tqp->index;
 }
 
-void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
+int hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 {
 	struct hclge_vport *vport = hclge_get_vport(handle);
 	struct hclge_dev *hdev = vport->back;
 	int reset_try_times = 0;
 	int reset_status;
 	u16 queue_gid;
-	int ret;
+	int ret = 0;
 
 	queue_gid = hclge_covert_handle_qid_global(handle, queue_id);
 
 	ret = hclge_tqp_enable(hdev, queue_id, 0, false);
 	if (ret) {
-		dev_warn(&hdev->pdev->dev, "Disable tqp fail, ret = %d\n", ret);
-		return;
+		dev_err(&hdev->pdev->dev, "Disable tqp fail, ret = %d\n", ret);
+		return ret;
 	}
 
 	ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, true);
 	if (ret) {
-		dev_warn(&hdev->pdev->dev,
-			 "Send reset tqp cmd fail, ret = %d\n", ret);
-		return;
+		dev_err(&hdev->pdev->dev,
+			"Send reset tqp cmd fail, ret = %d\n", ret);
+		return ret;
 	}
 
 	reset_try_times = 0;
@@ -6141,16 +6141,16 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 	}
 
 	if (reset_try_times >= HCLGE_TQP_RESET_TRY_TIMES) {
-		dev_warn(&hdev->pdev->dev, "Reset TQP fail\n");
-		return;
+		dev_err(&hdev->pdev->dev, "Reset TQP fail\n");
+		return ret;
 	}
 
 	ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, false);
-	if (ret) {
-		dev_warn(&hdev->pdev->dev,
-			 "Deassert the soft reset fail, ret = %d\n", ret);
-		return;
-	}
+	if (ret)
+		dev_err(&hdev->pdev->dev,
+			"Deassert the soft reset fail, ret = %d\n", ret);
+
+	return ret;
 }
 
 void hclge_reset_vf_queue(struct hclge_vport *vport, u16 queue_id)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index e3dfd65..0d92154 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -778,7 +778,7 @@ int hclge_rss_init_hw(struct hclge_dev *hdev);
 void hclge_rss_indir_init_cfg(struct hclge_dev *hdev);
 
 void hclge_mbx_handler(struct hclge_dev *hdev);
-void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id);
+int hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id);
 void hclge_reset_vf_queue(struct hclge_vport *vport, u16 queue_id);
 int hclge_cfg_flowctrl(struct hclge_dev *hdev);
 int hclge_func_reset_cmd(struct hclge_dev *hdev, int func_id);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index e0a86a5..b224f6a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -925,12 +925,12 @@ static int hclgevf_cmd_set_promisc_mode(struct hclgevf_dev *hdev,
 	return status;
 }
 
-static void hclgevf_set_promisc_mode(struct hnae3_handle *handle,
-				     bool en_uc_pmc, bool en_mc_pmc)
+static int hclgevf_set_promisc_mode(struct hnae3_handle *handle,
+				    bool en_uc_pmc, bool en_mc_pmc)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
 
-	hclgevf_cmd_set_promisc_mode(hdev, en_uc_pmc, en_mc_pmc);
+	return hclgevf_cmd_set_promisc_mode(hdev, en_uc_pmc, en_mc_pmc);
 }
 
 static int hclgevf_tqp_enable(struct hclgevf_dev *hdev, int tqp_id,
@@ -1080,7 +1080,7 @@ static int hclgevf_en_hw_strip_rxvtag(struct hnae3_handle *handle, bool enable)
 				    1, false, NULL, 0);
 }
 
-static void hclgevf_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
+static int hclgevf_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
 	u8 msg_data[2];
@@ -1091,10 +1091,10 @@ static void hclgevf_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 	/* disable vf queue before send queue reset msg to PF */
 	ret = hclgevf_tqp_enable(hdev, queue_id, 0, false);
 	if (ret)
-		return;
+		return ret;
 
-	hclgevf_send_mbx_msg(hdev, HCLGE_MBX_QUEUE_RESET, 0, msg_data,
-			     2, true, NULL, 0);
+	return hclgevf_send_mbx_msg(hdev, HCLGE_MBX_QUEUE_RESET, 0, msg_data,
+				    2, true, NULL, 0);
 }
 
 static int hclgevf_notify_client(struct hclgevf_dev *hdev,
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 07/11] net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

When there is a PHY, the driver needs to complete some operations through
MDIO during reset reinitialization, so HCLGE_STATE_CMD_DISABLE is more
suitable than HCLGE_STATE_RST_HANDLING to prevent the MDIO operation from
being sent during the hardware reset.

Fixes: b50ae26c57cb ("net: hns3: never send command queue message to IMP when reset)
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
index 24b1f2a..0301863 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -52,7 +52,7 @@ static int hclge_mdio_write(struct mii_bus *bus, int phyid, int regnum,
 	struct hclge_desc desc;
 	int ret;
 
-	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
+	if (test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state))
 		return 0;
 
 	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, false);
@@ -90,7 +90,7 @@ static int hclge_mdio_read(struct mii_bus *bus, int phyid, int regnum)
 	struct hclge_desc desc;
 	int ret;
 
-	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
+	if (test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state))
 		return 0;
 
 	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, true);
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 04/11] net: hns3: bugfix for the initialization of command queue's spin lock
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

The spin lock of the command queue only need to be initialized once
when the driver initializes the command queue. It is not necessary to
initialize the spin lock when resetting. At the same time, the
modification of the queue member should be performed after acquiring
the lock.

Fixes: 3efb960f056d ("net: hns3: Refactor the initialization of command queue")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index ac13cb2..68026a5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -304,6 +304,10 @@ int hclge_cmd_queue_init(struct hclge_dev *hdev)
 {
 	int ret;
 
+	/* Setup the lock for command queue */
+	spin_lock_init(&hdev->hw.cmq.csq.lock);
+	spin_lock_init(&hdev->hw.cmq.crq.lock);
+
 	/* Setup the queue entries for use cmd queue */
 	hdev->hw.cmq.csq.desc_num = HCLGE_NIC_CMQ_DESC_NUM;
 	hdev->hw.cmq.crq.desc_num = HCLGE_NIC_CMQ_DESC_NUM;
@@ -337,18 +341,20 @@ int hclge_cmd_init(struct hclge_dev *hdev)
 	u32 version;
 	int ret;
 
+	spin_lock_bh(&hdev->hw.cmq.csq.lock);
+	spin_lock_bh(&hdev->hw.cmq.crq.lock);
+
 	hdev->hw.cmq.csq.next_to_clean = 0;
 	hdev->hw.cmq.csq.next_to_use = 0;
 	hdev->hw.cmq.crq.next_to_clean = 0;
 	hdev->hw.cmq.crq.next_to_use = 0;
 
-	/* Setup the lock for command queue */
-	spin_lock_init(&hdev->hw.cmq.csq.lock);
-	spin_lock_init(&hdev->hw.cmq.crq.lock);
-
 	hclge_cmd_init_regs(&hdev->hw);
 	clear_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state);
 
+	spin_unlock_bh(&hdev->hw.cmq.crq.lock);
+	spin_unlock_bh(&hdev->hw.cmq.csq.lock);
+
 	ret = hclge_cmd_query_firmware_version(&hdev->hw, &version);
 	if (ret) {
 		dev_err(&hdev->pdev->dev,
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 03/11] net: hns3: bugfix for reporting unknown vector0 interrupt repeatly problem
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

The current driver supports handling two vector0 interrupts, reset and
mailbox. When the hardware reports an interrupt of another type of
interrupt source, if the driver does not process the interrupt, but
enables the interrupt, the hardware will repeatedly report the unknown
interrupt.

Therefore, the driver enables the vector0 interrupt after clearing the
known type of interrupt source. Other conditions are not enabled.

Fixes: cd8c5c269b1d ("net: hns3: Fix for hclge_reset running repeatly problem")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 5234b53..2a63147 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2236,7 +2236,7 @@ static irqreturn_t hclge_misc_irq_handle(int irq, void *data)
 	}
 
 	/* clear the source of interrupt if it is not cause by reset */
-	if (event_cause != HCLGE_VECTOR0_EVENT_RST) {
+	if (event_cause == HCLGE_VECTOR0_EVENT_MBX) {
 		hclge_clear_event_cause(hdev, event_cause, clearval);
 		hclge_enable_vector(&hdev->misc_vector, true);
 	}
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 02/11] net: hns3: bugfix for buffer not free problem during resetting
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

When hns3_get_ring_config()/hns3_queue_to_ring()/
hns3_get_vector_ring_chain() failed during resetting, the allocated
memory has not been freed before these three functions return. So
this patch adds error handler in these functions to fix it.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
V5: Fixes comments from Sergei Shtylyov
    add error handler for hns3_get_vector_ring_chain()
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 0b4323b..b767ff9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2727,7 +2727,7 @@ static int hns3_get_vector_ring_chain(struct hns3_enet_tqp_vector *tqp_vector,
 			chain = devm_kzalloc(&pdev->dev, sizeof(*chain),
 					     GFP_KERNEL);
 			if (!chain)
-				return -ENOMEM;
+				goto err_free_chain;
 
 			cur_chain->next = chain;
 			chain->tqp_index = tx_ring->tqp->tqp_index;
@@ -2757,7 +2757,7 @@ static int hns3_get_vector_ring_chain(struct hns3_enet_tqp_vector *tqp_vector,
 	while (rx_ring) {
 		chain = devm_kzalloc(&pdev->dev, sizeof(*chain), GFP_KERNEL);
 		if (!chain)
-			return -ENOMEM;
+			goto err_free_chain;
 
 		cur_chain->next = chain;
 		chain->tqp_index = rx_ring->tqp->tqp_index;
@@ -2772,6 +2772,16 @@ static int hns3_get_vector_ring_chain(struct hns3_enet_tqp_vector *tqp_vector,
 	}
 
 	return 0;
+
+err_free_chain:
+	cur_chain = head->next;
+	while (cur_chain) {
+		chain = cur_chain->next;
+		devm_kfree(&pdev->dev, chain);
+		cur_chain = chain;
+	}
+
+	return -ENOMEM;
 }
 
 static void hns3_free_vector_ring_chain(struct hns3_enet_tqp_vector *tqp_vector,
@@ -3037,8 +3047,10 @@ static int hns3_queue_to_ring(struct hnae3_queue *tqp,
 		return ret;
 
 	ret = hns3_ring_get_cfg(tqp, priv, HNAE3_RING_TYPE_RX);
-	if (ret)
+	if (ret) {
+		devm_kfree(priv->dev, priv->ring_data[tqp->tqp_index].ring);
 		return ret;
+	}
 
 	return 0;
 }
@@ -3065,6 +3077,12 @@ static int hns3_get_ring_config(struct hns3_nic_priv *priv)
 
 	return 0;
 err:
+	while (i--) {
+		devm_kfree(priv->dev, priv->ring_data[i].ring);
+		devm_kfree(priv->dev,
+			   priv->ring_data[i + h->kinfo.num_tqps].ring);
+	}
+
 	devm_kfree(&pdev->dev, priv->ring_data);
 	return ret;
 }
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 11/11] net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

Since hclgevf_reset_wait() is used to wait for the hardware to complete
the reset, it is not necessary to hold the rtnl_lock during
hclgevf_reset_wait(). So this patch releases the lock for the duration
of hclgevf_reset_wait().

Fixes: 6988eb2a9b77 ("net: hns3: Add support to reset the enet/ring mgmt layer")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index b224f6a..085edb9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1170,6 +1170,8 @@ static int hclgevf_reset(struct hclgevf_dev *hdev)
 	/* bring down the nic to stop any ongoing TX/RX */
 	hclgevf_notify_client(hdev, HNAE3_DOWN_CLIENT);
 
+	rtnl_unlock();
+
 	/* check if VF could successfully fetch the hardware reset completion
 	 * status from the hardware
 	 */
@@ -1181,12 +1183,15 @@ static int hclgevf_reset(struct hclgevf_dev *hdev)
 			ret);
 
 		dev_warn(&hdev->pdev->dev, "VF reset failed, disabling VF!\n");
+		rtnl_lock();
 		hclgevf_notify_client(hdev, HNAE3_UNINIT_CLIENT);
 
 		rtnl_unlock();
 		return ret;
 	}
 
+	rtnl_lock();
+
 	/* now, re-initialize the nic client and ae device*/
 	ret = hclgevf_reset_stack(hdev);
 	if (ret)
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 06/11] net: hns3: bugfix for is_valid_csq_clean_head()
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

The HEAD pointer of the hardware command queue maybe equal to the command
queue's next_to_use in the driver, so that does not belong to the invalid
HEAD pointer, since the hardware may not process the command in time,
causing the HEAD pointer to be too late to update. The variables' name
in this function is unreadable, so give them a more readable one.

Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 68026a5..690f62e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -24,15 +24,15 @@ static int hclge_ring_space(struct hclge_cmq_ring *ring)
 	return ring->desc_num - used - 1;
 }
 
-static int is_valid_csq_clean_head(struct hclge_cmq_ring *ring, int h)
+static int is_valid_csq_clean_head(struct hclge_cmq_ring *ring, int head)
 {
-	int u = ring->next_to_use;
-	int c = ring->next_to_clean;
+	int ntu = ring->next_to_use;
+	int ntc = ring->next_to_clean;
 
-	if (unlikely(h >= ring->desc_num))
-		return 0;
+	if (ntu > ntc)
+		return head >= ntc && head <= ntu;
 
-	return u > c ? (h > c && h <= u) : (h > c || h <= u);
+	return head >= ntc || head <= ntu;
 }
 
 static int hclge_alloc_cmd_desc(struct hclge_cmq_ring *ring)
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 10/11] net: hns3: bugfix for rtnl_lock's range in the hclge_reset()
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

Since hclge_reset_wait() is used to wait for the hardware to complete
the reset, it is not necessary to hold the rtnl_lock during
hclge_reset_wait(). So this patch releases the lock for the duration
of hclge_reset_wait().

Fixes: 6d4fab39533f ("net: hns3: Reset net device with rtnl_lock")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index f3212c9..ffdd960 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2470,14 +2470,17 @@ static void hclge_reset(struct hclge_dev *hdev)
 	handle = &hdev->vport[0].nic;
 	rtnl_lock();
 	hclge_notify_client(hdev, HNAE3_DOWN_CLIENT);
+	rtnl_unlock();
 
 	if (!hclge_reset_wait(hdev)) {
+		rtnl_lock();
 		hclge_notify_client(hdev, HNAE3_UNINIT_CLIENT);
 		hclge_reset_ae_dev(hdev->ae_dev);
 		hclge_notify_client(hdev, HNAE3_INIT_CLIENT);
 
 		hclge_clear_reset_cause(hdev);
 	} else {
+		rtnl_lock();
 		/* schedule again to check pending resets later */
 		set_bit(hdev->reset_type, &hdev->reset_pending);
 		hclge_reset_task_schedule(hdev);
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 09/11] net: hns3: bugfix for handling mailbox while the command queue reinitialized
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

In a multi-core machine, the mailbox service and reset service
will be executed at the same time. The reset service will re-initialize
the command queue, before that, the mailbox handler can only get some
invalid messages.

The HCLGE_STATE_CMD_DISABLE flag means that the command queue is not
available and needs to be reinitialized. Therefore, when the mailbox
handler recognizes this flag, it should not process the command.

Fixes: dde1a86e93ca ("net: hns3: Add mailbox support to PF driver")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
V3: Fixes comments from Sergei Shtylyov
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index 04462a3..f890022 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -400,6 +400,12 @@ void hclge_mbx_handler(struct hclge_dev *hdev)
 
 	/* handle all the mailbox requests in the queue */
 	while (!hclge_cmd_crq_empty(&hdev->hw)) {
+		if (test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state)) {
+			dev_warn(&hdev->pdev->dev,
+				 "command queue needs re-initializing\n");
+			return;
+		}
+
 		desc = &crq->desc[crq->next_to_use];
 		req = (struct hclge_mbx_vf_to_pf_cmd *)desc->data;
 
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 01/11] net: hns3: add error handler for hns3_nic_init_vector_data()
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

When hns3_nic_init_vector_data() fails to map ring to vector,
it should cancel the netif_napi_add() that has been successfully
done and then exits.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
V5: Fixes comments from Joe Perches
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 32f3aca8..0b4323b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2821,7 +2821,7 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
 	struct hnae3_handle *h = priv->ae_handle;
 	struct hns3_enet_tqp_vector *tqp_vector;
 	int ret = 0;
-	u16 i;
+	int i;
 
 	hns3_nic_set_cpumask(priv);
 
@@ -2868,13 +2868,19 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
 		hns3_free_vector_ring_chain(tqp_vector, &vector_ring_chain);
 
 		if (ret)
-			return ret;
+			goto map_ring_fail;
 
 		netif_napi_add(priv->netdev, &tqp_vector->napi,
 			       hns3_nic_common_poll, NAPI_POLL_WEIGHT);
 	}
 
 	return 0;
+
+map_ring_fail:
+	while (i--)
+		netif_napi_del(&priv->tqp_vector[i].napi);
+
+	return ret;
 }
 
 static int hns3_nic_alloc_vector_data(struct hns3_nic_priv *priv)
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 00/11] Bugfix for the HNS3 driver
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng

This patch series include bugfix for the HNS3 ethernet
controller driver.

Change log:
V4->V5:
	Fixes comments from Joe Perches & Sergei Shtylyov
V3->V4:
	Fixes comments from Sergei Shtylyov
V2->V3:
	Fixes comments from Sergei Shtylyov
V1->V2:
	Fixes the compilation break reported by kbuild test robot
	http://patchwork.ozlabs.org/patch/989818/

Huazhong Tan (11):
  net: hns3: add error handler for hns3_nic_init_vector_data()
  net: hns3: bugfix for buffer not free problem during resetting
  net: hns3: bugfix for reporting unknown vector0 interrupt repeatly
    problem
  net: hns3: bugfix for the initialization of command queue's spin lock
  net: hns3: remove unnecessary queue reset in the
    hns3_uninit_all_ring()
  net: hns3: bugfix for is_valid_csq_clean_head()
  net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
  net: hns3: fix incorrect return value/type of some functions
  net: hns3: bugfix for handling mailbox while the command queue
    reinitialized
  net: hns3: bugfix for rtnl_lock's range in the hclge_reset()
  net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()

 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |   6 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 117 +++++++++++++++------
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h    |   2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c |  26 +++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    |  42 ++++----
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |   2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c |   6 ++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c    |   4 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  |  19 ++--
 9 files changed, 147 insertions(+), 77 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH net] net: ethernet: cadence: fix socket buffer corruption problem
From: Claudiu.Beznea @ 2018-10-30 13:23 UTC (permalink / raw)
  To: Tristram.Ha; +Cc: UNGLinuxDriver, netdev, davem, Nicolas.Ferre
In-Reply-To: <93AF473E2DA327428DE3D46B72B1E9FD411D076B@CHN-SV-EXMX02.mchp-main.com>



On 29.10.2018 23:40, Tristram Ha - C24268 wrote:
>> Could you, please, tell me if the above variable was false in your case?
>>
>> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb);
>>
>> If yes, then, the proper fix would be as follows:
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index 8f5bf9166c11..492a8e1a34cd 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1690,7 +1690,7 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
>> struct net_device *ndev)
>>                 padlen += ETH_FCS_LEN;
>>         }
>>
>> -       if (!cloned && headroom + tailroom >= padlen) {
>> +       if (!cloned && headroom + tailroom >= ETH_FCS_LEN) {
>>                 (*skb)->data = memmove((*skb)->head, (*skb)->data, (*skb)->len);
>>                 skb_set_tail_pointer(*skb, (*skb)->len);
>>         } else {
>>
>> Could you please check if it works in your case (and without your patch)?
>>
> 
> Actually doing that reveals another bug:
> 
>         if (padlen) {
>                 if (padlen >= ETH_FCS_LEN)
>                         skb_put_zero(*skb, padlen - ETH_FCS_LEN);
>                 else
>                         skb_trim(*skb, ETH_FCS_LEN - padlen);
>         }
> 
> My fix calls skb_put_zero with zero length.  Your change calls skb_trim which
> actually sets the socket buffer length to 1!
> 
> When this problem happens headroom is 2, tailroom is 1, skb->len is 61, and
> padlen is 3.
> 

Ok, I see now. Looking again, your fix is good. But, as you said, there is
the skb_trim() in this function that is wrong from the beginning (my bad).
I propose to remove it since, with your fix is not even reached anymore.

Could you check on your side that applying this on top of your patch, your
scenario is still working?

diff --git a/drivers/net/ethernet/cadence/macb_main.c
b/drivers/net/ethernet/cadence/macb_main.c
index 1d86b4d5645a..e1347d6d1b50 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1702,12 +1702,8 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
struct net_device *ndev)
                *skb = nskb;
        }

-       if (padlen) {
-               if (padlen >= ETH_FCS_LEN)
-                       skb_put_zero(*skb, padlen - ETH_FCS_LEN);
-               else
-                       skb_trim(*skb, ETH_FCS_LEN - padlen);
-       }
+       if (padlen > ETH_FCS_LEN)
+               skb_put_zero(*skb, padlen - ETH_FCS_LEN);


> DSA driver is being used.  That is why the length is already padded to 60 bytes
> and 1-byte tail tag is added.

Ok, I see, I didn't test with such configurations.

> 
> BTW, I am not sure while this macb_pad_and_fcs function was added.  Is it to
> workaround some hardware bugs?  The code is executed only when
> NETIF_IF_HW_CSUM is used.  But if hardware tx checksumming is enabled why
> not also use the hardware to calculate CRC?

It was reported in [1] that UDP checksum is offloaded to hardware no matter
the application previously computed it.

The code should be executed only for packets that has checksum computed by
applications ((*skb)->ip_summed != CHECKSUM_PARTIAL). The idea was to not
recompute checksum for packets with checksum already computed. To do so,
while hardware checksum is enabled (NETIF_F_HW_CSUM), TX_NOCRC bit should
be set on buffer descriptor. But to do so, packets must have a minimum size
of 64 and FCS to be computed.

The NETIF_F_HW_CSUM check was placed there because the issue described in
[1] is reproducible because hardware checksum is enabled and overrides the
checksum provided by applications.

[1] https://www.spinics.net/lists/netdev/msg505065.html
> 
> NETIF_F_SG is not enabled in the MAC I used, so enabling NETIF_IF_HW_CSUM
> is rather pointless.  With the padding code the transmit throughput cannot get
> higher than 100Mbps in a gigabit connection.
> 
> I would recommend to add this option to disable manual padding in one of those
> macb_config structures.

In this way the user would have to know from the beginning what kind of
packets are used.

Thank you,
Claudiu Beznea

> 

^ permalink raw reply related

* Re: [BUG] MVPP2 driver exploding in presence of a tap interface
From: Thomas Petazzoni @ 2018-10-30 13:00 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Marc Zyngier, Antoine Tenart, Maxime Chevallier, linux-arm-kernel,
	netdev, Grzegorz Jaszczyk, Tomasz Nowicki
In-Reply-To: <CAPv3WKfn6QFPnNjpY2dU5-OHftObzdcrHopX8Y9w5h37Zd4BNw@mail.gmail.com>

Hello Marcin,

Thanks for the feedback.

On Tue, 30 Oct 2018 13:37:37 +0100, Marcin Wojtas wrote:

> You use _really_ archaic firmware, the bug you see is 99% caused by a
> bug already fixed long time ago (cleanup all PP2 BM pools correctly
> during exit boot services). Please grab the latest release:
> https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/wiki/files/flash-image-18.09.4.bin
> and let know if you observe any further issues with vanilla kernel.

Even if this was a bug in the UEFI firmware, shouldn't the kernel be
independent from that, by doing a proper reset/reinit of the HW ?

I.e, isn't the firmware fix papering over a bug that should be fixed in
Linux mvpp2 driver anyway ?

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [BUG] MVPP2 driver exploding in presence of a tap interface
From: Marc Zyngier @ 2018-10-30 12:59 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Antoine Tenart, Thomas Petazzoni, Maxime Chevallier,
	linux-arm-kernel, netdev, Grzegorz Jaszczyk, Tomasz Nowicki
In-Reply-To: <CAPv3WKfn6QFPnNjpY2dU5-OHftObzdcrHopX8Y9w5h37Zd4BNw@mail.gmail.com>

Marcin,

On 30/10/18 12:37, Marcin Wojtas wrote:
> [Resend in UTF-8]
> 
> Hi Marc,
> 
> You use _really_ archaic firmware, the bug you see is 99% caused by a

Please let me fix this for you:

s/_really_ archaic/released/

> bug already fixed long time ago (cleanup all PP2 BM pools correctly
> during exit boot services). 

How long ago? Why didn't you say so when I reported the bug to you and
Antoine back in January? Also, why isn't that "clean-up" taken care of
by the Linux driver? Exiting boot services itself doesn't seem to cause
the issue, and it is setting the interface up that causes it.

> Please grab the latest release:
> https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/wiki/files/flash-image-18.09.4.bin
> and let know if you observe any further issues with vanilla kernel.

What does this image contain?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply

* Re: [BUG] MVPP2 driver exploding in presence of a tap interface
From: Marcin Wojtas @ 2018-10-30 12:37 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Antoine Tenart, Thomas Petazzoni, Maxime Chevallier,
	linux-arm-kernel, netdev, Grzegorz Jaszczyk, Tomasz Nowicki
In-Reply-To: <7bd72963-1e7c-5175-ba88-fc020d786f0c@arm.com>

[Resend in UTF-8]

Hi Marc,

You use _really_ archaic firmware, the bug you see is 99% caused by a
bug already fixed long time ago (cleanup all PP2 BM pools correctly
during exit boot services). Please grab the latest release:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/wiki/files/flash-image-18.09.4.bin
and let know if you observe any further issues with vanilla kernel.

Best regards,
Marcin

wt., 30 paź 2018 o 13:16 Marc Zyngier <marc.zyngier@arm.com> napisał(a):
>
> Antoine,
>
> On 30/10/18 10:50, Antoine Tenart wrote:
> > Marc,
> >
> > On Mon, Oct 29, 2018 at 03:05:53PM +0000, Marc Zyngier wrote:
> >>
> >> This is a follow-up on the conversation Thomas and I had last week at
> >> ELC, with me ranting at the sorry state of the MVPP2 driver.
> >
> >> Triggering this is dead simple:
> >> - Add a macvtap to one of the MVPP2 interfaces
> >> - Bring it online
> >> - Watch the kernel exploding and memory being corrupted
> >>
> >> You don't even need anything listening on the tap interface, just its
> >> simple existence triggers it. I use a similar setup on a large variety
> >> of machines, and this box is the only one that catches fire. Removing
> >> the macvtap interface makes it (more) reliable.
> >>
> >> Given that I cannot reproduce this issue on any other ARM (32 or 64bit)
> >> platform, including other Marvell stuff, I can only conclude that the
> >> MVPP2 driver is responsible for this.
> >>
> >> Example crash and .config below (4.19 vanilla, as linux/master dies in
> >> new and wonderful ways on this box). I'm looking forward to testing any
> >> idea you may have.
> >
> > I used a 4.19 vanilla kernel, with both your configuration and mine,
> > on 2 different Macchiatobins, but was unable to trigger the issue:
> >
> >   # ip link set eth0 up
> >   # ip link add link eth0 name macvtap0 type macvtap
> >   # ip link set macvtap0 up>
> > I can even configure the eth0/macvtap0 interfaces, and use them
> > generating or receiving tcp/udp/icmp traffic.
> >
> > (I also made other tests using macvtap and tap interfaces).
> >
> > How much memory do you have on the board? What version of ATF are you
> > using? Version of U-Boot?
>
> 4GB of RAM. As for the version numbers, see below. I don't use u-boot,
> but UEFI (EDK-II v2.60). The problem can be reproduced on two different
> machines, with the same configuration (and firmwares dating from a
> similar era):
>
> Starting CP-0 IOROM 1.07
> Booting from SD 0 (0x29)
> Found valid image at boot postion 0x002
> lNOTICE:  Starting binary extension
> NOTICE:  Gathering DRAM information
> mv_ddr: mv_ddr-armada-17.06.1-g47f4c8b (Jun  2 2017 - 17:07:23)
> mv_ddr: completed successfully
> NOTICE:  Booting Trusted Firmware
> NOTICE:  BL1: v1.3(release):armada-17.06.2:297d68f
> NOTICE:  BL1: Built : 17:07:27, Jun  2 2017
> NOTICE:  BL1: Booting BL2
> lNOTICE:  BL2: v1.3(release):armada-17.06.2:297d68f
> NOTICE:  BL2: Built : 17:07:28, Jun  2 2017
> NOTICE:  BL1: Booting BL31
> lNOTICE:  BL31: v1.3(release):armada-17.06.2:297d68f
> NOTICE:  BL31: Built : 17:07:30, Jun  2 2017
> lUEFI firmware (version MARVELL_EFI built at 17:12:21 on Jun  2 2017)
>
> Armada 8040 MachiatoBin Platform Init
>
> Comphy0-0: PCIE0         5 Gbps
> Comphy0-1: PCIE0         5 Gbps
> Comphy0-2: PCIE0         5 Gbps
> Comphy0-3: PCIE0         5 Gbps
> Comphy0-4: SFI           10.31 Gbps
> Comphy0-5: SATA1         5 Gbps
>
> Comphy1-0: SGMII1        1.25 Gbps
> Comphy1-1: SATA2         5 Gbps
> Comphy1-2: USB3_HOST0    5 Gbps
> Comphy1-3: SATA3         5 Gbps
> Comphy1-4: SFI           10.31 Gbps
> Comphy1-5: SGMII2        3.125 Gbps
>
> UTMI PHY 0 initialized to USB Host0
> UTMI PHY 1 initialized to USB Host1
> UTMI PHY 0 initialized to USB Host0
> RTC: Initialize controller 1
> Skip I2c chip 0
> Succesfully installed protocol interfaces
> ramdisk:blckio install. Status=Success
>
> With the latest mainline, and after fixing that other irq affinity
> bug (see patch posted yesterday), I only need to bring the interface
> up, without doing anything else:
>
> # ip link set eth0 up
> [  155.507877] mvpp2 f2000000.ethernet eth0: PHY [f212a600.mdio-mii:00] driver [mv88x3310]
> [  155.526732] mvpp2 f2000000.ethernet eth0: configuring for phy/10gbase-kr link mode
> [  157.592581] mvpp2 f2000000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> [  158.339396] BUG: Bad page state in process swapper/0  pfn:e6804
> [  158.345345] page:ffff7e00039a0100 count:0 mapcount:0 mapping:ffff8000e7bf3b00 index:0xffff8000e6804c00
> [  158.354696] flags: 0xfffc00000000200(slab)
> [  158.358815] raw: 0fffc00000000200 ffff7e00039cff80 0000000400000004 ffff8000e7bf3b00
> [  158.366594] raw: ffff8000e6804c00 000000008010000f 00000000ffffffff 0000000000000000
> [  158.374371] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [  158.380840] bad because of flags: 0x200(slab)
> [  158.385216] Modules linked in:
> [  158.388288] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-09420-g34ae82ac683c #278
> [  158.396148] Hardware name: Marvell 8040 MACCHIATOBin (DT)
> [  158.401567] Call trace:
> [  158.404031]  dump_backtrace+0x0/0x148
> [  158.407708]  show_stack+0x14/0x20
> [  158.411036]  dump_stack+0x90/0xb4
> [  158.414365]  bad_page+0x104/0x130
> [  158.417692]  free_pages_check_bad+0x9c/0xa8
> [  158.421892]  __free_pages_ok+0x1b0/0x450
> [  158.425829]  page_frag_free+0x8c/0xa8
> [  158.429505]  skb_free_head+0x18/0x30
> [  158.433093]  skb_release_data+0x130/0x160
> [  158.437117]  skb_release_all+0x24/0x30
> [  158.440881]  consume_skb+0x2c/0x58
> [  158.444296]  arp_process.constprop.4+0x200/0x6f0
> [  158.448931]  arp_rcv+0xf4/0x128
> [  158.452084]  __netif_receive_skb_one_core+0x54/0x78
> [  158.456981]  __netif_receive_skb+0x14/0x60
> [  158.461094]  netif_receive_skb_internal+0x40/0x138
> [  158.465903]  napi_gro_receive+0x64/0xc8
> [  158.469754]  mvpp2_poll+0x3f4/0x810
> [  158.473255]  net_rx_action+0x104/0x2c0
> [  158.477018]  __do_softirq+0x11c/0x234
> [  158.480695]  irq_exit+0xb8/0xc8
> [  158.483848]  __handle_domain_irq+0x64/0xb8
> [  158.487959]  gic_handle_irq+0x50/0xa0
> [  158.491634]  el1_irq+0xb0/0x128
> [  158.494786]  arch_cpu_idle+0x10/0x18
> [  158.498375]  do_idle+0x208/0x280
> [  158.501615]  cpu_startup_entry+0x20/0x28
> [  158.505553]  rest_init+0xd4/0xe0
> [  158.508793]  arch_call_rest_init+0xc/0x14
> [  158.512818]  start_kernel+0x3d8/0x400
> [  158.516497] Disabling lock debugging due to kernel taint
> [  159.461058] BUG: Bad page state in process swapper/0  pfn:e681d
> [  159.467013] page:ffff7e00039a0740 count:0 mapcount:0 mapping:ffff8000ef43fb00 index:0x0
> [  159.475051] flags: 0xfffc00000000200(slab)
> [  159.479170] raw: 0fffc00000000200 dead000000000100 dead000000000200 ffff8000ef43fb00
> [  159.486947] raw: 0000000000000000 00000000001e001e 00000000ffffffff 0000000000000000
> [  159.494721] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [  159.501189] bad because of flags: 0x200(slab)
> [  159.505566] Modules linked in:
> [  159.508636] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G    B             4.19.0-09420-g34ae82ac683c #278
> [  159.517892] Hardware name: Marvell 8040 MACCHIATOBin (DT)
> [  159.523311] Call trace:
> [  159.525775]  dump_backtrace+0x0/0x148
> [  159.529451]  show_stack+0x14/0x20
> [  159.532779]  dump_stack+0x90/0xb4
> [  159.536106]  bad_page+0x104/0x130
> [  159.539433]  free_pages_check_bad+0x9c/0xa8
> [  159.543633]  __free_pages_ok+0x1b0/0x450
> [  159.547570]  page_frag_free+0x8c/0xa8
> [  159.551247]  skb_free_head+0x18/0x30
> [  159.554836]  skb_release_data+0x130/0x160
> [  159.558860]  skb_release_all+0x24/0x30
> [  159.562623]  kfree_skb+0x2c/0x58
> [  159.565864]  __udp4_lib_rcv+0x850/0x948
> [  159.569713]  udp_rcv+0x1c/0x28
> [  159.572779]  ip_local_deliver_finish+0x100/0x248
> [  159.577414]  ip_local_deliver+0x60/0x110
> [  159.581350]  ip_rcv_finish+0x38/0x50
> [  159.584938]  ip_rcv+0x50/0xd8
> [  159.587918]  __netif_receive_skb_one_core+0x54/0x78
> [  159.592815]  __netif_receive_skb+0x14/0x60
> [  159.596928]  netif_receive_skb_internal+0x40/0x138
> [  159.601738]  napi_gro_receive+0x64/0xc8
> [  159.605589]  mvpp2_poll+0x3f4/0x810
> [  159.609090]  net_rx_action+0x104/0x2c0
> [  159.612853]  __do_softirq+0x11c/0x234
> [  159.616530]  irq_exit+0xb8/0xc8
> [  159.619683]  __handle_domain_irq+0x64/0xb8
> [  159.623794]  gic_handle_irq+0x50/0xa0
> [  159.627470]  el1_irq+0xb0/0x128
> [  159.630622]  arch_cpu_idle+0x10/0x18
> [  159.634211]  do_idle+0x208/0x280
> [  159.637451]  cpu_startup_entry+0x24/0x28
> [  159.641388]  rest_init+0xd4/0xe0
> [  159.644630]  arch_call_rest_init+0xc/0x14
> [  159.648655]  start_kernel+0x3d8/0x400
>
> Bizarrely, eth1 and eth2 do not crash this way. I have no way to test
> eth3 (no transceiver).
>
> Thanks,
>
>         M.
> --
> Jazz is not dead. It just smells funny...

^ permalink raw reply

* RE: [PATCH net-next 5/6] net/ncsi: Reset channel state in ncsi_start_dev()
From: Justin.Lee1 @ 2018-10-30 21:26 UTC (permalink / raw)
  To: sam, netdev; +Cc: davem, linux-kernel, openbmc
In-Reply-To: <20181018035917.19413-6-sam@mendozajonas.com>

> +int ncsi_reset_dev(struct ncsi_dev *nd)
> +{
> +	struct ncsi_dev_priv *ndp = TO_NCSI_DEV_PRIV(nd);
> +	struct ncsi_channel *nc, *active;
> +	struct ncsi_package *np;
> +	unsigned long flags;
> +	bool enabled;
> +	int state;
> +
> +	active = NULL;
> +	NCSI_FOR_EACH_PACKAGE(ndp, np) {
> +		NCSI_FOR_EACH_CHANNEL(np, nc) {
> +			spin_lock_irqsave(&nc->lock, flags);
> +			enabled = nc->monitor.enabled;
> +			state = nc->state;
> +			spin_unlock_irqrestore(&nc->lock, flags);
> +
> +			if (enabled)
> +				ncsi_stop_channel_monitor(nc);
> +			if (state == NCSI_CHANNEL_ACTIVE) {
> +				active = nc;
> +				break;

Is the original intention to process the channel one by one?
If it is the case, there are two loops and we might need to use
"goto found" instead.

> +			}
> +		}
> +	}
> +

found: ?

> +	if (!active) {
> +		/* Done */
> +		spin_lock_irqsave(&ndp->lock, flags);
> +		ndp->flags &= ~NCSI_DEV_RESET;
> +		spin_unlock_irqrestore(&ndp->lock, flags);
> +		return ncsi_choose_active_channel(ndp);
> +	}
> +
> +	spin_lock_irqsave(&ndp->lock, flags);
> +	ndp->flags |= NCSI_DEV_RESET;
> +	ndp->active_channel = active;
> +	ndp->active_package = active->package;
> +	spin_unlock_irqrestore(&ndp->lock, flags);
> +
> +	nd->state = ncsi_dev_state_suspend;
> +	schedule_work(&ndp->work);
> +	return 0;
> +}


Also similar issue in ncsi_choose_active_channel() function below.

> @@ -916,32 +1045,49 @@ static int ncsi_choose_active_channel(struct ncsi_dev_priv *ndp)
>  
>  			ncm = &nc->modes[NCSI_MODE_LINK];
>  			if (ncm->data[2] & 0x1) {
> -				spin_unlock_irqrestore(&nc->lock, flags);
>  				found = nc;
> -				goto out;
> +				with_link = true;
>  			}
>  
> -			spin_unlock_irqrestore(&nc->lock, flags);
> +			/* If multi_channel is enabled configure all valid
> +			 * channels whether or not they currently have link
> +			 * so they will have AENs enabled.
> +			 */
> +			if (with_link || np->multi_channel) {

I notice that there is a case that we will misconfigure the interface.
For example below, multi-channel is not enable for package 1.
But we enable the channel for ncsi2 below (package 1 channel 0) as that interface is the first
channel for that package with link.

cat /sys/kernel/debug/ncsi_protocol/ncsi_device_
IFIDX IFNAME NAME   PID CID RX TX MP MC WP WC PC CS PS LS RU CR NQ HA
=====================================================================
  2   eth2   ncsi0  000 000 1  1  1  1  1  1  0  2  1  1  1  1  0  1
  2   eth2   ncsi1  000 001 1  0  1  1  1  1  0  2  1  1  1  1  0  1
  2   eth2   ncsi2  001 000 1  0  1  0  1  1  0  2  1  1  1  1  0  1
  2   eth2   ncsi3  001 001 0  0  1  0  1  1  0  1  0  1  1  1  0  1
=====================================================================
MP: Multi-mode Package     WP: Whitelist Package
MC: Multi-mode Channel     WC: Whitelist Channel
PC: Primary Channel        CS: Channel State IA/A/IV 1/2/3
PS: Poll Status            LS: Link Status
RU: Running                CR: Carrier OK
NQ: Queue Stopped          HA: Hardware Arbitration

I temporally change to the following to avoid that.
			if ((with_link &&
			     !np->multi_channel &&
			     list_empty(&ndp->channel_queue)) || np->multi_channel) {

> +				spin_lock_irqsave(&ndp->lock, flags);
> +				list_add_tail_rcu(&nc->link,
> +						  &ndp->channel_queue);
> +				spin_unlock_irqrestore(&ndp->lock, flags);
> +
> +				netdev_dbg(ndp->ndev.dev,
> +					   "NCSI: Channel %u added to queue (link %s)\n",
> +					   nc->id,
> +					   ncm->data[2] & 0x1 ? "up" : "down");
> +			}
> +
> +			spin_unlock_irqrestore(&nc->lock, cflags);
> +
> +			if (with_link && !np->multi_channel)
> +				break;

Similar issue here. As we are using break, so each package will configure one active TX.

>  		}
> +		if (with_link && !ndp->multi_package)
> +			break;
>  	}
>  
> -	if (!found) {
> +	if (list_empty(&ndp->channel_queue) && found) {
> +		netdev_info(ndp->ndev.dev,
> +			    "NCSI: No channel with link found, configuring channel %u\n",
> +			    found->id);
> +		spin_lock_irqsave(&ndp->lock, flags);
> +		list_add_tail_rcu(&found->link, &ndp->channel_queue);
> +		spin_unlock_irqrestore(&ndp->lock, flags);
> +	} else if (!found) {
>  		netdev_warn(ndp->ndev.dev,
> -			    "NCSI: No channel found with link\n");
> +			    "NCSI: No channel found to configure!\n");
>  		ncsi_report_link(ndp, true);
>  		return -ENODEV;
>  	}


Also, for deselect package handler function, do we want to set to inactive here?
If we just change the state, the cached data still keeps the old value. If the new 
ncsi_reset_dev() function is handling one by one, can we skip this part?

static int ncsi_rsp_handler_dp(struct ncsi_request *nr)
{
	struct ncsi_rsp_pkt *rsp;
	struct ncsi_dev_priv *ndp = nr->ndp;
	struct ncsi_package *np;
	struct ncsi_channel *nc;
	unsigned long flags;

	/* Find the package */
	rsp = (struct ncsi_rsp_pkt *)skb_network_header(nr->rsp);
	ncsi_find_package_and_channel(ndp, rsp->rsp.common.channel,
				      &np, NULL);
	if (!np)
		return -ENODEV;

	/* Change state of all channels attached to the package */
	NCSI_FOR_EACH_CHANNEL(np, nc) {
		spin_lock_irqsave(&nc->lock, flags);
		nc->state = NCSI_CHANNEL_INACTIVE;

		spin_unlock_irqrestore(&nc->lock, flags);
	}

	return 0;
}

^ permalink raw reply

* Re: [BUG] MVPP2 driver exploding in presence of a tap interface
From: Marc Zyngier @ 2018-10-30 12:16 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: Thomas Petazzoni, Maxime Chevallier, Marcin Wojtas,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org
In-Reply-To: <20181030105015.GB3407@kwain>

Antoine,

On 30/10/18 10:50, Antoine Tenart wrote:
> Marc,
> 
> On Mon, Oct 29, 2018 at 03:05:53PM +0000, Marc Zyngier wrote:
>>
>> This is a follow-up on the conversation Thomas and I had last week at 
>> ELC, with me ranting at the sorry state of the MVPP2 driver.
> 
>> Triggering this is dead simple:
>> - Add a macvtap to one of the MVPP2 interfaces
>> - Bring it online
>> - Watch the kernel exploding and memory being corrupted
>>
>> You don't even need anything listening on the tap interface, just its
>> simple existence triggers it. I use a similar setup on a large variety 
>> of machines, and this box is the only one that catches fire. Removing
>> the macvtap interface makes it (more) reliable.
>>
>> Given that I cannot reproduce this issue on any other ARM (32 or 64bit)
>> platform, including other Marvell stuff, I can only conclude that the
>> MVPP2 driver is responsible for this.
>>
>> Example crash and .config below (4.19 vanilla, as linux/master dies in
>> new and wonderful ways on this box). I'm looking forward to testing any
>> idea you may have.
> 
> I used a 4.19 vanilla kernel, with both your configuration and mine,
> on 2 different Macchiatobins, but was unable to trigger the issue:
> 
>   # ip link set eth0 up
>   # ip link add link eth0 name macvtap0 type macvtap
>   # ip link set macvtap0 up> 
> I can even configure the eth0/macvtap0 interfaces, and use them
> generating or receiving tcp/udp/icmp traffic.
> 
> (I also made other tests using macvtap and tap interfaces).
> 
> How much memory do you have on the board? What version of ATF are you
> using? Version of U-Boot?

4GB of RAM. As for the version numbers, see below. I don't use u-boot, 
but UEFI (EDK-II v2.60). The problem can be reproduced on two different
machines, with the same configuration (and firmwares dating from a 
similar era):

Starting CP-0 IOROM 1.07
Booting from SD 0 (0x29)
Found valid image at boot postion 0x002
lNOTICE:  Starting binary extension
NOTICE:  Gathering DRAM information
mv_ddr: mv_ddr-armada-17.06.1-g47f4c8b (Jun  2 2017 - 17:07:23)
mv_ddr: completed successfully
NOTICE:  Booting Trusted Firmware
NOTICE:  BL1: v1.3(release):armada-17.06.2:297d68f
NOTICE:  BL1: Built : 17:07:27, Jun  2 2017
NOTICE:  BL1: Booting BL2
lNOTICE:  BL2: v1.3(release):armada-17.06.2:297d68f
NOTICE:  BL2: Built : 17:07:28, Jun  2 2017
NOTICE:  BL1: Booting BL31
lNOTICE:  BL31: v1.3(release):armada-17.06.2:297d68f
NOTICE:  BL31: Built : 17:07:30, Jun  2 2017
lUEFI firmware (version MARVELL_EFI built at 17:12:21 on Jun  2 2017)

Armada 8040 MachiatoBin Platform Init

Comphy0-0: PCIE0         5 Gbps    
Comphy0-1: PCIE0         5 Gbps    
Comphy0-2: PCIE0         5 Gbps    
Comphy0-3: PCIE0         5 Gbps    
Comphy0-4: SFI           10.31 Gbps
Comphy0-5: SATA1         5 Gbps    

Comphy1-0: SGMII1        1.25 Gbps 
Comphy1-1: SATA2         5 Gbps    
Comphy1-2: USB3_HOST0    5 Gbps    
Comphy1-3: SATA3         5 Gbps    
Comphy1-4: SFI           10.31 Gbps
Comphy1-5: SGMII2        3.125 Gbps

UTMI PHY 0 initialized to USB Host0
UTMI PHY 1 initialized to USB Host1
UTMI PHY 0 initialized to USB Host0
RTC: Initialize controller 1
Skip I2c chip 0
Succesfully installed protocol interfaces
ramdisk:blckio install. Status=Success

With the latest mainline, and after fixing that other irq affinity
bug (see patch posted yesterday), I only need to bring the interface
up, without doing anything else:

# ip link set eth0 up
[  155.507877] mvpp2 f2000000.ethernet eth0: PHY [f212a600.mdio-mii:00] driver [mv88x3310]
[  155.526732] mvpp2 f2000000.ethernet eth0: configuring for phy/10gbase-kr link mode
[  157.592581] mvpp2 f2000000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[  158.339396] BUG: Bad page state in process swapper/0  pfn:e6804
[  158.345345] page:ffff7e00039a0100 count:0 mapcount:0 mapping:ffff8000e7bf3b00 index:0xffff8000e6804c00
[  158.354696] flags: 0xfffc00000000200(slab)
[  158.358815] raw: 0fffc00000000200 ffff7e00039cff80 0000000400000004 ffff8000e7bf3b00
[  158.366594] raw: ffff8000e6804c00 000000008010000f 00000000ffffffff 0000000000000000
[  158.374371] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[  158.380840] bad because of flags: 0x200(slab)
[  158.385216] Modules linked in:
[  158.388288] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-09420-g34ae82ac683c #278
[  158.396148] Hardware name: Marvell 8040 MACCHIATOBin (DT)
[  158.401567] Call trace:
[  158.404031]  dump_backtrace+0x0/0x148
[  158.407708]  show_stack+0x14/0x20
[  158.411036]  dump_stack+0x90/0xb4
[  158.414365]  bad_page+0x104/0x130
[  158.417692]  free_pages_check_bad+0x9c/0xa8
[  158.421892]  __free_pages_ok+0x1b0/0x450
[  158.425829]  page_frag_free+0x8c/0xa8
[  158.429505]  skb_free_head+0x18/0x30
[  158.433093]  skb_release_data+0x130/0x160
[  158.437117]  skb_release_all+0x24/0x30
[  158.440881]  consume_skb+0x2c/0x58
[  158.444296]  arp_process.constprop.4+0x200/0x6f0
[  158.448931]  arp_rcv+0xf4/0x128
[  158.452084]  __netif_receive_skb_one_core+0x54/0x78
[  158.456981]  __netif_receive_skb+0x14/0x60
[  158.461094]  netif_receive_skb_internal+0x40/0x138
[  158.465903]  napi_gro_receive+0x64/0xc8
[  158.469754]  mvpp2_poll+0x3f4/0x810
[  158.473255]  net_rx_action+0x104/0x2c0
[  158.477018]  __do_softirq+0x11c/0x234
[  158.480695]  irq_exit+0xb8/0xc8
[  158.483848]  __handle_domain_irq+0x64/0xb8
[  158.487959]  gic_handle_irq+0x50/0xa0
[  158.491634]  el1_irq+0xb0/0x128
[  158.494786]  arch_cpu_idle+0x10/0x18
[  158.498375]  do_idle+0x208/0x280
[  158.501615]  cpu_startup_entry+0x20/0x28
[  158.505553]  rest_init+0xd4/0xe0
[  158.508793]  arch_call_rest_init+0xc/0x14
[  158.512818]  start_kernel+0x3d8/0x400
[  158.516497] Disabling lock debugging due to kernel taint
[  159.461058] BUG: Bad page state in process swapper/0  pfn:e681d
[  159.467013] page:ffff7e00039a0740 count:0 mapcount:0 mapping:ffff8000ef43fb00 index:0x0
[  159.475051] flags: 0xfffc00000000200(slab)
[  159.479170] raw: 0fffc00000000200 dead000000000100 dead000000000200 ffff8000ef43fb00
[  159.486947] raw: 0000000000000000 00000000001e001e 00000000ffffffff 0000000000000000
[  159.494721] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[  159.501189] bad because of flags: 0x200(slab)
[  159.505566] Modules linked in:
[  159.508636] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G    B             4.19.0-09420-g34ae82ac683c #278
[  159.517892] Hardware name: Marvell 8040 MACCHIATOBin (DT)
[  159.523311] Call trace:
[  159.525775]  dump_backtrace+0x0/0x148
[  159.529451]  show_stack+0x14/0x20
[  159.532779]  dump_stack+0x90/0xb4
[  159.536106]  bad_page+0x104/0x130
[  159.539433]  free_pages_check_bad+0x9c/0xa8
[  159.543633]  __free_pages_ok+0x1b0/0x450
[  159.547570]  page_frag_free+0x8c/0xa8
[  159.551247]  skb_free_head+0x18/0x30
[  159.554836]  skb_release_data+0x130/0x160
[  159.558860]  skb_release_all+0x24/0x30
[  159.562623]  kfree_skb+0x2c/0x58
[  159.565864]  __udp4_lib_rcv+0x850/0x948
[  159.569713]  udp_rcv+0x1c/0x28
[  159.572779]  ip_local_deliver_finish+0x100/0x248
[  159.577414]  ip_local_deliver+0x60/0x110
[  159.581350]  ip_rcv_finish+0x38/0x50
[  159.584938]  ip_rcv+0x50/0xd8
[  159.587918]  __netif_receive_skb_one_core+0x54/0x78
[  159.592815]  __netif_receive_skb+0x14/0x60
[  159.596928]  netif_receive_skb_internal+0x40/0x138
[  159.601738]  napi_gro_receive+0x64/0xc8
[  159.605589]  mvpp2_poll+0x3f4/0x810
[  159.609090]  net_rx_action+0x104/0x2c0
[  159.612853]  __do_softirq+0x11c/0x234
[  159.616530]  irq_exit+0xb8/0xc8
[  159.619683]  __handle_domain_irq+0x64/0xb8
[  159.623794]  gic_handle_irq+0x50/0xa0
[  159.627470]  el1_irq+0xb0/0x128
[  159.630622]  arch_cpu_idle+0x10/0x18
[  159.634211]  do_idle+0x208/0x280
[  159.637451]  cpu_startup_entry+0x24/0x28
[  159.641388]  rest_init+0xd4/0xe0
[  159.644630]  arch_call_rest_init+0xc/0x14
[  159.648655]  start_kernel+0x3d8/0x400

Bizarrely, eth1 and eth2 do not crash this way. I have no way to test 
eth3 (no transceiver).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply

* Re: [PATCH 0/2] net: xlinx: mdio: recheck condition after timeout
From: Andrew Lunn @ 2018-10-30 12:08 UTC (permalink / raw)
  To: Kurt Kanzenbach
  Cc: Anirudha Sarangi, John Linn, David S. Miller, Michal Simek,
	Radhey Shyam Pandey, YueHaibing, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20181030093139.10226-1-kurt@linutronix.de>

On Tue, Oct 30, 2018 at 10:31:37AM +0100, Kurt Kanzenbach wrote:
> Hi,
> 
> the Xilinx mdio wait functions may return false positives under certain
> circumstances: If the functions get preempted between reading the corresponding
> mdio register and checking for the timeout, they could falsely indicate a
> timeout.

Hi Kurt

I wonder if it would be possible to add a readx_poll_timeout() which
passes two parameters to op()?

I keep seeing this basic problem in various drivers, and try to point
people towards readx_poll_timeout(), but it is not the best of fit.

Otherwise, could you add a axienet_ior_read_mcr(lp), and use
readx_poll_timeout() as is?

       Andrew

^ permalink raw reply

* Re: [PATCH] xfrm: Fix error return code in xfrm_output_one()
From: Steffen Klassert @ 2018-10-30 11:48 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: Herbert Xu, netdev, kernel-janitors
In-Reply-To: <1540620726-100678-1-git-send-email-weiyongjun1@huawei.com>

On Sat, Oct 27, 2018 at 06:12:06AM +0000, Wei Yongjun wrote:
> xfrm_output_one() does not return a error code when there is
> no dst_entry attached to the skb, it is still possible crash
> with a NULL pointer dereference in xfrm_output_resume(). Fix
> it by return error code -EHOSTUNREACH.
> 
> Fixes: 9e1437937807 ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Applied, thanks a lot!

^ permalink raw reply

* Re: [PATCH net v5] net/ipv6: Add anycast addresses to a global hashtable
From: Jeff Barnhill @ 2018-10-30 11:10 UTC (permalink / raw)
  To: davem; +Cc: netdev, Alexey Kuznetsov, yoshfuji
In-Reply-To: <20181029.203211.604037421868394185.davem@davemloft.net>

I originally started implementing it the way you suggested; however,
it seemed to complicate management of that structure because it isn't
currently using rcu.  Also, assuming that can be worked out, where
would I get the net from?  Would I need to store a copy in ifcaddr6,
or is there some way to access it during ipv6_chk_acast_addr()?  It
seems that if I don't add a copy of net, but instead access it through
aca_rt(?), then freeing the ifcaddr6 memory becomes problematic
(detaching it from idev, while read_rcu may still be accessing it).
On Mon, Oct 29, 2018 at 11:32 PM David Miller <davem@davemloft.net> wrote:
>
> From: Jeff Barnhill <0xeffeff@gmail.com>
> Date: Sun, 28 Oct 2018 01:51:59 +0000
>
> > +struct ipv6_ac_addrlist {
> > +     struct in6_addr         acal_addr;
> > +     possible_net_t          acal_pnet;
> > +     refcount_t              acal_users;
> > +     struct hlist_node       acal_lst; /* inet6_acaddr_lst */
> > +     struct rcu_head         rcu;
> > +};
>
> Please just add the hlist to ifcaddr6 instead of duplicating so much
> information and reference counters here.
>
> This seems to waste a lot of memory unnecessary and add lots of
> unnecessary object allocate/setup/destroy logic.

^ permalink raw reply

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Andre Tomt @ 2018-10-30 11:04 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis
In-Reply-To: <869fbb53-a0a5-95f9-2c77-c3ae3f6d181f@tomt.net>

On 30.10.2018 11:58, Andre Tomt wrote:
> On 27.10.2018 23:41, Andre Tomt wrote:
>> On 26.10.2018 13:45, Andre Tomt wrote:
>>> On 25.10.2018 19:38, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 10/24/2018 12:41 PM, Andre Tomt wrote:
>>>>>
>>>>> It eventually showed up again with mlx4, on 4.18.16 + fix and also 
>>>>> on 4.19. I still do not have a useful packet capture.
>>>>>
>>>>> It is running a torrent client serving up various linux distributions.
>>>>>
>>>>
>>>> Have you also applied this fix ?
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 
>>>>
>>>>
>>>
>>> No. I've applied it now to 4.19 and will report back if anything 
>>> shows up.
>>
>> Just hit it on the simpler server; no VRF, no tunnels, no 
>> nat/conntrack. Only a basic stateless nftables ruleset and a vlan 
>> netdev (unlikely to be the one triggering this I guess; it has only v4 
>> traffic).
> 
> I'm currently testing 4.19 with the recomended commit added, plus these 
> to sort out some GRO issues (on a hunch, unsure if related):
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218 
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7 
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894 
> 
> 
> and I *think* it is behaving better now? it's not conclusive as it could 
> take a while to trip in this environment but some of the test servers 
> have not shown anything bad in almost 24h.

Sorry, s/some of the/none of the

^ permalink raw reply

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Andre Tomt @ 2018-10-30 10:58 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis
In-Reply-To: <91545596-f932-8834-f613-feda3edc9b84@tomt.net>

On 27.10.2018 23:41, Andre Tomt wrote:
> On 26.10.2018 13:45, Andre Tomt wrote:
>> On 25.10.2018 19:38, Eric Dumazet wrote:
>>>
>>>
>>> On 10/24/2018 12:41 PM, Andre Tomt wrote:
>>>>
>>>> It eventually showed up again with mlx4, on 4.18.16 + fix and also 
>>>> on 4.19. I still do not have a useful packet capture.
>>>>
>>>> It is running a torrent client serving up various linux distributions.
>>>>
>>>
>>> Have you also applied this fix ?
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 
>>>
>>>
>>
>> No. I've applied it now to 4.19 and will report back if anything shows 
>> up.
> 
> Just hit it on the simpler server; no VRF, no tunnels, no nat/conntrack. 
> Only a basic stateless nftables ruleset and a vlan netdev (unlikely to 
> be the one triggering this I guess; it has only v4 traffic).

I'm currently testing 4.19 with the recomended commit added, plus these 
to sort out some GRO issues (on a hunch, unsure if related):
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894

and I *think* it is behaving better now? it's not conclusive as it could 
take a while to trip in this environment but some of the test servers 
have not shown anything bad in almost 24h.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox