Netdev List


* [PATCH net-next 1/9] net: hns3: Remove error log when getting pfc stats fails
From: Salil Mehta @ 2018-05-01 18:55 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil.lnk, netdev,
	linux-kernel, linuxarm, Yunsheng Lin
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Yunsheng Lin <linyunsheng@huawei.com>

When the MAC supports DCB but is in GE mode, it does not support
querying PFC stats, and the firmware returns an error for any such
query. Printing that error each time creates a lot of noise in the
kernel log.

This patch removes the error log. The error is already returned to
user space, so the user is still made aware of it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index 885f25c..c69ecab 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -134,11 +134,8 @@ static int hclge_pfc_stats_get(struct hclge_dev *hdev,
 	}
 
 	ret = hclge_cmd_send(&hdev->hw, desc, HCLGE_TM_PFC_PKT_GET_CMD_NUM);
-	if (ret) {
-		dev_err(&hdev->pdev->dev,
-			"Get pfc pause stats fail, ret = %d.\n", ret);
+	if (ret)
 		return ret;
-	}
 
 	for (i = 0; i < HCLGE_TM_PFC_PKT_GET_CMD_NUM; i++) {
 		struct hclge_pfc_stats_cmd *pfc_stats =
-- 
2.7.4


* [PATCH net-next 2/9] net: hns3: fix to correctly fetch l4 protocol outer header
From: Salil Mehta @ 2018-05-01 18:55 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil.lnk, netdev,
	linux-kernel, linuxarm, Huazhong Tan
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Huazhong Tan <tanhuazhong@huawei.com>

This patch fixes the function used to fetch the outer L4 protocol
header. The skb_inner_transport_header() API was mistakenly being
used; the outer header is returned by skb_transport_header().

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 8c55965..c4e2950 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -502,7 +502,7 @@ static int hns3_get_l4_protocol(struct sk_buff *skb, u8 *ol4_proto,
 
 	/* find outer header point */
 	l3.hdr = skb_network_header(skb);
-	l4_hdr = skb_inner_transport_header(skb);
+	l4_hdr = skb_transport_header(skb);
 
 	if (skb->protocol == htons(ETH_P_IPV6)) {
 		exthdr = l3.hdr + sizeof(*l3.v6);
-- 
2.7.4


* [PATCH net-next 3/9] net: hns3: Fixes the out of bounds access in hclge_map_tqp
From: Salil Mehta @ 2018-05-01 18:55 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil.lnk, netdev,
	linux-kernel, linuxarm, Huazhong Tan
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Huazhong Tan <tanhuazhong@huawei.com>

This patch fixes the handling of the case where the number of vports
is greater than the number of available TQPs. The current handling
causes an out-of-bounds access in hclge_map_tqp(); hclge_alloc_vport()
now returns -EINVAL instead.

Fixes: 7df7dad633e2 ("net: hns3: Refactor the mapping of tqp to vport")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 2066dd7..c9e80ca 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1459,8 +1459,11 @@ static int hclge_alloc_vport(struct hclge_dev *hdev)
 	/* We need to alloc a vport for main NIC of PF */
 	num_vport = hdev->num_vmdq_vport + hdev->num_req_vfs + 1;
 
-	if (hdev->num_tqps < num_vport)
-		num_vport = hdev->num_tqps;
+	if (hdev->num_tqps < num_vport) {
+		dev_err(&hdev->pdev->dev, "tqps(%d) is less than vports(%d)",
+			hdev->num_tqps, num_vport);
+		return -EINVAL;
+	}
 
 	/* Alloc the same number of TQPs for every vport */
 	tqp_per_vport = hdev->num_tqps / num_vport;
-- 
2.7.4
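
Editor's note on 3/9: the check can be illustrated outside the driver.
The following is a minimal user-space sketch, not the driver code;
tqps_per_vport() and its parameters are hypothetical names. It assumes
only what the hunk shows: queue pairs are split evenly by integer
division, and fewer TQPs than vports is now treated as an error.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: split num_tqps evenly across num_vport.
 * Mirrors the patched behaviour of rejecting the configuration
 * instead of silently shrinking the vport count.
 */
static int tqps_per_vport(int num_tqps, int num_vport, int *per_vport)
{
	/* With fewer TQPs than vports the division below would give 0. */
	if (num_vport <= 0 || num_tqps < num_vport)
		return -EINVAL;

	*per_vport = num_tqps / num_vport;
	return 0;
}
```

For example, 16 TQPs over 4 vports yields 4 per vport, while 3 TQPs
over 4 vports is refused with -EINVAL.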


* [PATCH net-next 4/9] net: hns3: Fixes the error legs in hclge_init_ae_dev function
From: Salil Mehta @ 2018-05-01 18:56 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil.lnk, netdev,
	linux-kernel, linuxarm, Huazhong Tan
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Huazhong Tan <tanhuazhong@huawei.com>

This patch fixes some of the missed error legs in the initialization
function of the ae device. Without them, resources might be leaked
when initialization fails.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 55 +++++++++++++---------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index c9e80ca..b5e0c58 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5430,7 +5430,7 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 	hdev = devm_kzalloc(&pdev->dev, sizeof(*hdev), GFP_KERNEL);
 	if (!hdev) {
 		ret = -ENOMEM;
-		goto err_hclge_dev;
+		goto out;
 	}
 
 	hdev->pdev = pdev;
@@ -5443,38 +5443,38 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 	ret = hclge_pci_init(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "PCI init failed\n");
-		goto err_pci_init;
+		goto out;
 	}
 
 	/* Firmware command queue initialize */
 	ret = hclge_cmd_queue_init(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "Cmd queue init failed, ret = %d.\n", ret);
-		return ret;
+		goto err_pci_uninit;
 	}
 
 	/* Firmware command initialize */
 	ret = hclge_cmd_init(hdev);
 	if (ret)
-		goto err_cmd_init;
+		goto err_cmd_uninit;
 
 	ret = hclge_get_cap(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "get hw capability error, ret = %d.\n",
 			ret);
-		return ret;
+		goto err_cmd_uninit;
 	}
 
 	ret = hclge_configure(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "Configure dev error, ret = %d.\n", ret);
-		return ret;
+		goto err_cmd_uninit;
 	}
 
 	ret = hclge_init_msi(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "Init MSI/MSI-X error, ret = %d.\n", ret);
-		return ret;
+		goto err_cmd_uninit;
 	}
 
 	ret = hclge_misc_irq_init(hdev);
@@ -5482,69 +5482,69 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 		dev_err(&pdev->dev,
 			"Misc IRQ(vector0) init error, ret = %d.\n",
 			ret);
-		return ret;
+		goto err_msi_uninit;
 	}
 
 	ret = hclge_alloc_tqps(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "Allocate TQPs error, ret = %d.\n", ret);
-		return ret;
+		goto err_msi_irq_uninit;
 	}
 
 	ret = hclge_alloc_vport(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "Allocate vport error, ret = %d.\n", ret);
-		return ret;
+		goto err_msi_irq_uninit;
 	}
 
 	ret = hclge_map_tqp(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "Map tqp error, ret = %d.\n", ret);
-		return ret;
+		goto err_sriov_disable;
 	}
 
 	ret = hclge_mac_mdio_config(hdev);
 	if (ret) {
 		dev_warn(&hdev->pdev->dev,
 			 "mdio config fail ret=%d\n", ret);
-		return ret;
+		goto err_sriov_disable;
 	}
 
 	ret = hclge_mac_init(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "Mac init error, ret = %d\n", ret);
-		return ret;
+		goto err_mdiobus_unreg;
 	}
 
 	ret = hclge_config_tso(hdev, HCLGE_TSO_MSS_MIN, HCLGE_TSO_MSS_MAX);
 	if (ret) {
 		dev_err(&pdev->dev, "Enable tso fail, ret =%d\n", ret);
-		return ret;
+		goto err_mdiobus_unreg;
 	}
 
 	ret = hclge_init_vlan_config(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "VLAN init fail, ret =%d\n", ret);
-		return  ret;
+		goto err_mdiobus_unreg;
 	}
 
 	ret = hclge_tm_schd_init(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "tm schd init fail, ret =%d\n", ret);
-		return ret;
+		goto err_mdiobus_unreg;
 	}
 
 	hclge_rss_init_cfg(hdev);
 	ret = hclge_rss_init_hw(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "Rss init fail, ret =%d\n", ret);
-		return ret;
+		goto err_mdiobus_unreg;
 	}
 
 	ret = init_mgr_tbl(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "manager table init fail, ret =%d\n", ret);
-		return ret;
+		goto err_mdiobus_unreg;
 	}
 
 	hclge_dcb_ops_set(hdev);
@@ -5567,11 +5567,24 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 	pr_info("%s driver initialization finished.\n", HCLGE_DRIVER_NAME);
 	return 0;
 
-err_cmd_init:
+err_mdiobus_unreg:
+	if (hdev->hw.mac.phydev)
+		mdiobus_unregister(hdev->hw.mac.mdio_bus);
+err_sriov_disable:
+	if (IS_ENABLED(CONFIG_PCI_IOV))
+		hclge_disable_sriov(hdev);
+err_msi_irq_uninit:
+	hclge_misc_irq_uninit(hdev);
+err_msi_uninit:
+	pci_free_irq_vectors(pdev);
+err_cmd_uninit:
+	hclge_destroy_cmd_queue(&hdev->hw);
+err_pci_uninit:
+	pci_clear_master(pdev);
 	pci_release_regions(pdev);
-err_pci_init:
+	pci_disable_device(pdev);
 	pci_set_drvdata(pdev, NULL);
-err_hclge_dev:
+out:
 	return ret;
 }
 
-- 
2.7.4
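
Editor's note on 4/9: the patch relies on the usual kernel pattern of
stacked goto labels, where each label undoes one step and falls through
to the labels for earlier steps. The following is a minimal user-space
sketch of that pattern only. The step numbers, the trace buffer and
init_dev() are hypothetical and stand in for the real init and uninit
calls.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

static char trace[32];	/* records the order in which steps are undone */

static void undo(char tag)
{
	size_t n = strlen(trace);

	if (n + 1 < sizeof(trace)) {
		trace[n] = tag;
		trace[n + 1] = '\0';
	}
}

/* Hypothetical init step: fails only when its id matches fail_at. */
static int step(int id, int fail_at)
{
	return id == fail_at ? -EINVAL : 0;
}

static int init_dev(int fail_at)
{
	int ret;

	trace[0] = '\0';

	ret = step(1, fail_at);		/* stands in for PCI init */
	if (ret)
		goto out;
	ret = step(2, fail_at);		/* stands in for cmd queue init */
	if (ret)
		goto err_pci_uninit;
	ret = step(3, fail_at);		/* stands in for MSI init */
	if (ret)
		goto err_cmd_uninit;
	ret = step(4, fail_at);		/* stands in for misc IRQ init */
	if (ret)
		goto err_msi_uninit;
	return 0;

	/* Labels fall through, so teardown runs in reverse init order. */
err_msi_uninit:
	undo('M');
err_cmd_uninit:
	undo('C');
err_pci_uninit:
	undo('P');
out:
	return ret;
}
```

A failure in step 4 undoes M, C and P in that order; a failure in
step 1 undoes nothing, which matches the plain "goto out" used in the
patch for the earliest failures.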


* [PATCH net-next 6/9] net: hns3: Fix to support autoneg only for port attached with phy
From: Salil Mehta @ 2018-05-01 18:56 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil.lnk, netdev,
	linux-kernel, linuxarm, Fuyun Liang
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Fuyun Liang <liangfuyun1@huawei.com>

This patch adds a check to support autoneg (ethtool -A) only when a
PHY is attached to the port.

Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index cc09713..a4e9991 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5169,12 +5169,6 @@ static int hclge_set_pauseparam(struct hnae3_handle *handle, u32 auto_neg,
 	struct phy_device *phydev = hdev->hw.mac.phydev;
 	u32 fc_autoneg;
 
-	/* Only support flow control negotiation for netdev with
-	 * phy attached for now.
-	 */
-	if (!phydev)
-		return -EOPNOTSUPP;
-
 	fc_autoneg = hclge_get_autoneg(handle);
 	if (auto_neg != fc_autoneg) {
 		dev_info(&hdev->pdev->dev,
@@ -5193,6 +5187,12 @@ static int hclge_set_pauseparam(struct hnae3_handle *handle, u32 auto_neg,
 	if (!fc_autoneg)
 		return hclge_cfg_pauseparam(hdev, rx_en, tx_en);
 
+	/* Only support flow control negotiation for netdev with
+	 * phy attached for now.
+	 */
+	if (!phydev)
+		return -EOPNOTSUPP;
+
 	return phy_start_aneg(phydev);
 }
 
-- 
2.7.4


* [PATCH net-next 7/9] net: hns3: fix a dead loop in hclge_cmd_csq_clean
From: Salil Mehta @ 2018-05-01 18:56 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil.lnk, netdev,
	linux-kernel, linuxarm, Huazhong Tan
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Huazhong Tan <tanhuazhong@huawei.com>

If head has an invalid value, a dead loop can be triggered in
hclge_cmd_csq_clean(). This patch adds a sanity check for this case.

Fixes: 68c0a5c70614 ("net: hns3: Add HNS3 IMP(Integrated Mgmt Proc) Cmd Interface Support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c    | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index ff13d18..fab7068 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -31,6 +31,17 @@ static int hclge_ring_space(struct hclge_cmq_ring *ring)
 	return ring->desc_num - used - 1;
 }
 
+static int is_valid_csq_clean_head(struct hclge_cmq_ring *ring, int h)
+{
+	int u = ring->next_to_use;
+	int c = ring->next_to_clean;
+
+	if (unlikely(h >= ring->desc_num))
+		return 0;
+
+	return u > c ? (h > c && h <= u) : (h > c || h <= u);
+}
+
 static int hclge_alloc_cmd_desc(struct hclge_cmq_ring *ring)
 {
 	int size  = ring->desc_num * sizeof(struct hclge_desc);
@@ -141,6 +152,7 @@ static void hclge_cmd_init_regs(struct hclge_hw *hw)
 
 static int hclge_cmd_csq_clean(struct hclge_hw *hw)
 {
+	struct hclge_dev *hdev = (struct hclge_dev *)hw->back;
 	struct hclge_cmq_ring *csq = &hw->cmq.csq;
 	u16 ntc = csq->next_to_clean;
 	struct hclge_desc *desc;
@@ -149,6 +161,13 @@ static int hclge_cmd_csq_clean(struct hclge_hw *hw)
 
 	desc = &csq->desc[ntc];
 	head = hclge_read_dev(hw, HCLGE_NIC_CSQ_HEAD_REG);
+	rmb(); /* Make sure head is ready before touch any data */
+
+	if (!is_valid_csq_clean_head(csq, head)) {
+		dev_warn(&hdev->pdev->dev, "wrong head (%d, %d-%d)\n", head,
+			   csq->next_to_use, csq->next_to_clean);
+		return 0;
+	}
 
 	while (head != ntc) {
 		memset(desc, 0, sizeof(*desc));
-- 
2.7.4
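
Editor's note on 7/9: the head check is a circular-interval test. The
following user-space sketch reproduces the expression from
is_valid_csq_clean_head() so it can be exercised on its own. struct
ring and head_is_valid() are hypothetical stand-ins for the driver
types, and the extra "h < 0" guard is an addition that the patch does
not have.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in holding only the fields the check reads. */
struct ring {
	int desc_num;		/* number of descriptors in the ring */
	int next_to_use;	/* next slot software will fill */
	int next_to_clean;	/* oldest slot not yet reclaimed */
};

static bool head_is_valid(const struct ring *r, int h)
{
	int u = r->next_to_use;
	int c = r->next_to_clean;

	/* A head outside the ring can never be reached by the loop. */
	if (h < 0 || h >= r->desc_num)
		return false;

	/*
	 * Not wrapped (u > c): head must lie in (c, u].
	 * Otherwise: head must lie in (c, end) or in [0, u].
	 */
	return u > c ? (h > c && h <= u) : (h > c || h <= u);
}
```

With 8 descriptors, next_to_clean = 2 and next_to_use = 5, heads 3 to 5
are accepted and everything else is rejected. Two properties follow
directly from the expression: when next_to_use equals next_to_clean,
every in-range head is accepted, and when they differ, a head equal to
next_to_clean is rejected.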


* [PATCH net-next 8/9] net: hns3: Fix for packet loss due wrong filter config in VLAN tbls
From: Salil Mehta @ 2018-05-01 18:56 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil.lnk, netdev,
	linux-kernel, linuxarm, Yunsheng Lin
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Yunsheng Lin <linyunsheng@huawei.com>

There are two levels of VLAN tables in hardware: the port VLAN table,
which is shared by all functions, and the function VLAN table, of
which each function has its own. Currently, the PF sets the port VLAN
table and the VF sets the function VLAN table, which causes packet
loss.

This patch fixes the problem by setting both VLAN tables, and uses
hdev->vlan_table to manage the port VLAN table.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 79 +++++++++++++++++-----
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  8 ++-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c |  7 +-
 3 files changed, 70 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index a4e9991..77d9e4c0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -4543,8 +4543,9 @@ static void hclge_enable_vlan_filter(struct hnae3_handle *handle, bool enable)
 	hclge_set_vlan_filter_ctrl(hdev, HCLGE_FILTER_TYPE_VF, enable);
 }
 
-int hclge_set_vf_vlan_common(struct hclge_dev *hdev, int vfid,
-			     bool is_kill, u16 vlan, u8 qos, __be16 proto)
+static int hclge_set_vf_vlan_common(struct hclge_dev *hdev, int vfid,
+				    bool is_kill, u16 vlan, u8 qos,
+				    __be16 proto)
 {
 #define HCLGE_MAX_VF_BYTES  16
 	struct hclge_vlan_filter_vf_cfg_cmd *req0;
@@ -4602,12 +4603,9 @@ int hclge_set_vf_vlan_common(struct hclge_dev *hdev, int vfid,
 	return -EIO;
 }
 
-static int hclge_set_port_vlan_filter(struct hnae3_handle *handle,
-				      __be16 proto, u16 vlan_id,
-				      bool is_kill)
+static int hclge_set_port_vlan_filter(struct hclge_dev *hdev, __be16 proto,
+				      u16 vlan_id, bool is_kill)
 {
-	struct hclge_vport *vport = hclge_get_vport(handle);
-	struct hclge_dev *hdev = vport->back;
 	struct hclge_vlan_filter_pf_cfg_cmd *req;
 	struct hclge_desc desc;
 	u8 vlan_offset_byte_val;
@@ -4627,22 +4625,66 @@ static int hclge_set_port_vlan_filter(struct hnae3_handle *handle,
 	req->vlan_offset_bitmap[vlan_offset_byte] = vlan_offset_byte_val;
 
 	ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+	if (ret)
+		dev_err(&hdev->pdev->dev,
+			"port vlan command, send fail, ret =%d.\n", ret);
+	return ret;
+}
+
+static int hclge_set_vlan_filter_hw(struct hclge_dev *hdev, __be16 proto,
+				    u16 vport_id, u16 vlan_id, u8 qos,
+				    bool is_kill)
+{
+	u16 vport_idx, vport_num = 0;
+	int ret;
+
+	ret = hclge_set_vf_vlan_common(hdev, vport_id, is_kill, vlan_id,
+				       0, proto);
 	if (ret) {
 		dev_err(&hdev->pdev->dev,
-			"port vlan command, send fail, ret =%d.\n",
-			ret);
+			"Set %d vport vlan filter config fail, ret =%d.\n",
+			vport_id, ret);
 		return ret;
 	}
 
-	ret = hclge_set_vf_vlan_common(hdev, 0, is_kill, vlan_id, 0, proto);
-	if (ret) {
+	/* vlan 0 may be added twice when 8021q module is enabled */
+	if (!is_kill && !vlan_id &&
+	    test_bit(vport_id, hdev->vlan_table[vlan_id]))
+		return 0;
+
+	if (!is_kill && test_and_set_bit(vport_id, hdev->vlan_table[vlan_id])) {
 		dev_err(&hdev->pdev->dev,
-			"Set pf vlan filter config fail, ret =%d.\n",
-			ret);
-		return -EIO;
+			"Add port vlan failed, vport %d is already in vlan %d\n",
+			vport_id, vlan_id);
+		return -EINVAL;
 	}
 
-	return 0;
+	if (is_kill &&
+	    !test_and_clear_bit(vport_id, hdev->vlan_table[vlan_id])) {
+		dev_err(&hdev->pdev->dev,
+			"Delete port vlan failed, vport %d is not in vlan %d\n",
+			vport_id, vlan_id);
+		return -EINVAL;
+	}
+
+	for_each_set_bit(vport_idx, hdev->vlan_table[vlan_id], VLAN_N_VID)
+		vport_num++;
+
+	if ((is_kill && vport_num == 0) || (!is_kill && vport_num == 1))
+		ret = hclge_set_port_vlan_filter(hdev, proto, vlan_id,
+						 is_kill);
+
+	return ret;
+}
+
+int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
+			  u16 vlan_id, bool is_kill)
+{
+	struct hclge_vport *vport = hclge_get_vport(handle);
+	struct hclge_dev *hdev = vport->back;
+
+	return hclge_set_vlan_filter_hw(hdev, proto, vport->vport_id, vlan_id,
+					0, is_kill);
 }
 
 static int hclge_set_vf_vlan_filter(struct hnae3_handle *handle, int vfid,
@@ -4656,7 +4698,7 @@ static int hclge_set_vf_vlan_filter(struct hnae3_handle *handle, int vfid,
 	if (proto != htons(ETH_P_8021Q))
 		return -EPROTONOSUPPORT;
 
-	return hclge_set_vf_vlan_common(hdev, vfid, false, vlan, qos, proto);
+	return hclge_set_vlan_filter_hw(hdev, proto, vfid, vlan, qos, false);
 }
 
 static int hclge_set_vlan_tx_offload_cfg(struct hclge_vport *vport)
@@ -4821,7 +4863,7 @@ static int hclge_init_vlan_config(struct hclge_dev *hdev)
 	}
 
 	handle = &hdev->vport[0].nic;
-	return hclge_set_port_vlan_filter(handle, htons(ETH_P_8021Q), 0, false);
+	return hclge_set_vlan_filter(handle, htons(ETH_P_8021Q), 0, false);
 }
 
 static int hclge_en_hw_strip_rxvtag(struct hnae3_handle *handle, bool enable)
@@ -5604,6 +5646,7 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev)
 	set_bit(HCLGE_STATE_DOWN, &hdev->state);
 
 	hclge_stats_clear(hdev);
+	memset(hdev->vlan_table, 0, sizeof(hdev->vlan_table));
 
 	ret = hclge_cmd_init(hdev);
 	if (ret) {
@@ -6221,7 +6264,7 @@ static const struct hnae3_ae_ops hclge_ops = {
 	.get_fw_version = hclge_get_fw_version,
 	.get_mdix_mode = hclge_get_mdix_mode,
 	.enable_vlan_filter = hclge_enable_vlan_filter,
-	.set_vlan_filter = hclge_set_port_vlan_filter,
+	.set_vlan_filter = hclge_set_vlan_filter,
 	.set_vf_vlan_filter = hclge_set_vf_vlan_filter,
 	.enable_hw_strip_rxvtag = hclge_en_hw_strip_rxvtag,
 	.reset_event = hclge_reset_event,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 0f4157e..6432f754 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -12,6 +12,8 @@
 #include <linux/fs.h>
 #include <linux/types.h>
 #include <linux/phy.h>
+#include <linux/if_vlan.h>
+
 #include "hclge_cmd.h"
 #include "hnae3.h"
 
@@ -471,6 +473,7 @@ struct hclge_vlan_type_cfg {
 	u16 tx_in_vlan_type;
 };
 
+#define HCLGE_VPORT_NUM 256
 struct hclge_dev {
 	struct pci_dev *pdev;
 	struct hnae3_ae_dev *ae_dev;
@@ -562,6 +565,7 @@ struct hclge_dev {
 
 	u64 rx_pkts_for_led;
 	u64 tx_pkts_for_led;
+	unsigned long vlan_table[VLAN_N_VID][BITS_TO_LONGS(HCLGE_VPORT_NUM)];
 };
 
 /* VPort level vlan tag configuration for TX direction */
@@ -646,8 +650,8 @@ static inline int hclge_get_queue_id(struct hnae3_queue *queue)
 }
 
 int hclge_cfg_mac_speed_dup(struct hclge_dev *hdev, int speed, u8 duplex);
-int hclge_set_vf_vlan_common(struct hclge_dev *vport, int vfid,
-			     bool is_kill, u16 vlan, u8 qos, __be16 proto);
+int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
+			  u16 vlan_id, bool is_kill);
 
 int hclge_buffer_alloc(struct hclge_dev *hdev);
 int hclge_rss_init_hw(struct hclge_dev *hdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index a6f7ffa..7563335 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -264,19 +264,18 @@ static int hclge_set_vf_vlan_cfg(struct hclge_vport *vport,
 				 struct hclge_mbx_vf_to_pf_cmd *mbx_req,
 				 bool gen_resp)
 {
-	struct hclge_dev *hdev = vport->back;
 	int status = 0;
 
 	if (mbx_req->msg[1] == HCLGE_MBX_VLAN_FILTER) {
+		struct hnae3_handle *handle = &vport->nic;
 		u16 vlan, proto;
 		bool is_kill;
 
 		is_kill = !!mbx_req->msg[2];
 		memcpy(&vlan, &mbx_req->msg[3], sizeof(vlan));
 		memcpy(&proto, &mbx_req->msg[5], sizeof(proto));
-		status = hclge_set_vf_vlan_common(hdev, vport->vport_id,
-						  is_kill, vlan, 0,
-						  cpu_to_be16(proto));
+		status = hclge_set_vlan_filter(handle, cpu_to_be16(proto),
+					       vlan, is_kill);
 	}
 
 	if (gen_resp)
-- 
2.7.4
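
Editor's note on 8/9: the bookkeeping in hclge_set_vlan_filter_hw()
amounts to a per-VLAN membership bitmap, with the shared port table
touched only when the first vport joins a VLAN or the last one leaves.
The following is a simplified user-space sketch of that idea, not the
driver code. N_VPORT, vlan_members, port_filter_writes and set_vlan()
are hypothetical, one 64-bit word per VLAN replaces the kernel bitmap
helpers, and the special case for VLAN 0 is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define N_VLAN	4096
#define N_VPORT	64	/* hypothetical: one uint64_t per VLAN */

static uint64_t vlan_members[N_VLAN];	/* bit i set: vport i is a member */
static int port_filter_writes;		/* counts shared-table updates */

static int set_vlan(unsigned int vport, unsigned int vlan, bool is_kill)
{
	uint64_t bit;

	if (vport >= N_VPORT || vlan >= N_VLAN)
		return -EINVAL;
	bit = (uint64_t)1 << vport;

	if (!is_kill) {
		if (vlan_members[vlan] & bit)
			return -EINVAL;		/* already a member */
		vlan_members[vlan] |= bit;
		if (vlan_members[vlan] == bit)	/* first member */
			port_filter_writes++;
	} else {
		if (!(vlan_members[vlan] & bit))
			return -EINVAL;		/* not a member */
		vlan_members[vlan] &= ~bit;
		if (vlan_members[vlan] == 0)	/* last member left */
			port_filter_writes++;
	}
	return 0;
}
```

Adding vports 0 and 1 to VLAN 100 updates the shared table once, and
removing both updates it once more. One detail in the patch may deserve
a second look: for_each_set_bit() is bounded by VLAN_N_VID, although
each row of vlan_table holds HCLGE_VPORT_NUM bits. The sketch bounds
its membership word by the vport count instead.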


* [PATCH net-next 9/9] net: hns3: Remove packet statistics in the range of 8192~12287
From: Salil Mehta @ 2018-05-01 18:56 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil.lnk, netdev,
	linux-kernel, linuxarm, Xi Wang
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Xi Wang <wangxi11@huawei.com>

The current statistics for the size range 8192~12287 are valid only for
GE, while the ranges 8192~9216 and 9217~12287 are valid only for LGE/CGE
and are always 0 for GE interfaces. This easily causes confusion when
viewing the packet statistics with ethtool -S.

This patch removes the 8192~12287 range of packet statistics and uses the
8192~9216 and 9217~12287 ranges instead. This change depends on a
firmware upgrade.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c |  4 ----
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 12 ++++++------
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 77d9e4c0..dd5d65c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -304,8 +304,6 @@ static const struct hclge_comm_stats_str g_mac_stats_string[] = {
 		HCLGE_MAC_STATS_FIELD_OFF(mac_tx_2048_4095_oct_pkt_num)},
 	{"mac_tx_4096_8191_oct_pkt_num",
 		HCLGE_MAC_STATS_FIELD_OFF(mac_tx_4096_8191_oct_pkt_num)},
-	{"mac_tx_8192_12287_oct_pkt_num",
-		HCLGE_MAC_STATS_FIELD_OFF(mac_tx_8192_12287_oct_pkt_num)},
 	{"mac_tx_8192_9216_oct_pkt_num",
 		HCLGE_MAC_STATS_FIELD_OFF(mac_tx_8192_9216_oct_pkt_num)},
 	{"mac_tx_9217_12287_oct_pkt_num",
@@ -356,8 +354,6 @@ static const struct hclge_comm_stats_str g_mac_stats_string[] = {
 		HCLGE_MAC_STATS_FIELD_OFF(mac_rx_2048_4095_oct_pkt_num)},
 	{"mac_rx_4096_8191_oct_pkt_num",
 		HCLGE_MAC_STATS_FIELD_OFF(mac_rx_4096_8191_oct_pkt_num)},
-	{"mac_rx_8192_12287_oct_pkt_num",
-		HCLGE_MAC_STATS_FIELD_OFF(mac_rx_8192_12287_oct_pkt_num)},
 	{"mac_rx_8192_9216_oct_pkt_num",
 		HCLGE_MAC_STATS_FIELD_OFF(mac_rx_8192_9216_oct_pkt_num)},
 	{"mac_rx_9217_12287_oct_pkt_num",
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 6432f754..b7ee91d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -408,9 +408,9 @@ struct hclge_mac_stats {
 	u64 mac_tx_1519_2047_oct_pkt_num;
 	u64 mac_tx_2048_4095_oct_pkt_num;
 	u64 mac_tx_4096_8191_oct_pkt_num;
-	u64 mac_tx_8192_12287_oct_pkt_num; /* valid for GE MAC only */
-	u64 mac_tx_8192_9216_oct_pkt_num; /* valid for LGE & CGE MAC only */
-	u64 mac_tx_9217_12287_oct_pkt_num; /* valid for LGE & CGE MAC */
+	u64 rsv0;
+	u64 mac_tx_8192_9216_oct_pkt_num;
+	u64 mac_tx_9217_12287_oct_pkt_num;
 	u64 mac_tx_12288_16383_oct_pkt_num;
 	u64 mac_tx_1519_max_good_oct_pkt_num;
 	u64 mac_tx_1519_max_bad_oct_pkt_num;
@@ -435,9 +435,9 @@ struct hclge_mac_stats {
 	u64 mac_rx_1519_2047_oct_pkt_num;
 	u64 mac_rx_2048_4095_oct_pkt_num;
 	u64 mac_rx_4096_8191_oct_pkt_num;
-	u64 mac_rx_8192_12287_oct_pkt_num;/* valid for GE MAC only */
-	u64 mac_rx_8192_9216_oct_pkt_num; /* valid for LGE & CGE MAC only */
-	u64 mac_rx_9217_12287_oct_pkt_num; /* valid for LGE & CGE MAC only */
+	u64 rsv1;
+	u64 mac_rx_8192_9216_oct_pkt_num;
+	u64 mac_rx_9217_12287_oct_pkt_num;
 	u64 mac_rx_12288_16383_oct_pkt_num;
 	u64 mac_rx_1519_max_good_oct_pkt_num;
 	u64 mac_rx_1519_max_bad_oct_pkt_num;
-- 
2.7.4


* [PATCH net-next 5/9] net: hns3: fix for phy_addr error in hclge_mac_mdio_config
From: Salil Mehta @ 2018-05-01 18:56 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil.lnk, netdev,
	linux-kernel, linuxarm, Huazhong Tan
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Huazhong Tan <tanhuazhong@huawei.com>

When a PHY exists, phy_addr must be less than PHY_MAX_ADDR.
If it is not, hclge_mac_mdio_config() should return an error.
For fiber (phy_addr=0xff), hclge_mac_mdio_config() is not needed.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 12 +++++++-----
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c |  7 +++++--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index b5e0c58..cc09713 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5503,11 +5503,13 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 		goto err_sriov_disable;
 	}
 
-	ret = hclge_mac_mdio_config(hdev);
-	if (ret) {
-		dev_warn(&hdev->pdev->dev,
-			 "mdio config fail ret=%d\n", ret);
-		goto err_sriov_disable;
+	if (hdev->hw.mac.media_type == HNAE3_MEDIA_TYPE_COPPER) {
+		ret = hclge_mac_mdio_config(hdev);
+		if (ret) {
+			dev_err(&hdev->pdev->dev,
+				"mdio config fail ret=%d\n", ret);
+			goto err_sriov_disable;
+		}
 	}
 
 	ret = hclge_mac_init(hdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
index 682c2d6..9f7932e42 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -140,8 +140,11 @@ int hclge_mac_mdio_config(struct hclge_dev *hdev)
 	struct mii_bus *mdio_bus;
 	int ret;
 
-	if (hdev->hw.mac.phy_addr >= PHY_MAX_ADDR)
-		return 0;
+	if (hdev->hw.mac.phy_addr >= PHY_MAX_ADDR) {
+		dev_err(&hdev->pdev->dev, "phy_addr(%d) is too large.\n",
+			hdev->hw.mac.phy_addr);
+		return -EINVAL;
+	}
 
 	mdio_bus = devm_mdiobus_alloc(&hdev->pdev->dev);
 	if (!mdio_bus)
-- 
2.7.4


* Re: Silently dropped UDP packets on kernel 4.14
From: Kristian Evensen @ 2018-05-01 18:58 UTC (permalink / raw)
  To: Netfilter Development Mailing list, Network Development
In-Reply-To: <CAKfDRXjop-G3qOZg+P6Mb4VKjvgJejScv9VHaJHB5OJBnqG30w@mail.gmail.com>

On Tue, May 1, 2018 at 8:50 PM, Kristian Evensen
<kristian.evensen@gmail.com> wrote:
> Does anyone have any idea of what could be wrong, where I should look
> or other things I can try? I tried to space the requests out a bit in
> time (I inserted a sleep 1 between them), and then the problem went
> away.

I should learn to always go through everything one last time before
sending an email. First of all, I see that both requests are treated
as new. Second, on my router, new requests are sent to user space for
marking, which explains the large delay in processing. When removing
the NFQUEUE-rule + handling and marking statically, my problem goes
away and I get an answer for both packets.

However, I do have one question. In my application, both packets are
assigned the same mark. Shouldn't they then match the same conntrack
entry, or am I missing something since that seems not to be the case?

BR,
Kristian


* Re: [PATCH net-next 0/9] Misc bug fixes for HNS3 Ethernet driver
From: David Miller @ 2018-05-01 19:08 UTC (permalink / raw)
  To: salil.mehta
  Cc: yisen.zhuang, lipeng321, mehta.salil.lnk, netdev, linux-kernel,
	linuxarm
In-Reply-To: <20180501185605.9584-1-salil.mehta@huawei.com>

From: Salil Mehta <salil.mehta@huawei.com>
Date: Tue, 1 May 2018 19:55:56 +0100

> This patch-set presents some miscellaneous bug fixes and cleanups for
> HNS3 Ethernet Driver.

Series applied, thank you.


* Re: [PATCH net-next v6] Add Common Applications Kept Enhanced (cake) qdisc
From: Eric Dumazet @ 2018-05-01 19:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Eric Dumazet, Dave Taht,
	Cong Wang
  Cc: Linux Kernel Network Developers, Cake List
In-Reply-To: <878t932ont.fsf@toke.dk>

On 05/01/2018 11:53 AM, Toke Høiland-Jørgensen wrote:

> *sigh* - can do, I guess. Though I'm not sure what that is going to
> accomplish, exactly?

I guess that nobody really wants to really review Cake if
it is a file with 2700 lines of code and hundreds of variables/tunables.

Sure, we have big files in networking land, as a result of thousands of changes.

If you split it, then maybe the work can be split among reviewers as a result.

Or maybe David Miller can simply merge your patch as is, and hope for the best,
I really do not know.

It seems you guys spent years/months on work on this stuff, so what is the big deal
about presenting your work in the best possible way ?

Thanks.


* Re: [PATCH net-next v6] Add Common Applications Kept Enhanced (cake) qdisc
From: David Miller @ 2018-05-01 19:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: toke, dave.taht, xiyou.wangcong, netdev, cake
In-Reply-To: <4ec8da81-8671-f434-bada-27088b09ce7b@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 1 May 2018 12:12:45 -0700

> It seems you guys spent years/months on work on this stuff, so what
> is the big deal about presenting your work in the best possible way
> ?

+1

^ permalink raw reply

* Re: [PATCH net-next v6] Add Common Applications Kept Enhanced (cake) qdisc
From: Dave Taht @ 2018-05-01 19:22 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Eric Dumazet, Cong Wang, Linux Kernel Network Developers,
	Cake List
In-Reply-To: <878t932ont.fsf@toke.dk>

On Tue, May 1, 2018 at 11:53 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>
>> On 04/30/2018 02:27 PM, Dave Taht wrote:
>>
>>> I actually have a tc - bpf based ack filter, during the development of
>>> cake's ack-thinner, that I should submit one of these days. It
>>> proved to be of limited use.
>>>
>>> Probably the biggest mistake we made is by calling this cake feature a
>>> filter. It isn't.
>>>
>>> Maybe we should have called it a "thinner" or something like that? In
>>> order to properly "thin" or "reduce" an ack stream
>>> you have to have a queue to look at and some related state. TC filters
>>> do not operate on queues, qdiscs do. Thus the "ack-filter" here is
>>> deeply embedded into cake's flow isolation and queue structures.
>>
>> A feature eating packets _is_ a filter.
>>
>> Given that a qdisc only sees one direction, I really do not get why
>> ack-filter is so desirable in a packet scheduler.
>
> The ACK filter in CAKE is there to solve a very particular use case:
> Residential internet connections with bandwidths so asymmetric that it
> hurts TCP performance. It is not on by default; but rather meant to be
> configured by users which suffer from this particular ISP brokenness
> (which, sadly, does happen; there are ISPs who believe a 50/1 bandwidth
> ratio is reasonable). For those users, the ACK filter can help.
>
> We certainly do not advise people to turn it on in the general case! As
> you point out, in general TCP performance is best improved in the TCP stack...
>
>> You have not provided any numbers to show how useful it is to maintain
>> this code
>
> You mean apart from that in the linked blog post and the paper?
> Here's an executive summary:
>
> On a 30/1 Mbps connection with a bidirectional traffic test (RRUL),
> turning on the ACK filter improves downstream throughput by ~20% and
> upstream throughput by ~12% in conservative mode and ~40% in aggressive
> mode, at the cost of ~5ms of inter-flow latency due to the increased
> congestion.

On a simulated 50/1 comcast connection, I got double the throughput
on a similar test, with no obvious glitches in the tcp flow, in the early stages
of development.

http://blog.cerowrt.org/post/ack_filtering/

I was very, very dubious about the value of ack thinning until that point,
but it was hard to argue with the data.

> In *really* pathological cases, the effect can be a lot more; for
> instance, the ACK filter increases the achievable downstream throughput
> on a link with 100 Kbps in the upstream direction by an order of
> magnitude (from ~2.5 Mbps to ~25 Mbps).
>
>> (probably still broken btw, considering it is changing some skb
>> attributes).
>
> As you may have noticed over the last few iterations, I've actually been
> trying to fix any brokenness. If you could be specific as to what is
> still broken, that would be helpful.
>
> (I'm assuming you are referring to the calls to skb_set_inner* - but do you
> mean all of them, or just the ones that set inner == outer?)
>
>> On wifi (or any half duplex medium), you might gain something by
>> carefully sending ACK not too often, but ultimately this should be
>> done by TCP stack, in well controlled environment [1], instead of
>> middle-boxes happily playing/breaking some Internet standards.
>>
>> [1] TCP stack has the estimations of RTT, RWIN, throughput, and might
>> be able to avoid flooding the network with acks under some conditions.
>
> You are quite right that in general, TCP performance is best improved in
> the TCP stack. But home users are not generally in control of that; the
> ACK filter helps in those specific deployments (again, it's off by
> default, and you shouldn't turn it on in the general case!).
>
>> Say RTT is 100ms, and we receive 1 packet every 100 usec (no GRO
>> aggregation) Maybe we do not really _need_ to send 5000 ack per second
>> (or even 10,000 ack per second if a hole needs a repair)
>
> Yes, please do fix that.

:) I really would like to see cake tested at 10GigE and above, and its
performance improved overall. I tend to think we need more queues than 1024
at 40GigE+, and we presently run out of cpu (even unshaped) long
before we hit that point.

>
>> Also on wifi, the queue builds in the driver queues anyway, not in the
>> qdisc.
>
> Actually, we've fixed that (for some drivers); there is now no qdisc on
> WiFi interfaces, and we've integrated FQ-CoDel'ed queueing into the
> stack where it can be effective. But no, running CAKE with an ACK filter
> on a WiFi link is not going to be effective. Don't do that.

I share Eric's belief that thinning acks (either at the wifi qdisc or
in tcp) on wifi interfaces will help: given that the underlying wifi
layer is reliable and does retransmits, and that the number of packets
that can fit into a wifi aggregate is limited, you really only need one
ack per wifi aggregate per flow to keep the tcp connection running.

That said, I'd much rather see fq_codel work with more wifi drivers
than pursue this.

>
>> So ACK filtering, if _really_ successful, would need to be
>> modularized.

Heh. I keep hoping ISPs will ship symmetric links and wifi antennas

> I really hope ACK filtering won't be "_really_ successful". Again, it is
> not meant to be a feature that's on everywhere, it's targeting a very
> specific use case that CAKE is optimised for: The home router use case.

Please note that I find cake far more general purpose than just this;
just slamming:

tc qdisc add dev eth0 root cake bandwidth 50mbit

on a link that needs it is far easier than the equivalent htb +
fq_codel + other filters, and more effective.

That mode is with nat on, some diffserv awareness (more correct than
pfifo_fast), no link layer compensation, no ack-filter, and "just
works".

Certainly the major use case to date has been on home routers. Every
feature in cake was based on extensive feedback from that part of the
field.

>> Please split Cake into a patch series.
>> Presumably putting the ack-filter on a patch of its own.
>
> *sigh* - can do, I guess. Though I'm not sure what that is going to
> accomplish, exactly?
>
> -Toke



-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619

^ permalink raw reply
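
The ACK-rate figures Eric gives in the quoted text above follow from
simple arithmetic: one packet every 100 usec is 10,000 packets per
second. The sketch below assumes that "5000 ack per second" corresponds
to one ACK per two packets and "10,000" to one ACK per packet while a
hole is being repaired; that reading, and the helper name
acks_per_sec(), are assumptions and not something stated in the thread.

```c
/* Hypothetical helper: ACKs per second for a receiver that sees one
 * packet every pkt_interval_us microseconds and sends one ACK for
 * every pkts_per_ack packets. Returns 0 for nonsensical input.
 */
static unsigned long acks_per_sec(unsigned long pkt_interval_us,
				  unsigned long pkts_per_ack)
{
	if (pkt_interval_us == 0 || pkts_per_ack == 0)
		return 0;
	return 1000000UL / pkt_interval_us / pkts_per_ack;
}
```

With a 100 usec packet interval, acks_per_sec(100, 2) gives the 5000
figure and acks_per_sec(100, 1) gives the 10,000 figure.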

* Re: [PATCH V3 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
From: Soheil Hassas Yeganeh @ 2018-05-01 19:28 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet,
	Willem de Bruijn
In-Reply-To: <20180501.143447.737462619555001271.davem@davemloft.net>

On Tue, May 1, 2018 at 2:34 PM, David Miller <davem@davemloft.net> wrote:
> From: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
> Date: Tue,  1 May 2018 10:11:27 -0400
>
>> +static inline int tcp_inq_hint(struct sock *sk)
>
> Please do not use 'inline' in foo.c files, let the compiler decide.
>
> Otherwise looks great, thanks.

Oops, sorry about this. Will send a v4.

^ permalink raw reply

* Re: [PATCH net-next v6] Add Common Applications Kept Enhanced (cake) qdisc
From: Toke Høiland-Jørgensen @ 2018-05-01 19:31 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet, Dave Taht, Cong Wang
  Cc: Linux Kernel Network Developers, Cake List
In-Reply-To: <4ec8da81-8671-f434-bada-27088b09ce7b@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 05/01/2018 11:53 AM, Toke Høiland-Jørgensen wrote:
>
>> *sigh* - can do, I guess. Though I'm not sure what that is going to
>> accomplish, exactly?
>
>
> I guess that nobody really wants to really review Cake if
> it is a file with 2700 lines of code and hundreds of variables/tunables.
>
> Sure, we have big files in networking land, as a result of thousands
> of changes.
>
> If you split it, then maybe the work can be split among reviewers as a
> result.
>
> Or maybe David Miller can simply merge your patch as is, and hope for
> the best, I really do not know.
>
> It seems you guys spent years/months on work on this stuff, so what is
> the big deal about presenting your work in the best possible way ?

I was objecting to what felt like an arbitrary moving of goal posts
without an explanation. Now that you give one, that is fine of course.
I'll split it and resubmit.

Could you comment on specifically what you believe is broken in this,
please, so I can fix it in the same iteration?

+static inline struct tcphdr *cake_get_tcphdr(struct sk_buff *skb)
+{
+	struct ipv6hdr *ipv6h;
+	struct iphdr *iph;
+
+	/* check IPv6 header size immediately, since for IPv4 we need the space
+	 * for the TCP header anyway
+	 */
+	if (!pskb_may_pull(skb, skb_network_offset(skb) +
+				sizeof(struct ipv6hdr)))
+		return NULL;
+
+	iph = ip_hdr(skb);
+
+	if (iph->version == 4) {
+		/* special-case 6in4 tunnelling, as that is a common way to get
+		 * v6 connectivity in the home
+		 */
+		if (iph->protocol == IPPROTO_IPV6) {
+			if (!pskb_may_pull(skb, (skb_network_offset(skb) +
+						 ip_hdrlen(skb) +
+						 sizeof(struct ipv6hdr))))
+				return NULL;
+
+			ipv6h = (struct ipv6hdr *)(skb_network_header(skb) +
+						   ip_hdrlen(skb));
+
+			if (ipv6h->nexthdr != IPPROTO_TCP)
+				return NULL;
+
+			skb_set_inner_network_header(skb,
+						     skb_network_offset(skb) +
+						     ip_hdrlen(skb));
+			skb_set_inner_transport_header(skb,
+						skb_inner_network_offset(skb) +
+						sizeof(struct ipv6hdr));
+
+		} else if (iph->protocol == IPPROTO_TCP) {
+			/* we always set the inner headers so we can use those
+			 * unconditionally in the filtering logic
+			 */
+			skb_set_inner_network_header(skb,
+						     skb_network_offset(skb));
+			skb_set_inner_transport_header(skb,
+						       skb_network_offset(skb) +
+						       ip_hdrlen(skb));
+		} else {
+			return NULL;
+		}
+
+	} else if (iph->version == 6) {
+		ipv6h = (struct ipv6hdr *)iph;
+
+		if (ipv6h->nexthdr != IPPROTO_TCP)
+			return NULL;
+
+		/* we always set the inner headers so we can use those
+		 * unconditionally in the filtering logic
+		 */
+		skb_set_inner_network_header(skb,
+					     skb_network_offset(skb));
+		skb_set_inner_transport_header(skb,
+					       skb_network_offset(skb) +
+					       sizeof(struct ipv6hdr));
+
+	} else {
+		return NULL;
+	}
+
+	if (!pskb_may_pull(skb, skb_inner_transport_offset(skb) +
+				sizeof(struct tcphdr)) ||
+	    !pskb_may_pull(skb, skb_inner_transport_offset(skb) +
+				inner_tcp_hdrlen(skb)))
+		return NULL;
+
+	return inner_tcp_hdr(skb);
+}


Thanks,

-Toke

^ permalink raw reply

* [PATCH V4 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
From: Soheil Hassas Yeganeh @ 2018-05-01 19:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: ycheng, ncardwell, edumazet, willemb, Soheil Hassas Yeganeh

From: Soheil Hassas Yeganeh <soheil@google.com>

Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the receive queue length, applications can optimize
how they allocate buffers to read from the socket.

The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.

Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.

Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.

With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.

V3 change-log:
	As suggested by David Miller, added loads with barrier
	to check whether we have multiple threads calling recvmsg
	in parallel. When that happens we lock the socket to
	calculate inq.
V4 change-log:
	Removed inline from a static function.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
---
 include/linux/tcp.h      |  2 +-
 include/uapi/linux/tcp.h |  3 +++
 net/ipv4/tcp.c           | 43 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 20585d5c4e1c3..807776928cb86 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -228,7 +228,7 @@ struct tcp_sock {
 		unused:2;
 	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
 		thin_lto    : 1,/* Use linear timeouts for thin streams */
-		unused1	    : 1,
+		recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
 		repair      : 1,
 		frto        : 1;/* F-RTO (RFC5682) activated in CA_Loss */
 	u8	repair_queue;
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index e9e8373b34b9d..29eb659aa77a1 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -123,6 +123,9 @@ enum {
 #define TCP_FASTOPEN_KEY	33	/* Set the key for Fast Open (cookie) */
 #define TCP_FASTOPEN_NO_COOKIE	34	/* Enable TFO without a TFO cookie */
 #define TCP_ZEROCOPY_RECEIVE	35
+#define TCP_INQ			36	/* Notify bytes available to read as a cmsg on read */
+
+#define TCP_CM_INQ		TCP_INQ
 
 struct tcp_repair_opt {
 	__u32	opt_code;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4028ddd14dd5a..868ed74a76a80 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1889,6 +1889,22 @@ static void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
 	}
 }
 
+static int tcp_inq_hint(struct sock *sk)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	u32 copied_seq = READ_ONCE(tp->copied_seq);
+	u32 rcv_nxt = READ_ONCE(tp->rcv_nxt);
+	int inq;
+
+	inq = rcv_nxt - copied_seq;
+	if (unlikely(inq < 0 || copied_seq != READ_ONCE(tp->copied_seq))) {
+		lock_sock(sk);
+		inq = tp->rcv_nxt - tp->copied_seq;
+		release_sock(sk);
+	}
+	return inq;
+}
+
 /*
  *	This routine copies from a sock struct into the user buffer.
  *
@@ -1905,13 +1921,14 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	u32 peek_seq;
 	u32 *seq;
 	unsigned long used;
-	int err;
+	int err, inq;
 	int target;		/* Read at least this many bytes */
 	long timeo;
 	struct sk_buff *skb, *last;
 	u32 urg_hole = 0;
 	struct scm_timestamping tss;
 	bool has_tss = false;
+	bool has_cmsg;
 
 	if (unlikely(flags & MSG_ERRQUEUE))
 		return inet_recv_error(sk, msg, len, addr_len);
@@ -1926,6 +1943,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	if (sk->sk_state == TCP_LISTEN)
 		goto out;
 
+	has_cmsg = tp->recvmsg_inq;
 	timeo = sock_rcvtimeo(sk, nonblock);
 
 	/* Urgent data needs to be handled specially. */
@@ -2112,6 +2130,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		if (TCP_SKB_CB(skb)->has_rxtstamp) {
 			tcp_update_recv_tstamps(skb, &tss);
 			has_tss = true;
+			has_cmsg = true;
 		}
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			goto found_fin_ok;
@@ -2131,13 +2150,20 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	 * on connected socket. I was just happy when found this 8) --ANK
 	 */
 
-	if (has_tss)
-		tcp_recv_timestamp(msg, sk, &tss);
-
 	/* Clean up data we have read: This will do ACK frames. */
 	tcp_cleanup_rbuf(sk, copied);
 
 	release_sock(sk);
+
+	if (has_cmsg) {
+		if (has_tss)
+			tcp_recv_timestamp(msg, sk, &tss);
+		if (tp->recvmsg_inq) {
+			inq = tcp_inq_hint(sk);
+			put_cmsg(msg, SOL_TCP, TCP_CM_INQ, sizeof(inq), &inq);
+		}
+	}
+
 	return copied;
 
 out:
@@ -3006,6 +3032,12 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		tp->notsent_lowat = val;
 		sk->sk_write_space(sk);
 		break;
+	case TCP_INQ:
+		if (val > 1 || val < 0)
+			err = -EINVAL;
+		else
+			tp->recvmsg_inq = val;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -3431,6 +3463,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 	case TCP_NOTSENT_LOWAT:
 		val = tp->notsent_lowat;
 		break;
+	case TCP_INQ:
+		val = tp->recvmsg_inq;
+		break;
 	case TCP_SAVE_SYN:
 		val = tp->save_syn;
 		break;
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related
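
A note on the subtraction in tcp_inq_hint() in the patch above: rcv_nxt
and copied_seq are u32 sequence numbers, so their difference is taken
modulo 2^32 and remains the number of unread bytes even after rcv_nxt
has wrapped past zero. A minimal userspace sketch of that arithmetic
(inq_bytes() is a hypothetical name, not part of the patch, and the
conversion to int assumes the usual two's-complement behaviour of gcc
and clang):

```c
#include <stdint.h>

/* Hypothetical userspace analogue of "inq = rcv_nxt - copied_seq".
 * Unsigned 32-bit subtraction wraps modulo 2^32, so the result is the
 * distance between the two sequence numbers. A negative value means
 * the two reads were not a consistent snapshot, which is the case the
 * patch handles by recomputing under the socket lock.
 */
static int inq_bytes(uint32_t rcv_nxt, uint32_t copied_seq)
{
	return (int)(rcv_nxt - copied_seq);
}
```

For example, with copied_seq five bytes below the wrap point and
rcv_nxt five bytes past it, the function reports 10 bytes in queue.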

* [PATCH V4 net-next 2/2] selftest: add test for TCP_INQ
From: Soheil Hassas Yeganeh @ 2018-05-01 19:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: ycheng, ncardwell, edumazet, willemb, Soheil Hassas Yeganeh
In-Reply-To: <20180501193916.113642-1-soheil.kdev@gmail.com>

From: Soheil Hassas Yeganeh <soheil@google.com>

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
---
 tools/testing/selftests/net/Makefile  |   3 +-
 tools/testing/selftests/net/tcp_inq.c | 189 ++++++++++++++++++++++++++
 2 files changed, 191 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/tcp_inq.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index df9102ec7b7af..0a1821f8dfb18 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -9,7 +9,7 @@ TEST_PROGS += fib_tests.sh fib-onlink-tests.sh in_netns.sh pmtu.sh udpgso.sh
 TEST_PROGS += udpgso_bench.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
-TEST_GEN_FILES += tcp_mmap
+TEST_GEN_FILES += tcp_mmap tcp_inq
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
 TEST_GEN_PROGS += udpgso udpgso_bench_tx udpgso_bench_rx
@@ -18,3 +18,4 @@ include ../lib.mk
 
 $(OUTPUT)/reuseport_bpf_numa: LDFLAGS += -lnuma
 $(OUTPUT)/tcp_mmap: LDFLAGS += -lpthread
+$(OUTPUT)/tcp_inq: LDFLAGS += -lpthread
diff --git a/tools/testing/selftests/net/tcp_inq.c b/tools/testing/selftests/net/tcp_inq.c
new file mode 100644
index 0000000000000..d044b29ddabcc
--- /dev/null
+++ b/tools/testing/selftests/net/tcp_inq.c
@@ -0,0 +1,189 @@
+/*
+ * Copyright 2018 Google Inc.
+ * Author: Soheil Hassas Yeganeh (soheil@google.com)
+ *
+ * Simple example on how to use TCP_INQ and TCP_CM_INQ.
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+#define _GNU_SOURCE
+
+#include <error.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+#ifndef TCP_INQ
+#define TCP_INQ 36
+#endif
+
+#ifndef TCP_CM_INQ
+#define TCP_CM_INQ TCP_INQ
+#endif
+
+#define BUF_SIZE 8192
+#define CMSG_SIZE 32
+
+static int family = AF_INET6;
+static socklen_t addr_len = sizeof(struct sockaddr_in6);
+static int port = 4974;
+
+static void setup_loopback_addr(int family, struct sockaddr_storage *sockaddr)
+{
+	struct sockaddr_in6 *addr6 = (void *) sockaddr;
+	struct sockaddr_in *addr4 = (void *) sockaddr;
+
+	switch (family) {
+	case PF_INET:
+		memset(addr4, 0, sizeof(*addr4));
+		addr4->sin_family = AF_INET;
+		addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+		addr4->sin_port = htons(port);
+		break;
+	case PF_INET6:
+		memset(addr6, 0, sizeof(*addr6));
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_addr = in6addr_loopback;
+		addr6->sin6_port = htons(port);
+		break;
+	default:
+		error(1, 0, "illegal family");
+	}
+}
+
+void *start_server(void *arg)
+{
+	int server_fd = (int)(unsigned long)arg;
+	struct sockaddr_in addr;
+	socklen_t addrlen = sizeof(addr);
+	char *buf;
+	int fd;
+	int r;
+
+	buf = malloc(BUF_SIZE);
+
+	for (;;) {
+		fd = accept(server_fd, (struct sockaddr *)&addr, &addrlen);
+		if (fd == -1) {
+			perror("accept");
+			break;
+		}
+		do {
+			r = send(fd, buf, BUF_SIZE, 0);
+		} while (r < 0 && errno == EINTR);
+		if (r < 0)
+			perror("send");
+		if (r != BUF_SIZE)
+			fprintf(stderr, "can only send %d bytes\n", r);
+		/* TCP_INQ can overestimate in-queue by one byte if we send
+		 * the FIN packet. Sleep for 1 second, so that the client
+		 * likely invoked recvmsg().
+		 */
+		sleep(1);
+		close(fd);
+	}
+
+	free(buf);
+	close(server_fd);
+	pthread_exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+	struct sockaddr_storage listen_addr, addr;
+	int c, one = 1, inq = -1;
+	pthread_t server_thread;
+	char cmsgbuf[CMSG_SIZE];
+	struct iovec iov[1];
+	struct cmsghdr *cm;
+	struct msghdr msg;
+	int server_fd, fd;
+	char *buf;
+
+	while ((c = getopt(argc, argv, "46p:")) != -1) {
+		switch (c) {
+		case '4':
+			family = PF_INET;
+			addr_len = sizeof(struct sockaddr_in);
+			break;
+		case '6':
+			family = PF_INET6;
+			addr_len = sizeof(struct sockaddr_in6);
+			break;
+		case 'p':
+			port = atoi(optarg);
+			break;
+		}
+	}
+
+	server_fd = socket(family, SOCK_STREAM, 0);
+	if (server_fd < 0)
+		error(1, errno, "server socket");
+	setup_loopback_addr(family, &listen_addr);
+	if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR,
+		       &one, sizeof(one)) != 0)
+		error(1, errno, "setsockopt(SO_REUSEADDR)");
+	if (bind(server_fd, (const struct sockaddr *)&listen_addr,
+		 addr_len) == -1)
+		error(1, errno, "bind");
+	if (listen(server_fd, 128) == -1)
+		error(1, errno, "listen");
+	if (pthread_create(&server_thread, NULL, start_server,
+			   (void *)(unsigned long)server_fd) != 0)
+		error(1, errno, "pthread_create");
+
+	fd = socket(family, SOCK_STREAM, 0);
+	if (fd < 0)
+		error(1, errno, "client socket");
+	setup_loopback_addr(family, &addr);
+	if (connect(fd, (const struct sockaddr *)&addr, addr_len) == -1)
+		error(1, errno, "connect");
+	if (setsockopt(fd, SOL_TCP, TCP_INQ, &one, sizeof(one)) != 0)
+		error(1, errno, "setsockopt(TCP_INQ)");
+
+	msg.msg_name = NULL;
+	msg.msg_namelen = 0;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsgbuf;
+	msg.msg_controllen = sizeof(cmsgbuf);
+	msg.msg_flags = 0;
+
+	buf = malloc(BUF_SIZE);
+	iov[0].iov_base = buf;
+	iov[0].iov_len = BUF_SIZE / 2;
+
+	if (recvmsg(fd, &msg, 0) != iov[0].iov_len)
+		error(1, errno, "recvmsg");
+	if (msg.msg_flags & MSG_CTRUNC)
+		error(1, 0, "control message is truncated");
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm))
+		if (cm->cmsg_level == SOL_TCP && cm->cmsg_type == TCP_CM_INQ)
+			inq = *((int *) CMSG_DATA(cm));
+
+	if (inq != BUF_SIZE - iov[0].iov_len) {
+		fprintf(stderr, "unexpected inq: %d\n", inq);
+		exit(1);
+	}
+
+	printf("PASSED\n");
+	free(buf);
+	close(fd);
+	return 0;
+}
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related

* Re: [PATCH net-next v6] Add Common Applications Kept Enhanced (cake) qdisc
From: Eric Dumazet @ 2018-05-01 19:41 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Eric Dumazet, Dave Taht,
	Cong Wang
  Cc: Linux Kernel Network Developers, Cake List
In-Reply-To: <871sev2mvx.fsf@toke.dk>



On 05/01/2018 12:31 PM, Toke Høiland-Jørgensen wrote:

> Could you comment on specifically what you believe is broken in this,
> please, so I can fix it in the same iteration?
> 

Apart from the various pskb_may_pull() this helper should not change skb layout.

Ideally, the skb should be const and you would use skb_header_pointer() to make clear
you do not ever write this skb.

This would make the reviewer's job pretty easy, as no side effect can possibly happen.


> +static inline struct tcphdr *cake_get_tcphdr(struct sk_buff *skb)
> +{
> +	struct ipv6hdr *ipv6h;
> +	struct iphdr *iph;
> +
> +	/* check IPv6 header size immediately, since for IPv4 we need the space
> +	 * for the TCP header anyway
> +	 */
> +	if (!pskb_may_pull(skb, skb_network_offset(skb) +
> +				sizeof(struct ipv6hdr)))
> +		return NULL;
> +
> +	iph = ip_hdr(skb);
> +
> +	if (iph->version == 4) {
> +		/* special-case 6in4 tunnelling, as that is a common way to get
> +		 * v6 connectivity in the home
> +		 */
> +		if (iph->protocol == IPPROTO_IPV6) {
> +			if (!pskb_may_pull(skb, (skb_network_offset(skb) +
> +						 ip_hdrlen(skb) +
> +						 sizeof(struct ipv6hdr))))
> +				return NULL;
> +
> +			ipv6h = (struct ipv6hdr *)(skb_network_header(skb) +
> +						   ip_hdrlen(skb));
> +
> +			if (ipv6h->nexthdr != IPPROTO_TCP)
> +				return NULL;
> +
> +			skb_set_inner_network_header(skb,
> +						     skb_network_offset(skb) +
> +						     ip_hdrlen(skb));
> +			skb_set_inner_transport_header(skb,
> +						skb_inner_network_offset(skb) +
> +						sizeof(struct ipv6hdr));

This is not allowed for a dissector.

^ permalink raw reply
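
The review comment above asks for a dissector that only reads the
packet. The following is a userspace sketch of that idea, not kernel
code: the packet is a const byte buffer, every access goes through a
bounds-checked helper, and the result is returned as an offset instead
of being stored in the packet. The names hdr_ptr() and
find_tcp_offset() are hypothetical. Unlike the kernel's
skb_header_pointer(), this helper does not copy non-linear data into a
caller-supplied buffer; it only models the "never write" property.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_TCP	6	/* IPPROTO_TCP */
#define PROTO_IPV6	41	/* IPPROTO_IPV6, used by 6in4 */
#define IPV4_MIN_HDR	20
#define IPV6_HDR_LEN	40
#define TCP_MIN_HDR	20

/* Read-only, bounds-checked access; NULL if the range is not present. */
static const uint8_t *hdr_ptr(const uint8_t *pkt, size_t pkt_len,
			      size_t off, size_t len)
{
	if (off > pkt_len || len > pkt_len - off)
		return NULL;
	return pkt + off;
}

/* Return the offset of the TCP header for plain IPv4, plain IPv6 or
 * 6in4, or -1 if the packet is not TCP or is too short. The packet is
 * never modified.
 */
static long find_tcp_offset(const uint8_t *pkt, size_t pkt_len)
{
	const uint8_t *h = hdr_ptr(pkt, pkt_len, 0, 1);
	size_t off = 0;

	if (!h)
		return -1;

	if ((h[0] >> 4) == 4) {
		size_t ihl;

		h = hdr_ptr(pkt, pkt_len, 0, IPV4_MIN_HDR);
		if (!h)
			return -1;
		ihl = (size_t)(h[0] & 0x0f) * 4;
		if (ihl < IPV4_MIN_HDR)
			return -1;
		if (h[9] == PROTO_TCP)
			return hdr_ptr(pkt, pkt_len, ihl, TCP_MIN_HDR) ?
			       (long)ihl : -1;
		if (h[9] != PROTO_IPV6)
			return -1;
		/* 6in4: continue with the inner IPv6 header */
		off = ihl;
		h = hdr_ptr(pkt, pkt_len, off, 1);
		if (!h || (h[0] >> 4) != 6)
			return -1;
	} else if ((h[0] >> 4) != 6) {
		return -1;
	}

	h = hdr_ptr(pkt, pkt_len, off, IPV6_HDR_LEN);
	if (!h || h[6] != PROTO_TCP)	/* byte 6 is the next-header field */
		return -1;
	off += IPV6_HDR_LEN;
	return hdr_ptr(pkt, pkt_len, off, TCP_MIN_HDR) ? (long)off : -1;
}
```

Because the function takes a const pointer and returns the offset, a
reviewer can see from the signature alone that the packet layout is
left untouched.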

* Re: [RFC net-next 0/5] Support for PHY test modes
From: Andrew Lunn @ 2018-05-01 19:59 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David Miller, netdev, rmk, linux-kernel, cphealy, nikita.yoush,
	vivien.didelot, Nisar.Sayed, UNGLinuxDriver
In-Reply-To: <cd910177-bff9-261a-78e7-aa0a2c6532b5@gmail.com>

> # echo 4 > /sys/class/net/gphy/operstate
> # ip link show gphy
> 4: gphy@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
> qdisc noqueue switchid 00000000 state TESTING mode DEFAULT group default
> qlen 1000
>     link/ether 00:10:18:de:38:1f brd ff:ff:ff:ff:ff:ff

This looks good.

I stopped using ifconfig years ago, so i personally don't care about
it. If you are using old tools, you have to expect some surprises and
missing information.

	Andrew

^ permalink raw reply

* [PATCH net] net/tls: Don't recursively call push_record during tls_write_space callbacks
From: Dave Watson @ 2018-05-01 20:05 UTC (permalink / raw)
  To: David S. Miller, Andre Tomt, netdev, borisp, Aviad Yehezkel

It is reported that in some cases, write_space may be called in
do_tcp_sendpages, such that we recursively invoke do_tcp_sendpages again:

[  660.468802]  ? do_tcp_sendpages+0x8d/0x580
[  660.468826]  ? tls_push_sg+0x74/0x130 [tls]
[  660.468852]  ? tls_push_record+0x24a/0x390 [tls]
[  660.468880]  ? tls_write_space+0x6a/0x80 [tls]
...

tls_push_sg already does a loop over all sending sg's, so ignore
any tls_write_space notifications until we are done sending.
We then have to call the previous write_space to wake up
poll() waiters after we are done with the send loop.

Reported-by: Andre Tomt <andre@tomt.net>
Signed-off-by: Dave Watson <davejwatson@fb.com>
---
 include/net/tls.h  | 1 +
 net/tls/tls_main.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/net/tls.h b/include/net/tls.h
index 3da8e13..b400d0bb 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -148,6 +148,7 @@ struct tls_context {
 	struct scatterlist *partially_sent_record;
 	u16 partially_sent_offset;
 	unsigned long flags;
+	bool in_tcp_sendpages;
 
 	u16 pending_open_record_frags;
 	int (*push_pending_record)(struct sock *sk, int flags);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 0d37997..cc03e00 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -114,6 +114,7 @@ int tls_push_sg(struct sock *sk,
 	size = sg->length - offset;
 	offset += sg->offset;
 
+	ctx->in_tcp_sendpages = true;
 	while (1) {
 		if (sg_is_last(sg))
 			sendpage_flags = flags;
@@ -148,6 +149,8 @@ int tls_push_sg(struct sock *sk,
 	}
 
 	clear_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags);
+	ctx->in_tcp_sendpages = false;
+	ctx->sk_write_space(sk);
 
 	return 0;
 }
@@ -217,6 +220,10 @@ static void tls_write_space(struct sock *sk)
 {
 	struct tls_context *ctx = tls_get_ctx(sk);
 
+	/* We are already sending pages, ignore notification */
+	if (ctx->in_tcp_sendpages)
+		return;
+
 	if (!sk->sk_write_pending && tls_is_pending_closed_record(ctx)) {
 		gfp_t sk_allocation = sk->sk_allocation;
 		int rc;
-- 
2.9.5

^ permalink raw reply related
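
The fix in the patch above is a re-entrancy guard: a flag is set for
the duration of the send loop, the callback returns early while the
flag is set, and waiters are notified once after the loop. A
self-contained userspace sketch of the same pattern follows. All names
(struct sender, push_all(), on_write_space()) are hypothetical, and the
depth and wakeups counters exist only to make the behaviour observable.

```c
#include <stdbool.h>

struct sender {
	bool in_send;	/* set while the send loop is running */
	int pending;	/* chunks still to be sent */
	int depth;	/* current nesting of push_all() */
	int max_depth;	/* deepest nesting observed */
	int wakeups;	/* notifications delivered after the loop */
};

static void on_write_space(struct sender *s);

/* Sending one chunk may trigger a "space available" callback. */
static void send_chunk(struct sender *s)
{
	s->pending--;
	on_write_space(s);
}

static void push_all(struct sender *s)
{
	s->depth++;
	if (s->depth > s->max_depth)
		s->max_depth = s->depth;

	s->in_send = true;
	while (s->pending > 0)
		send_chunk(s);
	s->in_send = false;

	s->wakeups++;	/* notify waiters once, after the loop */
	s->depth--;
}

static void on_write_space(struct sender *s)
{
	/* Already inside the send loop: ignore the notification. */
	if (s->in_send)
		return;
	if (s->pending > 0)
		push_all(s);
}
```

Without the early return in on_write_space(), each chunk sent would
re-enter push_all() from inside the loop, which is the recursion shown
in the reported stack trace.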

* RE: [RFC net-next 4/5] net: phy: Add support for IEEE standard test modes
From: Woojung.Huh @ 2018-05-01 20:07 UTC (permalink / raw)
  To: f.fainelli, netdev
  Cc: andrew, rmk, linux-kernel, davem, cphealy, nikita.yoush,
	vivien.didelot, Nisar.Sayed, UNGLinuxDriver
In-Reply-To: <66f30fca-f485-7f52-7441-3c8cf1718640@gmail.com>

Hi Florian,

> Not sure I completely understand your suggestion, do you mean that I
> should break down the body of that function above such that there are
> per-speed lower level functions? Something like the pseudo-code below:
> 
> genphy_set_test() {
> 	switch (mode) {
> 	case PHY_STD_TEST_MODE_100BASET2_1:
> 	..
> 	case PHY_STD_TEST_MODE_100BASET2_3:
> 		return genphy_set_100baset2();
> 
> 	case PHY_STD_TEST_MODE_1000BASET_1:
> 	..
> 	case PHY_STD_TEST_MODE_1000BASET_4:
> 		return genphy_set_1000baset();
> 
> 	case PHY_STD_TEST_MODE_8021BWQCQ_1:
> 		return genphy_set_100baset1();
> 
> }
Yes, I should write pseudo code. Sorry about confusion.
User can override this function or expand to other modes.

Thanks.
Woojung


^ permalink raw reply

* [PATCH bpf-nex] tools: bpftool: change time format for program 'loaded at:' information
From: Quentin Monnet @ 2018-05-01 20:18 UTC (permalink / raw)
  To: ast, borkmann; +Cc: netdev, oss-drivers, quentin.monnet

To make eBPF program load time easier to parse from "bpftool prog"
output for machines, change the time format used by the program. The
format now differs for plain and JSON version:

- Plain version uses a string formatted according to ISO 8601.
- JSON uses the number of seconds since the Epoch, which is less friendly
  for humans but even easier to process.

Example output:

    # ./bpftool prog
    41298: xdp  tag a04f5eef06a7f555 dev foo
            loaded_at 2018-04-18T17:19:47+0100  uid 0
            xlated 16B  not jited  memlock 4096B

    # ./bpftool prog -p
    [{
            "id": 41298,
            "type": "xdp",
            "tag": "a04f5eef06a7f555",
            "gpl_compatible": false,
            "dev": {
                "ifindex": 14,
                "ns_dev": 3,
                "ns_inode": 4026531993,
                "ifname": "foo"
            },
            "loaded_at": 1524068387,
            "uid": 0,
            "bytes_xlated": 16,
            "jited": false,
            "bytes_memlock": 4096
        }
    ]

Previously, "Apr 18/17:19" would be used at both places.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/prog.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index e71a0a11afde..9bdfdf2d3fbe 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -96,7 +96,10 @@ static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
 		return;
 	}
 
-	strftime(buf, size, "%b %d/%H:%M", &load_tm);
+	if (json_output)
+		strftime(buf, size, "%s", &load_tm);
+	else
+		strftime(buf, size, "%FT%T%z", &load_tm);
 }
 
 static int prog_fd_by_tag(unsigned char *tag)
@@ -245,7 +248,8 @@ static void print_prog_json(struct bpf_prog_info *info, int fd)
 		print_boot_time(info->load_time, buf, sizeof(buf));
 
 		/* Piggy back on load_time, since 0 uid is a valid one */
-		jsonw_string_field(json_wtr, "loaded_at", buf);
+		jsonw_name(json_wtr, "loaded_at");
+		jsonw_printf(json_wtr, "%s", buf);
 		jsonw_uint_field(json_wtr, "uid", info->created_by_uid);
 	}
 
-- 
2.7.4

^ permalink raw reply related

* Re: [RFC PATCH v3 bpf-next 2/5] bpf/verifier: rewrite subprog boundary detection
From: Edward Cree @ 2018-05-01 20:40 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Borkmann, netdev
In-Reply-To: <20180417234826.egydr2sg2rewzvyu@ast-mbp>

On 18/04/18 00:48, Alexei Starovoitov wrote:
> as I was saying before this is no go.
> subprogno is meaningless in the hierarchy of: prog -> func -> bb -> insn
> Soon bpf will have libraries and this field would need to become
> a pointer back to bb or func structure creating unnecessary circular dependency.
I'm afraid I don't follow.  Why can't func numbers (and later, bb numbers)
 be per-prog?  When verifier is linking multiple progs together it will
 necessarily have the subprog-info for each prog, and when making cross-
 prog calls it'll have to already know which prog it's calling into; I
 don't see any reason why the index into a prog's subprog_info array
 should become "meaningless" in such a setup.
Besides, subprogno is how the rest of the verifier currently identifies a
 func, and in the absence of any indication of how anything different
 will be implemented, that's what an incremental patch has to work with.

If you're worried about the SPOT violation from having both
 aux->subprogno and subprog_info->start... well, we could actually get
 rid of the latter!  Uses of it are:
* carving up insn arrays in jit_subprogs().  Could be done based on
  aux->subprogno instead (v1 of this series did that)
* checking CALL destination is at start of function.  That could be done
  by putting a flag in the aux_data to mark "this insn is at the start of
  its subprog".  Doesn't even need to increase memory usage: it could be
  done by ORing a flag (0x8000, say) into aux->subprogno; or we could
  replace 'bool seen;' with 'u8 flags;' again with no extra memory used.
* a few verbose() messages.
That would have another nice consequence, in that adjust_subprog_starts()
 could go away - another code simplification resulting from use of the
 right data structures.
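
The flag idea above could look roughly like the sketch below. This is
only an illustration under stated assumptions: the struct, macro and
function names are hypothetical, and a 16-bit subprogno field is assumed
so that 0x8000 is the top bit; none of this is taken from the actual
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: top bit marks "this insn is the start of its subprog" */
#define SUBPROG_START_FLAG	0x8000u
#define SUBPROG_NO_MASK		0x7fffu

/* Hypothetical stand-in for the per-insn aux data */
struct insn_aux_sketch {
	uint16_t subprogno;	/* subprog index, ORed with the start flag */
};

void aux_set_subprog(struct insn_aux_sketch *aux, uint16_t no, bool is_start)
{
	assert(no <= SUBPROG_NO_MASK);
	aux->subprogno = (uint16_t)(no | (is_start ? SUBPROG_START_FLAG : 0u));
}

uint16_t aux_subprogno(const struct insn_aux_sketch *aux)
{
	return aux->subprogno & SUBPROG_NO_MASK;
}

bool aux_is_subprog_start(const struct insn_aux_sketch *aux)
{
	return (aux->subprogno & SUBPROG_START_FLAG) != 0;
}

/*
 * A CALL destination is acceptable only if it is in range and lands on
 * the first insn of a subprog; no subprog_info->start lookup is needed.
 */
bool call_target_ok(const struct insn_aux_sketch *aux, size_t insn_cnt,
		    size_t target)
{
	return target < insn_cnt && aux_is_subprog_start(&aux[target]);
}
```

With the start marker stored per insn, nothing has to be renumbered when
insns are inserted or removed, which is the reason adjust_subprog_starts()
could go away.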

Btw, sorry for the delay in responding; I got bogged down in some sfc
 driver work.

-Ed

^ permalink raw reply

* Re: [PATCH net-next v7 1/3] vmcore: add API to collect hardware dump in second kernel
From: kbuild test robot @ 2018-05-01 20:44 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: kbuild-all, netdev, kexec, linux-fsdevel, linux-kernel, davem,
	viro, ebiederm, stephen, akpm, torvalds, ganeshgr, nirranjan,
	indranil, Rahul Lakkireddy
In-Reply-To: <b6065b53c5446d98ee55e09119f6821f979418d4.1525197408.git.rahul.lakkireddy@chelsio.com>

[-- Attachment #1: Type: text/plain, Size: 730 bytes --]

Hi Rahul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Rahul-Lakkireddy/vmcore-add-API-to-collect-hardware-dump-in-second-kernel/20180502-023638
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

>> ./usr/include/linux/vmcore.h:9: found __[us]{8,16,32,64} type without #include <linux/types.h>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6293 bytes --]

^ permalink raw reply
