Netdev List


* [PATCH V3 net-next 04/10] net: hns3: change GFP flag during lock period
From: Huazhong Tan @ 2019-07-27  5:46 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
	Yufeng Mo, lipeng 00277521, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>

From: Yufeng Mo <moyufeng@huawei.com>

GFP_KERNEL must not be used to allocate memory while a spinlock is
held, because the allocation may sleep and cause scheduling under the
lock. This patch changes the GFP flag to GFP_ATOMIC in this case.

Fixes: dd74f815dd41 ("net: hns3: Add support for rule add/delete for flow director")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: lipeng 00277521 <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 3c64d70..14199c4 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5796,7 +5796,7 @@ static int hclge_add_fd_entry_by_arfs(struct hnae3_handle *handle, u16 queue_id,
 			return -ENOSPC;
 		}
 
-		rule = kzalloc(sizeof(*rule), GFP_KERNEL);
+		rule = kzalloc(sizeof(*rule), GFP_ATOMIC);
 		if (!rule) {
 			spin_unlock_bh(&hdev->fd_rule_lock);
 
-- 
2.7.4



* [PATCH V3 net-next 06/10] net: hns3: add debug messages to identify eth down cause
From: Huazhong Tan @ 2019-07-27  5:46 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
	Yonglong Liu, Peng Li, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>

From: Yonglong Liu <liuyonglong@huawei.com>

Sometimes dmesg shows only that the eth interface went down/up,
with no indication of why it went down. So add some debug
messages to identify the cause.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c       | 18 ++++++++++++++++++
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c    | 19 +++++++++++++++++++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c    | 11 +++++++++++
 3 files changed, 48 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 4d58c53..973c57b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -459,6 +459,9 @@ static int hns3_nic_net_open(struct net_device *netdev)
 		h->ae_algo->ops->set_timer_task(priv->ae_handle, true);
 
 	hns3_config_xps(priv);
+
+	netif_info(h, drv, netdev, "net open\n");
+
 	return 0;
 }
 
@@ -519,6 +522,8 @@ static int hns3_nic_net_stop(struct net_device *netdev)
 	if (test_and_set_bit(HNS3_NIC_STATE_DOWN, &priv->state))
 		return 0;
 
+	netif_info(h, drv, netdev, "net stop\n");
+
 	if (h->ae_algo->ops->set_timer_task)
 		h->ae_algo->ops->set_timer_task(priv->ae_handle, false);
 
@@ -1550,6 +1555,8 @@ static int hns3_setup_tc(struct net_device *netdev, void *type_data)
 	h = hns3_get_handle(netdev);
 	kinfo = &h->kinfo;
 
+	netif_info(h, drv, netdev, "setup tc: num_tc=%u\n", tc);
+
 	return (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
 		kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
 }
@@ -1593,6 +1600,10 @@ static int hns3_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
 	struct hnae3_handle *h = hns3_get_handle(netdev);
 	int ret = -EIO;
 
+	netif_info(h, drv, netdev,
+		   "set vf vlan: vf=%d, vlan=%u, qos=%u, vlan_proto=%u\n",
+		   vf, vlan, qos, vlan_proto);
+
 	if (h->ae_algo->ops->set_vf_vlan_filter)
 		ret = h->ae_algo->ops->set_vf_vlan_filter(h, vf, vlan,
 							  qos, vlan_proto);
@@ -1611,6 +1622,9 @@ static int hns3_nic_change_mtu(struct net_device *netdev, int new_mtu)
 	if (!h->ae_algo->ops->set_mtu)
 		return -EOPNOTSUPP;
 
+	netif_info(h, drv, netdev,
+		   "change mtu from %u to %d\n", netdev->mtu, new_mtu);
+
 	ret = h->ae_algo->ops->set_mtu(h, new_mtu);
 	if (ret)
 		netdev_err(netdev, "failed to change MTU in hardware %d\n",
@@ -4395,6 +4409,10 @@ int hns3_set_channels(struct net_device *netdev,
 	if (kinfo->rss_size == new_tqp_num)
 		return 0;
 
+	netif_info(h, drv, netdev,
+		   "set channels: tqp_num=%u, rxfh=%d\n",
+		   new_tqp_num, rxfh_configured);
+
 	ret = hns3_reset_notify(h, HNAE3_DOWN_CLIENT);
 	if (ret)
 		return ret;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index e71c92b..8553200 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -311,6 +311,8 @@ static void hns3_self_test(struct net_device *ndev,
 	if (eth_test->flags != ETH_TEST_FL_OFFLINE)
 		return;
 
+	netif_info(h, drv, ndev, "self test start\n");
+
 	st_param[HNAE3_LOOP_APP][0] = HNAE3_LOOP_APP;
 	st_param[HNAE3_LOOP_APP][1] =
 			h->flags & HNAE3_SUPPORT_APP_LOOPBACK;
@@ -374,6 +376,8 @@ static void hns3_self_test(struct net_device *ndev,
 
 	if (if_running)
 		ndev->netdev_ops->ndo_open(ndev);
+
+	netif_info(h, drv, ndev, "self test end\n");
 }
 
 static int hns3_get_sset_count(struct net_device *netdev, int stringset)
@@ -604,6 +608,10 @@ static int hns3_set_pauseparam(struct net_device *netdev,
 {
 	struct hnae3_handle *h = hns3_get_handle(netdev);
 
+	netif_info(h, drv, netdev,
+		   "set pauseparam: autoneg=%u, rx:%u, tx:%u\n",
+		   param->autoneg, param->rx_pause, param->tx_pause);
+
 	if (h->ae_algo->ops->set_pauseparam)
 		return h->ae_algo->ops->set_pauseparam(h, param->autoneg,
 						       param->rx_pause,
@@ -743,6 +751,11 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 	if (cmd->base.speed == SPEED_1000 && cmd->base.duplex == DUPLEX_HALF)
 		return -EINVAL;
 
+	netif_info(handle, drv, netdev,
+		   "set link(%s): autoneg=%u, speed=%u, duplex=%u\n",
+		   netdev->phydev ? "phy" : "mac",
+		   cmd->base.autoneg, cmd->base.speed, cmd->base.duplex);
+
 	/* Only support ksettings_set for netdev with phy attached for now */
 	if (netdev->phydev)
 		return phy_ethtool_ksettings_set(netdev->phydev, cmd);
@@ -984,6 +997,9 @@ static int hns3_nway_reset(struct net_device *netdev)
 		return -EINVAL;
 	}
 
+	netif_info(handle, drv, netdev,
+		   "nway reset (using %s)\n", phy ? "phy" : "mac");
+
 	if (phy)
 		return genphy_restart_aneg(phy);
 
@@ -1308,6 +1324,9 @@ static int hns3_set_fecparam(struct net_device *netdev,
 	if (!ops->set_fec)
 		return -EOPNOTSUPP;
 	fec_mode = eth_to_loc_fec(fec->fec);
+
+	netif_info(handle, drv, netdev, "set fecparam: mode=%u\n", fec_mode);
+
 	return ops->set_fec(handle, fec_mode);
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index bac4ce1..59774e1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -201,6 +201,7 @@ static int hclge_client_setup_tc(struct hclge_dev *hdev)
 static int hclge_ieee_setets(struct hnae3_handle *h, struct ieee_ets *ets)
 {
 	struct hclge_vport *vport = hclge_get_vport(h);
+	struct net_device *netdev = h->kinfo.netdev;
 	struct hclge_dev *hdev = vport->back;
 	bool map_changed = false;
 	u8 num_tc = 0;
@@ -215,6 +216,8 @@ static int hclge_ieee_setets(struct hnae3_handle *h, struct ieee_ets *ets)
 		return ret;
 
 	if (map_changed) {
+		netif_info(h, drv, netdev, "set ets\n");
+
 		ret = hclge_notify_client(hdev, HNAE3_DOWN_CLIENT);
 		if (ret)
 			return ret;
@@ -300,6 +303,7 @@ static int hclge_ieee_getpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 {
 	struct hclge_vport *vport = hclge_get_vport(h);
+	struct net_device *netdev = h->kinfo.netdev;
 	struct hclge_dev *hdev = vport->back;
 	u8 i, j, pfc_map, *prio_tc;
 
@@ -325,6 +329,10 @@ static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 	hdev->tm_info.hw_pfc_map = pfc_map;
 	hdev->tm_info.pfc_en = pfc->pfc_en;
 
+	netif_info(h, drv, netdev,
+		   "set pfc: pfc_en=%u, pfc_map=%u, num_tc=%u\n",
+		   pfc->pfc_en, pfc_map, hdev->tm_info.num_tc);
+
 	hclge_tm_pfc_info_update(hdev);
 
 	return hclge_pause_setup_hw(hdev, false);
@@ -345,8 +353,11 @@ static u8 hclge_getdcbx(struct hnae3_handle *h)
 static u8 hclge_setdcbx(struct hnae3_handle *h, u8 mode)
 {
 	struct hclge_vport *vport = hclge_get_vport(h);
+	struct net_device *netdev = h->kinfo.netdev;
 	struct hclge_dev *hdev = vport->back;
 
+	netif_info(h, drv, netdev, "set dcbx: mode=%u\n", mode);
+
 	/* No support for LLD_MANAGED modes or CEE */
 	if ((mode & DCB_CAP_DCBX_LLD_MANAGED) ||
 	    (mode & DCB_CAP_DCBX_VER_CEE) ||
-- 
2.7.4



* [PATCH V3 net-next 02/10] net: hns3: add a check for get_reset_level
From: Huazhong Tan @ 2019-07-27  5:46 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
	Guangbin Huang, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin@huawei.com>

In some cases, ops->get_reset_level may not be implemented, so we
should check whether it is NULL before calling get_reset_level.

Signed-off-by: Guangbin Huang <huangguangbin@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 08af782..4d58c53 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1963,7 +1963,7 @@ static pci_ers_result_t hns3_slot_reset(struct pci_dev *pdev)
 
 	ops = ae_dev->ops;
 	/* request the reset */
-	if (ops->reset_event) {
+	if (ops->reset_event && ops->get_reset_level) {
 		if (ae_dev->hw_err_reset_req) {
 			reset_type = ops->get_reset_level(ae_dev,
 						&ae_dev->hw_err_reset_req);
-- 
2.7.4
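
The guard added by this patch can be sketched in userspace C. This is only
an illustration under stated assumptions: struct demo_ops, its two callbacks
and demo_request_reset() are hypothetical stand-ins, not the hns3 types. The
point is that every optional callback used inside a branch is checked before
the branch is entered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; a backend may leave either callback NULL. */
struct demo_ops {
	void (*reset_event)(int level);
	int (*get_reset_level)(unsigned long *pending);
};

/* Records the level passed to the last reset_event call (-1: none yet). */
static int demo_last_level = -1;

static void demo_reset_event(int level)
{
	demo_last_level = level;
}

static int demo_get_level(unsigned long *pending)
{
	return *pending ? 2 : 0;
}

/* Returns 1 if a reset was requested, 0 if a needed callback is missing. */
static int demo_request_reset(const struct demo_ops *ops,
			      unsigned long *pending)
{
	assert(ops != NULL && pending != NULL);

	/* Both callbacks are used below, so both must be present. */
	if (!ops->reset_event || !ops->get_reset_level)
		return 0;

	ops->reset_event(ops->get_reset_level(pending));
	return 1;
}
```

With only reset_event set, demo_request_reset() returns 0 and calls nothing;
with both callbacks set it forwards the computed level to reset_event.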



* [PATCH V3 net-next 05/10] net: hns3: modify firmware version display format
From: Huazhong Tan @ 2019-07-27  5:46 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
	Yufeng Mo, Peng Li, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>

From: Yufeng Mo <moyufeng@huawei.com>

This patch changes the firmware version display format in
hclge(vf)_cmd_init() and hns3_get_drvinfo() from a raw 32-bit hex
value to a dotted four-byte form (byte3.byte2.byte1.byte0), and adds
mask and shift macros for extracting each byte.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h              |  9 +++++++++
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c       | 15 +++++++++++++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c   | 10 +++++++++-
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c | 10 +++++++++-
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 48c7b70..a4624db 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -179,6 +179,15 @@ struct hnae3_vector_info {
 #define HNAE3_RING_GL_RX 0
 #define HNAE3_RING_GL_TX 1
 
+#define HNAE3_FW_VERSION_BYTE3_SHIFT	24
+#define HNAE3_FW_VERSION_BYTE3_MASK	GENMASK(31, 24)
+#define HNAE3_FW_VERSION_BYTE2_SHIFT	16
+#define HNAE3_FW_VERSION_BYTE2_MASK	GENMASK(23, 16)
+#define HNAE3_FW_VERSION_BYTE1_SHIFT	8
+#define HNAE3_FW_VERSION_BYTE1_MASK	GENMASK(15, 8)
+#define HNAE3_FW_VERSION_BYTE0_SHIFT	0
+#define HNAE3_FW_VERSION_BYTE0_MASK	GENMASK(7, 0)
+
 struct hnae3_ring_chain_node {
 	struct hnae3_ring_chain_node *next;
 	u32 tqp_index;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 5bff98a..e71c92b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -527,6 +527,7 @@ static void hns3_get_drvinfo(struct net_device *netdev,
 {
 	struct hns3_nic_priv *priv = netdev_priv(netdev);
 	struct hnae3_handle *h = priv->ae_handle;
+	u32 fw_version;
 
 	if (!h->ae_algo->ops->get_fw_version) {
 		netdev_err(netdev, "could not get fw version!\n");
@@ -545,8 +546,18 @@ static void hns3_get_drvinfo(struct net_device *netdev,
 		sizeof(drvinfo->bus_info));
 	drvinfo->bus_info[ETHTOOL_BUSINFO_LEN - 1] = '\0';
 
-	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), "0x%08x",
-		 priv->ae_handle->ae_algo->ops->get_fw_version(h));
+	fw_version = priv->ae_handle->ae_algo->ops->get_fw_version(h);
+
+	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
+		 "%lu.%lu.%lu.%lu",
+		 hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE3_MASK,
+				 HNAE3_FW_VERSION_BYTE3_SHIFT),
+		 hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE2_MASK,
+				 HNAE3_FW_VERSION_BYTE2_SHIFT),
+		 hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE1_MASK,
+				 HNAE3_FW_VERSION_BYTE1_SHIFT),
+		 hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE0_MASK,
+				 HNAE3_FW_VERSION_BYTE0_SHIFT));
 }
 
 static u32 hns3_get_link(struct net_device *netdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 22f6acd..d9858f2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -419,7 +419,15 @@ int hclge_cmd_init(struct hclge_dev *hdev)
 	}
 	hdev->fw_version = version;
 
-	dev_info(&hdev->pdev->dev, "The firmware version is %08x\n", version);
+	dev_info(&hdev->pdev->dev, "The firmware version is %lu.%lu.%lu.%lu\n",
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE3_MASK,
+				 HNAE3_FW_VERSION_BYTE3_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE2_MASK,
+				 HNAE3_FW_VERSION_BYTE2_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE1_MASK,
+				 HNAE3_FW_VERSION_BYTE1_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE0_MASK,
+				 HNAE3_FW_VERSION_BYTE0_SHIFT));
 
 	return 0;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
index 652b796..8f21eb3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
@@ -405,7 +405,15 @@ int hclgevf_cmd_init(struct hclgevf_dev *hdev)
 	}
 	hdev->fw_version = version;
 
-	dev_info(&hdev->pdev->dev, "The firmware version is %08x\n", version);
+	dev_info(&hdev->pdev->dev, "The firmware version is %lu.%lu.%lu.%lu\n",
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE3_MASK,
+				 HNAE3_FW_VERSION_BYTE3_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE2_MASK,
+				 HNAE3_FW_VERSION_BYTE2_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE1_MASK,
+				 HNAE3_FW_VERSION_BYTE1_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE0_MASK,
+				 HNAE3_FW_VERSION_BYTE0_SHIFT));
 
 	return 0;
 
-- 
2.7.4
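
The new display format can be tried out in userspace with a small sketch.
It assumes that hnae3_get_field() amounts to "mask, then shift right";
FW_GENMASK, FW_GET_FIELD and fw_version_to_str() are hypothetical names
introduced here, not driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's GENMASK(h, l), 32-bit values only. */
#define FW_GENMASK(h, l) \
	((uint32_t)(((uint64_t)1 << ((h) + 1)) - 1) & ~(((uint32_t)1 << (l)) - 1))

/* Assumed behaviour of hnae3_get_field(): mask the value, then shift down. */
#define FW_GET_FIELD(v, mask, shift) (((v) & (mask)) >> (shift))

/*
 * Format a packed 32-bit firmware version as "byte3.byte2.byte1.byte0".
 * Returns whatever snprintf() returns.
 */
static int fw_version_to_str(uint32_t version, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "%u.%u.%u.%u",
			(unsigned int)FW_GET_FIELD(version, FW_GENMASK(31, 24), 24),
			(unsigned int)FW_GET_FIELD(version, FW_GENMASK(23, 16), 16),
			(unsigned int)FW_GET_FIELD(version, FW_GENMASK(15, 8), 8),
			(unsigned int)FW_GET_FIELD(version, FW_GENMASK(7, 0), 0));
}
```

Under these assumptions a version word of 0x01080a0f is rendered as
1.8.10.15, where the old format would have printed 01080a0f.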



* [PATCH V3 net-next 03/10] net: hns3: remove upgrade reset level when reset fail
From: Huazhong Tan @ 2019-07-27  5:46 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
	Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>

Currently, hclge_reset_err_handle() asserts a global reset when the
failure count is smaller than MAX_RESET_FAIL_CNT, which affects other
running functions.

So this patch removes the upgrade of the reset level and re-schedules
the reset task instead.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 28 +++++++---------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 3fde5471..3c64d70 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3305,7 +3305,7 @@ static int hclge_reset_prepare_wait(struct hclge_dev *hdev)
 	return ret;
 }
 
-static bool hclge_reset_err_handle(struct hclge_dev *hdev, bool is_timeout)
+static bool hclge_reset_err_handle(struct hclge_dev *hdev)
 {
 #define MAX_RESET_FAIL_CNT 5
 
@@ -3322,20 +3322,11 @@ static bool hclge_reset_err_handle(struct hclge_dev *hdev, bool is_timeout)
 		return false;
 	} else if (hdev->reset_fail_cnt < MAX_RESET_FAIL_CNT) {
 		hdev->reset_fail_cnt++;
-		if (is_timeout) {
-			set_bit(hdev->reset_type, &hdev->reset_pending);
-			dev_info(&hdev->pdev->dev,
-				 "re-schedule to wait for hw reset done\n");
-			return true;
-		}
-
-		dev_info(&hdev->pdev->dev, "Upgrade reset level\n");
-		hclge_clear_reset_cause(hdev);
-		set_bit(HNAE3_GLOBAL_RESET, &hdev->default_reset_request);
-		mod_timer(&hdev->reset_timer,
-			  jiffies + HCLGE_RESET_INTERVAL);
-
-		return false;
+		set_bit(hdev->reset_type, &hdev->reset_pending);
+		dev_info(&hdev->pdev->dev,
+			 "re-schedule reset task(%d)\n",
+			 hdev->reset_fail_cnt);
+		return true;
 	}
 
 	hclge_clear_reset_cause(hdev);
@@ -3382,7 +3373,6 @@ static int hclge_reset_stack(struct hclge_dev *hdev)
 static void hclge_reset(struct hclge_dev *hdev)
 {
 	struct hnae3_ae_dev *ae_dev = pci_get_drvdata(hdev->pdev);
-	bool is_timeout = false;
 	int ret;
 
 	/* Initialize ae_dev reset status as well, in case enet layer wants to
@@ -3410,10 +3400,8 @@ static void hclge_reset(struct hclge_dev *hdev)
 	if (ret)
 		goto err_reset;
 
-	if (hclge_reset_wait(hdev)) {
-		is_timeout = true;
+	if (hclge_reset_wait(hdev))
 		goto err_reset;
-	}
 
 	hdev->rst_stats.hw_reset_done_cnt++;
 
@@ -3465,7 +3453,7 @@ static void hclge_reset(struct hclge_dev *hdev)
 err_reset_lock:
 	rtnl_unlock();
 err_reset:
-	if (hclge_reset_err_handle(hdev, is_timeout))
+	if (hclge_reset_err_handle(hdev))
 		hclge_reset_task_schedule(hdev);
 }
 
-- 
2.7.4
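
The retry policy after this patch can be sketched as a bounded counter.
This is a simplified userspace illustration: struct demo_dev and
demo_reset_err_handle() are hypothetical, and the earlier branches of the
real hclge_reset_err_handle() (not shown in the hunk) are left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RESET_FAIL_CNT 5

/* Hypothetical device state; only what the retry decision needs. */
struct demo_dev {
	int reset_fail_cnt;	/* failures seen so far */
	int reset_pending;	/* set when the same reset should be retried */
};

/*
 * Returns 1 if the caller should re-schedule the reset task, 0 once the
 * retry budget is used up. The reset level is never escalated here.
 */
static int demo_reset_err_handle(struct demo_dev *dev)
{
	assert(dev != NULL);

	if (dev->reset_fail_cnt < DEMO_MAX_RESET_FAIL_CNT) {
		dev->reset_fail_cnt++;
		dev->reset_pending = 1;
		return 1;
	}

	dev->reset_pending = 0;
	return 0;
}
```

Starting from a zeroed demo_dev, the first five failures each ask for a
retry of the same reset; the sixth gives up.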



* [PATCH V3 net-next 01/10] net: hns3: add reset checking before set channels
From: Huazhong Tan @ 2019-07-27  5:46 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
	Jian Shen, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>

From: Jian Shen <shenjian15@huawei.com>

hns3_set_channels() should first check the reset status, since
the device is reinitialized during a reset. If the reset has not
completed, hns3_set_channels() may access invalid memory.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 69f7ef8..08af782 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -4378,6 +4378,9 @@ int hns3_set_channels(struct net_device *netdev,
 	u16 org_tqp_num;
 	int ret;
 
+	if (hns3_nic_resetting(netdev))
+		return -EBUSY;
+
 	if (ch->rx_count || ch->tx_count)
 		return -EINVAL;
 
-- 
2.7.4



* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: kbuild test robot @ 2019-07-27  5:20 UTC (permalink / raw)
  To: Himadri Pandya
  Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
	linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4160 bytes --]

Hi Himadri,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc1 next-20190726]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Himadri-Pandya/hv_sock-use-HV_HYP_PAGE_SIZE-instead-of-PAGE_SIZE_4K/20190726-085229
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> net/vmw_vsock/hyperv_transport.c:58:28: error: 'HV_HYP_PAGE_SIZE' undeclared here (not in a function); did you mean 'HV_MESSAGE_SIZE'?
    #define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
                               ^
>> net/vmw_vsock/hyperv_transport.c:65:10: note: in expansion of macro 'HVS_SEND_BUF_SIZE'
     u8 data[HVS_SEND_BUF_SIZE];
             ^~~~~~~~~~~~~~~~~
   In file included from include/linux/list.h:9:0,
                    from include/linux/module.h:9,
                    from net/vmw_vsock/hyperv_transport.c:11:
   net/vmw_vsock/hyperv_transport.c: In function 'hvs_open_connection':
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
    #define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
                              ^~~~~~~~~~~~~
>> net/vmw_vsock/hyperv_transport.c:390:12: note: in expansion of macro 'max_t'
      sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
               ^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                              ^~~~~~~~~~~~~
>> net/vmw_vsock/hyperv_transport.c:391:12: note: in expansion of macro 'min_t'
      sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
               ^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
    #define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
                              ^~~~~~~~~~~~~
   net/vmw_vsock/hyperv_transport.c:393:12: note: in expansion of macro 'max_t'
      rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
               ^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                              ^~~~~~~~~~~~~
   net/vmw_vsock/hyperv_transport.c:394:12: note: in expansion of macro 'min_t'
      rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
               ^~~~~
   net/vmw_vsock/hyperv_transport.c: In function 'hvs_stream_enqueue':
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                              ^~~~~~~~~~~~~
   net/vmw_vsock/hyperv_transport.c:681:14: note: in expansion of macro 'min_t'
      to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
                 ^~~~~

vim +58 net/vmw_vsock/hyperv_transport.c

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69531 bytes --]


* Re: [PATCH bpf-next 01/10] libbpf: add .BTF.ext offset relocation section loading
From: Andrii Nakryiko @ 2019-07-27  5:11 UTC (permalink / raw)
  To: Song Liu
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <B01B98E5-CDFB-4E3A-BD58-DBA3113C3C3F@fb.com>

On Wed, Jul 24, 2019 at 10:20 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Jul 24, 2019, at 5:37 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Jul 24, 2019 at 5:00 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>
> >>> On Jul 24, 2019, at 12:27 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> >>>
> >>> Add support for BPF CO-RE offset relocations. Add section/record
> >>> iteration macros for .BTF.ext. These macro are useful for iterating over
> >>> each .BTF.ext record, either for dumping out contents or later for BPF
> >>> CO-RE relocation handling.
> >>>
> >>> To enable other parts of libbpf to work with .BTF.ext contents, moved
> >>> a bunch of type definitions into libbpf_internal.h.
> >>>
> >>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>> ---
> >>> tools/lib/bpf/btf.c             | 64 +++++++++--------------
> >>> tools/lib/bpf/btf.h             |  4 ++
> >>> tools/lib/bpf/libbpf_internal.h | 91 +++++++++++++++++++++++++++++++++
> >>> 3 files changed, 118 insertions(+), 41 deletions(-)
> >>>
> >
> > [...]
> >
> >>> +
> >>> static int btf_ext_parse_hdr(__u8 *data, __u32 data_size)
> >>> {
> >>>      const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
> >>> @@ -1004,6 +979,13 @@ struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
> >>>      if (err)
> >>>              goto done;
> >>>
> >>> +     /* check if there is offset_reloc_off/offset_reloc_len fields */
> >>> +     if (btf_ext->hdr->hdr_len < sizeof(struct btf_ext_header))
> >>
> >> This check will break when we add more optional sections to btf_ext_header.
> >> Maybe use offsetof() instead?
> >
> > I didn't do it, because there are no fields after offset_reloc_len.
> > But now I though that maybe it would be ok to add zero-sized marker
> > field, kind of like marking off various versions of btf_ext header?
> >
> > Alternatively, I can add offsetofend() macro somewhere in libbpf_internal.h.
> >
> > Do you have any preference?
>
> We only need a stable number to compare against. offsetofend() works.
> Or we can simply have something like
>
>     if (btf_ext->hdr->hdr_len <= offsetof(struct btf_ext_header, offset_reloc_off))
>           goto done;
> or
>     if (btf_ext->hdr->hdr_len < offsetof(struct btf_ext_header, offset_reloc_len))
>           goto done;
>
> Does this make sense?

I think offsetofend() is the cleanest solution, I'll do just that.

>
> Thanks,
> Song

^ permalink raw reply

* Re: [PATCH V2 net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: Joe Perches @ 2019-07-27  3:14 UTC (permalink / raw)
  To: liuyonglong, Saeed Mahameed, tanhuazhong@huawei.com,
	davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <f517dc69-6356-98fe-fb7a-0427728814bb@huawei.com>

On Sat, 2019-07-27 at 10:28 +0800, liuyonglong wrote:
> On 2019/7/27 6:18, Joe Perches wrote:
> > On Fri, 2019-07-26 at 22:00 +0000, Saeed Mahameed wrote:
> > > On Fri, 2019-07-26 at 11:24 +0800, Huazhong Tan wrote:
> > > > From: Yonglong Liu <liuyonglong@huawei.com>
> > > > 
> > > > Some times just see the eth interface have been down/up via
> > > > dmesg, but can not know why the eth down. So adds some debug
> > > > messages to identify the cause for this.
> > []
> > > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> > > []
> > > > @@ -459,6 +459,10 @@ static int hns3_nic_net_open(struct net_device
> > > > *netdev)
> > > >  		h->ae_algo->ops->set_timer_task(priv->ae_handle, true);
> > > >  
> > > >  	hns3_config_xps(priv);
> > > > +
> > > > +	if (netif_msg_drv(h))
> > > > +		netdev_info(netdev, "net open\n");
> > > > +
> > > 
> > > to make sure this is only intended for debug, and to avoid repetition.
> > > #define hns3_dbg(__dev, format, args...)			\
> > > ({								\
> > > 	if (netif_msg_drv(h))					\
> > > 		netdev_info(h->netdev, format, ##args);         \
> > > })
> > 
> > 	netif_dbg(h, drv, h->netdev, "net open\n")
> > 
> 
> Hi, Saeed && Joe:
> For our cases, maybe netif_info() can be use for HNS3 drivers?
> netif_dbg need to open dynamic debug options additional.

Your code, your choice.

I do think littering dmesg with "net open" style messages
and such may be unnecessary.  KERN_DEBUG seems a more
appropriate log level.
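
For readers unfamiliar with the netif_msg_*() gating discussed in this
thread, here is a minimal userspace sketch of the idea: a message is
printed only when its category bit is set in a per-device mask. All names
(demo_priv, DEMO_MSG_DRV, DEMO_MSG_LINK, demo_info) are hypothetical
stand-ins, and printf() replaces the kernel log.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical category bits, standing in for the NETIF_MSG_* flags. */
#define DEMO_MSG_DRV	0x0001u
#define DEMO_MSG_LINK	0x0004u

/* Hypothetical private data carrying the enabled-category mask. */
struct demo_priv {
	unsigned int msg_enable;
};

/*
 * Print the message only if category 'bit' is enabled for this device.
 * Returns the number of characters printed, or 0 if it was filtered out.
 */
static int demo_info(const struct demo_priv *priv, unsigned int bit,
		     const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(priv != NULL && fmt != NULL);

	if (!(priv->msg_enable & bit))
		return 0;

	va_start(ap, fmt);
	n = vprintf(fmt, ap);
	va_end(ap);
	return n;
}
```

With msg_enable set to DEMO_MSG_DRV only, a "net open" message tagged
DEMO_MSG_DRV is printed, while one tagged DEMO_MSG_LINK is dropped.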




* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Andrew Lunn @ 2019-07-27  3:02 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Horatiu Vultur, Nikolay Aleksandrov, roopa, davem, bridge, netdev,
	linux-kernel
In-Reply-To: <20190726195010.7x75rr74v7ph3m6m@lx-anielsen.microsemi.net>

> As you properly guessed, this model is quite different from what we are used to.

Yes, it takes a while to get the idea that the hardware is just an
accelerator for what the Linux stack can already do. And if the switch
cannot do some feature, pass the frame to Linux so it can handle it.

You need to keep in mind that there could be other ports in the bridge
than switch ports, and those ports might be interested in the
multicast traffic. Hence the CPU needs to see the traffic. But IGMP
snooping can be used to optimise this. But you still need to be
careful, eg. IPv6 Neighbour discovery has often been broken on
mv88e6xxx because we have been too aggressive with filtering
multicast.

	Andrew


* Re: memory leak in kobject_set_name_vargs (2)
From: Qian Cai @ 2019-07-27  2:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: syzbot, Catalin Marinas, David Miller, Dmitry Vyukov, Herbert Xu,
	kuznet, Kalle Valo, Linux List Kernel Mailing, Linux-MM,
	luciano.coelho, Netdev, steffen.klassert, syzkaller-bugs,
	yoshfuji, Wang Hai, Andy Shevchenko, David S. Miller
In-Reply-To: <CAHk-=why-PdP_HNbskRADMp1bnj+FwUDYpUZSYoNLNHMRPtoVA@mail.gmail.com>



> On Jul 26, 2019, at 10:29 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Fri, Jul 26, 2019 at 4:26 PM syzbot
> <syzbot+ad8ca40ecd77896d51e2@syzkaller.appspotmail.com> wrote:
>> 
>> syzbot has bisected this bug to:
>> 
>> commit 0e034f5c4bc408c943f9c4a06244415d75d7108c
>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>> Date:   Wed May 18 18:51:25 2016 +0000
>> 
>>     iwlwifi: fix mis-merge that breaks the driver
> 
> While this bisection looks more likely than the other syzbot entry
> that bisected to a version change, I don't think it is correct eitger.
> 
> The bisection ended up doing a lot of "git bisect skip" because of the
> 
>    undefined reference to `nf_nat_icmp_reply_translation'
> 
> issue. Also, the memory leak doesn't seem to be entirely reliable:
> when the bisect does 10 runs to verify that some test kernel is bad,
> there are a couple of cases where only one or two of the ten run
> failed.
> 
> Which makes me wonder if one or two of the "everything OK" runs were
> actually buggy, but just happened to have all ten pass…

Real bisection should point to,

8ed633b9baf9e (“Revert "net-sysfs: Fix memory leak in netdev_register_kobject”")

I did encounter those memory leaks and came up with a similar fix in,

6b70fc94afd1 ("net-sysfs: Fix memory leak in netdev_register_kobject")

but those error handling paths are tricky, and it seems nobody did much testing there, so they
will keep hitting other bugs in the upper functions.

^ permalink raw reply

* Re: memory leak in kobject_set_name_vargs (2)
From: Linus Torvalds @ 2019-07-27  2:29 UTC (permalink / raw)
  To: syzbot
  Cc: Catalin Marinas, David Miller, Dmitry Vyukov, Herbert Xu, kuznet,
	Kalle Valo, Linux List Kernel Mailing, Linux-MM, luciano.coelho,
	Netdev, steffen.klassert, syzkaller-bugs, yoshfuji
In-Reply-To: <00000000000083ffc4058e9dddf0@google.com>

On Fri, Jul 26, 2019 at 4:26 PM syzbot
<syzbot+ad8ca40ecd77896d51e2@syzkaller.appspotmail.com> wrote:
>
> syzbot has bisected this bug to:
>
> commit 0e034f5c4bc408c943f9c4a06244415d75d7108c
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Wed May 18 18:51:25 2016 +0000
>
>      iwlwifi: fix mis-merge that breaks the driver

While this bisection looks more likely than the other syzbot entry
that bisected to a version change, I don't think it is correct either.

The bisection ended up doing a lot of "git bisect skip" because of the

    undefined reference to `nf_nat_icmp_reply_translation'

issue. Also, the memory leak doesn't seem to be entirely reliable:
when the bisect does 10 runs to verify that some test kernel is bad,
there are a couple of cases where only one or two of the ten runs
failed.

Which makes me wonder if one or two of the "everything OK" runs were
actually buggy, but just happened to have all ten pass...

               Linus
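
The concern above can be made concrete with a rough model. This is only a
sketch: it assumes every test run fails independently with the same
probability, which nothing in the thread establishes, and prob_all_pass() is
an illustrative name, not anything syzbot uses.

```c
/*
 * Probability that all `runs` test runs pass on a buggy kernel, assuming
 * each run independently fails with probability p_fail (a simplified,
 * assumed model of a flaky reproducer).
 */
double prob_all_pass(double p_fail, int runs)
{
	double p = 1.0;
	int i;

	for (i = 0; i < runs; i++)
		p *= 1.0 - p_fail;
	return p;
}
```

With a per-run failure rate between 0.1 and 0.2 (one or two failures in ten
runs), this gives roughly 0.35 and 0.11 for ten runs, so under this model a
buggy kernel would be classified as good fairly often.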

^ permalink raw reply

* Re: [PATCH V2 net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: liuyonglong @ 2019-07-27  2:28 UTC (permalink / raw)
  To: Joe Perches, Saeed Mahameed, tanhuazhong@huawei.com,
	davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <05602c954c689ffcd796e9468c52bca6fa4efe3f.camel@perches.com>



On 2019/7/27 6:18, Joe Perches wrote:
> On Fri, 2019-07-26 at 22:00 +0000, Saeed Mahameed wrote:
>> On Fri, 2019-07-26 at 11:24 +0800, Huazhong Tan wrote:
>>> From: Yonglong Liu <liuyonglong@huawei.com>
>>>
>>> Some times just see the eth interface have been down/up via
>>> dmesg, but can not know why the eth down. So adds some debug
>>> messages to identify the cause for this.
> []
>>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
>> []
>>> @@ -459,6 +459,10 @@ static int hns3_nic_net_open(struct net_device
>>> *netdev)
>>>  		h->ae_algo->ops->set_timer_task(priv->ae_handle, true);
>>>  
>>>  	hns3_config_xps(priv);
>>> +
>>> +	if (netif_msg_drv(h))
>>> +		netdev_info(netdev, "net open\n");
>>> +
>>
>> to make sure this is only intended for debug, and to avoid repetition.
>> #define hns3_dbg(__dev, format, args...)			\
>> ({								\
>> 	if (netif_msg_drv(h))					\
>> 		netdev_info(h->netdev, format, ##args);         \
>> })
> 
> 	netif_dbg(h, drv, h->netdev, "net open\n")
> 

Hi, Saeed && Joe:
In our case, maybe netif_info() can be used for the HNS3 driver?
netif_dbg() additionally requires dynamic debug to be enabled.
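
The difference being discussed can be modelled in userspace roughly as
follows. This is a sketch under assumptions: it presumes a kernel built with
dynamic debug, and the struct, macro and function names are made up for
illustration; only the 0x0001 bit value matches the kernel's NETIF_MSG_DRV.

```c
#define MODEL_MSG_DRV 0x0001u	/* same bit value as the kernel's NETIF_MSG_DRV */

/* Hypothetical stand-in for a driver handle carrying the message level. */
struct model_handle {
	unsigned int msg_enable;	/* per-device message-level bitmap */
};

/* Info-level "drv" message: the only gate is the msg_enable bit. */
int model_info_would_print(const struct model_handle *h)
{
	return (h->msg_enable & MODEL_MSG_DRV) != 0;
}

/*
 * Debug-level "drv" message: the same bit check, plus a second switch
 * (dyndbg_on) standing in for the dynamic debug call site being enabled.
 */
int model_dbg_would_print(const struct model_handle *h, int dyndbg_on)
{
	return model_info_would_print(h) && dyndbg_on;
}
```

In this model a user who only sets the drv message bit sees the info-level
message but not the debug-level one, which is the extra step the question
above refers to.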


^ permalink raw reply

* Re: [PATCH bpf-next v5 0/6] xdp: Add devmap_hash map type
From: Alexei Starovoitov @ 2019-07-27  2:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, Network Development,
	David Miller, Jesper Dangaard Brouer, Jakub Kicinski,
	Björn Töpel, Yonghong Song
In-Reply-To: <156415721066.13581.737309854787645225.stgit@alrua-x1>

On Fri, Jul 26, 2019 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> This series adds a new map type, devmap_hash, that works like the existing
> devmap type, but using a hash-based indexing scheme. This is useful for the use
> case where a devmap is indexed by ifindex (for instance for use with the routing
> table lookup helper). For this use case, the regular devmap needs to be sized
> after the maximum ifindex number, not the number of devices in it. A hash-based
> indexing scheme makes it possible to size the map after the number of devices it
> should contain instead.
>
> This was previously part of my patch series that also turned the regular
> bpf_redirect() helper into a map-based one; for this series I just pulled out
> the patches that introduced the new map type.
>
> Changelog:
>
> v5:
>
> - Dynamically set the number of hash buckets by rounding up max_entries to the
>   nearest power of two (mirroring the regular hashmap), as suggested by Jesper.

fyi I'm waiting for Jesper to review this new version.
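
For readers unfamiliar with the bucket sizing mentioned in the v5 changelog,
a minimal userspace sketch of rounding up to a power of two is shown here.
The kernel has its own roundup_pow_of_two() helper; the function below is an
illustration with an assumed name, not the code from the series.

```c
#include <stdint.h>

/*
 * Round n up to the nearest power of two. For this sketch only, n == 0 is
 * mapped to 1 and values above 2^31 saturate at 2^31.
 */
uint32_t round_up_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n && p < 0x80000000u)
		p <<= 1;
	return p;
}
```

Under this scheme a map created with max_entries = 100 would get 128 hash
buckets, rather than being sized after the largest ifindex.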

^ permalink raw reply

* Re: next-20190723: bpf/seccomp - systemd/journald issue?
From: Alexei Starovoitov @ 2019-07-27  2:24 UTC (permalink / raw)
  To: sedat.dilek
  Cc: Yonghong Song, Alexei Starovoitov, Daniel Borkmann, Martin Lau,
	Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
	Clang-Built-Linux ML, Kees Cook, Nick Desaulniers,
	Nathan Chancellor
In-Reply-To: <CA+icZUUe0QE9QGMom1iQwuG8nM7Oi4Mq0GKqrLvebyxfUmj6RQ@mail.gmail.com>

On Fri, Jul 26, 2019 at 2:19 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Fri, Jul 26, 2019 at 11:10 PM Yonghong Song <yhs@fb.com> wrote:
> >
> >
> >
> > On 7/26/19 2:02 PM, Sedat Dilek wrote:
> > > On Fri, Jul 26, 2019 at 10:38 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > >>
> > >> Hi Yonghong Song,
> > >>
> > >> On Fri, Jul 26, 2019 at 5:45 PM Yonghong Song <yhs@fb.com> wrote:
> > >>>
> > >>>
> > >>>
> > >>> On 7/26/19 1:26 AM, Sedat Dilek wrote:
> > >>>> Hi,
> > >>>>
> > >>>> I have opened a new issue in the ClangBuiltLinux issue tracker.
> > >>>
> > >>> Glad to know clang 9 has asm goto support and now It can compile
> > >>> kernel again.
> > >>>
> > >>
> > >> Yupp.
> > >>
> > >>>>
> > >>>> I am seeing a problem in the area bpf/seccomp causing
> > >>>> systemd/journald/udevd services to fail.
> > >>>>
> > >>>> [Fri Jul 26 08:08:43 2019] systemd[453]: systemd-udevd.service: Failed
> > >>>> to connect stdout to the journal socket, ignoring: Connection refused
> > >>>>
> > >>>> This happens when I use the (LLVM) LLD ld.lld-9 linker but not with
> > >>>> BFD linker ld.bfd on Debian/buster AMD64.
> > >>>> In both cases I use clang-9 (prerelease).
> > >>>
> > >>> Looks like it is a lld bug.
> > >>>
> > >>> I see the stack trace has __bpf_prog_run32() which is used by
> > >>> kernel bpf interpreter. Could you try to enable bpf jit
> > >>>     sysctl net.core.bpf_jit_enable = 1
> > >>> If this passed, it will prove it is interpreter related.
> > >>>
> > >>
> > >> After...
> > >>
> > >> sysctl -w net.core.bpf_jit_enable=1
> > >>
> > >> I can start all failed systemd services.
> > >>
> > >> systemd-journald.service
> > >> systemd-udevd.service
> > >> haveged.service
> > >>
> > >> This is in maintenance mode.
> > >>
> > >> What is next: Do set a permanent sysctl setting for net.core.bpf_jit_enable?
> > >>
> > >
> > > This is what I did:
> >
> > I probably won't have cycles to debug this potential lld issue.
> > Maybe you already did, I suggest you put enough reproducible
> > details in the bug you filed against lld so they can take a look.
> >
>
> I understand and will put the journalctl-log into the CBL issue
> tracker and update informations.
>
> Thanks for your help understanding the BPF correlations.
>
> Is setting 'net.core.bpf_jit_enable = 2' helpful here?

jit_enable=1 is enough.
Or use CONFIG_BPF_JIT_ALWAYS_ON to work around it.

It sounds like clang miscompiles the interpreter.
modprobe test_bpf
should be able to point out which part of interpreter is broken.

^ permalink raw reply

* [PATCH] tcp: add new tcp_mtu_probe_floor sysctl
From: Josh Hunt @ 2019-07-27  2:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, edumazet, Josh Hunt

The current implementation of TCP MTU probing can considerably
underestimate the MTU on lossy connections allowing the MSS to get down to
48. We have found that in almost all of these cases on our networks these
paths can handle much larger MTUs meaning the connections are being
artificially limited. Even though TCP MTU probing can raise the MSS back up
we have seen this not to be the case causing connections to be "stuck" with
an MSS of 48 when heavy loss is present.

Prior to pushing out this change we could not keep TCP MTU probing enabled
b/c of the above reasons. Now with a reasonable floor set we've had it
enabled for the past 6 months.

The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
administrators the ability to control the floor of MSS probing.

Signed-off-by: Josh Hunt <johunt@akamai.com>
---
 Documentation/networking/ip-sysctl.txt | 6 ++++++
 include/net/netns/ipv4.h               | 1 +
 net/ipv4/sysctl_net_ipv4.c             | 9 +++++++++
 net/ipv4/tcp_ipv4.c                    | 1 +
 net/ipv4/tcp_timer.c                   | 2 +-
 5 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index df33674799b5..49e95f438ed7 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -256,6 +256,12 @@ tcp_base_mss - INTEGER
 	Path MTU discovery (MTU probing).  If MTU probing is enabled,
 	this is the initial MSS used by the connection.
 
+tcp_mtu_probe_floor - INTEGER
+	If MTU probing is enabled this caps the minimum MSS used for search_low
+	for the connection.
+
+	Default : 48
+
 tcp_min_snd_mss - INTEGER
 	TCP SYN and SYNACK messages usually advertise an ADVMSS option,
 	as described in RFC 1122 and RFC 6691.
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index bc24a8ec1ce5..c0c0791b1912 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -116,6 +116,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_l3mdev_accept;
 #endif
 	int sysctl_tcp_mtu_probing;
+	int sysctl_tcp_mtu_probe_floor;
 	int sysctl_tcp_base_mss;
 	int sysctl_tcp_min_snd_mss;
 	int sysctl_tcp_probe_threshold;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0b980e841927..59ded25acd04 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -820,6 +820,15 @@ static struct ctl_table ipv4_net_table[] = {
 		.extra2		= &tcp_min_snd_mss_max,
 	},
 	{
+		.procname	= "tcp_mtu_probe_floor",
+		.data		= &init_net.ipv4.sysctl_tcp_mtu_probe_floor,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &tcp_min_snd_mss_min,
+		.extra2		= &tcp_min_snd_mss_max,
+	},
+	{
 		.procname	= "tcp_probe_threshold",
 		.data		= &init_net.ipv4.sysctl_tcp_probe_threshold,
 		.maxlen		= sizeof(int),
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d57641cb3477..e0a372676329 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2637,6 +2637,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS;
 	net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD;
 	net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL;
+	net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS;
 
 	net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME;
 	net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES;
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index c801cd37cc2a..dbd9d2d0ee63 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -154,7 +154,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
 	} else {
 		mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
 		mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
-		mss = max(mss, 68 - tcp_sk(sk)->tcp_header_len);
+		mss = max(mss, net->ipv4.sysctl_tcp_mtu_probe_floor);
 		mss = max(mss, net->ipv4.sysctl_tcp_min_snd_mss);
 		icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
 	}
-- 
2.7.4
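
The clamping order in the patched tcp_mtu_probing() hunk can be followed with
a small userspace model. This is a sketch only: plain ints stand in for the
kernel fields, the input is assumed to be search_low already converted to an
MSS, and probe_mss() is an illustrative name.

```c
/*
 * Userspace model of the else-branch in the patched tcp_mtu_probing():
 * halve the current lower bound, cap it at base_mss, then apply the new
 * probe floor and finally the min_snd_mss floor.
 */
int probe_mss(int search_low_mss, int base_mss, int probe_floor,
	      int min_snd_mss)
{
	int mss = search_low_mss >> 1;	/* halve the current lower bound */

	if (mss > base_mss)		/* min(base_mss, mss) */
		mss = base_mss;
	if (mss < probe_floor)		/* max(mss, probe_floor) */
		mss = probe_floor;
	if (mss < min_snd_mss)		/* max(mss, min_snd_mss) */
		mss = min_snd_mss;
	return mss;
}
```

With the default floor of 48, repeated halving still bottoms out at 48; with
the floor raised to, say, 512, the same sequence stops at 512 instead.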


^ permalink raw reply related

* Re: [PATCH] isdn/gigaset: check endpoint null in gigaset_probe
From: Phong Tran @ 2019-07-27  1:56 UTC (permalink / raw)
  To: Paul Bolle, isdn, gregkh
  Cc: tranmanphong, gigaset307x-common, netdev, linux-kernel,
	linux-kernel-mentees, syzbot+35b1c403a14f5c89eba7
In-Reply-To: <1876196a0e7fc665f0f50d5e9c0e2641f713e089.camel@tiscali.nl>

On 7/26/19 9:22 PM, Paul Bolle wrote:
> Phong Tran schreef op vr 26-07-2019 om 20:35 [+0700]:
>> This fixed the potential reference NULL pointer while using variable
>> endpoint.
>>
>> Reported-by: syzbot+35b1c403a14f5c89eba7@syzkaller.appspotmail.com
>> Tested by syzbot:
>> https://groups.google.com/d/msg/syzkaller-bugs/wnHG8eRNWEA/Qn2HhjNdBgAJ
>>
>> Signed-off-by: Phong Tran <tranmanphong@gmail.com>
>> ---
>>   drivers/isdn/gigaset/usb-gigaset.c | 9 +++++++++
> 
> This is now drivers/staging/isdn/gigaset/usb-gigaset.c.

This patch was created based on branch
kasan/usb-fuzzer-usb-testing-2019.07.11 [1].
I did not notice that the driver had been moved to staging.

> 
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/isdn/gigaset/usb-gigaset.c b/drivers/isdn/gigaset/usb-gigaset.c
>> index 1b9b43659bdf..2e011f3db59e 100644
>> --- a/drivers/isdn/gigaset/usb-gigaset.c
>> +++ b/drivers/isdn/gigaset/usb-gigaset.c
>> @@ -703,6 +703,10 @@ static int gigaset_probe(struct usb_interface *interface,
>>   	usb_set_intfdata(interface, cs);
>>   
>>   	endpoint = &hostif->endpoint[0].desc;
>> +        if (!endpoint) {
>> +		dev_err(cs->dev, "Couldn't get control endpoint\n");
>> +		return -ENODEV;
>> +	}
> 
> When can this happen? Is this one of those bugs that one can only trigger with
> a specially crafted (evil) usb device?
> 

Yes, in my understanding, this only happens with syzbot's random testing.

>>   	buffer_size = le16_to_cpu(endpoint->wMaxPacketSize);
>>   	ucs->bulk_out_size = buffer_size;
>> @@ -722,6 +726,11 @@ static int gigaset_probe(struct usb_interface *interface,
>>   	}
>>   
>>   	endpoint = &hostif->endpoint[1].desc;
>> +        if (!endpoint) {
>> +		dev_err(cs->dev, "Endpoint not available\n");
>> +		retval = -ENODEV;
>> +		goto error;
>> +	}
>>   
>>   	ucs->busy = 0;
>>   
> 
> Please note that I'm very close to getting cut off from the ISDN network, so
> the chances of being able to test this on a live system are getting small.
> 

This bug may be invalid now. Do you agree?
There are instructions for reporting an invalid bug to syzbot [2].

> Thanks,
> 
> 
> Paul Bolle
> 


[1] 
https://github.com/google/kasan/commits/usb-fuzzer-usb-testing-2019.07.11
[2] 
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot

Thanks,
Phong

^ permalink raw reply

* Re: [PATCH] b43legacy: Remove pointless cond_resched() wrapper
From: Larry Finger @ 2019-07-27  1:52 UTC (permalink / raw)
  To: Thomas Gleixner, netdev; +Cc: b43-dev, Kalle Valo
In-Reply-To: <alpine.DEB.2.21.1907262157500.1791@nanos.tec.linutronix.de>

On 7/26/19 3:00 PM, Thomas Gleixner wrote:
> cond_resched() can be used unconditionally. If CONFIG_PREEMPT is set, it
> becomes a NOP scheduler wise.
> 
> Also the B43_BUG_ON() in that wrapper is a homebrewn variant of
> __might_sleep() which is part of cond_resched() already.
> 
> Remove the wrapper and invoke cond_resched() directly.
> 
> Found while looking for CONFIG_PREEMPT dependent code treewide.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: netdev@vger.kernel.org
> Cc: b43-dev@lists.infradead.org
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: Larry Finger <Larry.Finger@lwfinger.net>

Reviewed- and Tested-by: Larry Finger <Larry.Finger@lwfinger.net>

Thanks.

Larry

^ permalink raw reply

* Re: [PATCH bpf-next v10 06/10] bpf,landlock: Add a new map type: inode
From: Alexei Starovoitov @ 2019-07-27  1:40 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andy Lutomirski, Arnaldo Carvalho de Melo, Casey Schaufler,
	Daniel Borkmann, David Drysdale, David S . Miller,
	Eric W . Biederman, James Morris, Jann Horn, John Johansen,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Stephen Smalley, Tejun Heo,
	Tetsuo Handa, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-fsdevel, linux-security-module,
	netdev
In-Reply-To: <20190721213116.23476-7-mic@digikod.net>

On Sun, Jul 21, 2019 at 11:31:12PM +0200, Mickaël Salaün wrote:
> FIXME: 64-bits in the doc
> 
> This new map store arbitrary values referenced by inode keys.  The map
> can be updated from user space with file descriptor pointing to inodes
> tied to a file system.  From an eBPF (Landlock) program point of view,
> such a map is read-only and can only be used to retrieved a value tied
> to a given inode.  This is useful to recognize an inode tagged by user
> space, without access right to this inode (i.e. no need to have a write
> access to this inode).
> 
> Add dedicated BPF functions to handle this type of map:
> * bpf_inode_htab_map_update_elem()
> * bpf_inode_htab_map_lookup_elem()
> * bpf_inode_htab_map_delete_elem()
> 
> This new map require a dedicated helper inode_map_lookup_elem() because
> of the key which is a pointer to an opaque data (only provided by the
> kernel).  This act like a (physical or cryptographic) key, which is why
> it is also not allowed to get the next key.
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>

there are too many things to comment on.
Let's do this patch.

imo inode_map concept is interesting, but see below...

> +
> +	/*
> +	 * Limit number of entries in an inode map to the maximum number of
> +	 * open files for the current process. The maximum number of file
> +	 * references (including all inode maps) for a process is then
> +	 * (RLIMIT_NOFILE - 1) * RLIMIT_NOFILE. If the process' RLIMIT_NOFILE
> +	 * is 0, then any entry update is forbidden.
> +	 *
> +	 * An eBPF program can inherit all the inode map FD. The worse case is
> +	 * to fill a bunch of arraymaps, create an eBPF program, close the
> +	 * inode map FDs, and start again. The maximum number of inode map
> +	 * entries can then be close to RLIMIT_NOFILE^3.
> +	 */
> +	if (attr->max_entries > rlimit(RLIMIT_NOFILE))
> +		return -EMFILE;

rlimit is checked, but no fds are consumed.
Once created, such an inode map_fd can be passed to a different process.
map_fd can be pinned into bpffs.
etc.
what is the value of the check?

> +
> +	/* decorelate UAPI from kernel API */
> +	attr->key_size = sizeof(struct inode *);
> +
> +	return htab_map_alloc_check(attr);
> +}
> +
> +static void inode_htab_put_key(void *key)
> +{
> +	struct inode **inode = key;
> +
> +	if ((*inode)->i_state & I_FREEING)
> +		return;

checking the state without taking a lock? isn't it racy?

> +	iput(*inode);
> +}
> +
> +/* called from syscall or (never) from eBPF program */
> +static int map_get_next_no_key(struct bpf_map *map, void *key, void *next_key)
> +{
> +	/* do not leak a file descriptor */

what is this comment supposed to mean?

> +	return -ENOTSUPP;
> +}
> +
> +/* must call iput(inode) after this call */
> +static struct inode *inode_from_fd(int ufd, bool check_access)
> +{
> +	struct inode *ret;
> +	struct fd f;
> +	int deny;
> +
> +	f = fdget(ufd);
> +	if (unlikely(!f.file))
> +		return ERR_PTR(-EBADF);
> +	/* TODO?: add this check when called from an eBPF program too (already
> +	* checked by the LSM parent hooks anyway) */
> +	if (unlikely(IS_PRIVATE(file_inode(f.file)))) {
> +		ret = ERR_PTR(-EINVAL);
> +		goto put_fd;
> +	}
> +	/* check if the FD is tied to a mount point */
> +	/* TODO?: add this check when called from an eBPF program too */
> +	if (unlikely(f.file->f_path.mnt->mnt_flags & MNT_INTERNAL)) {
> +		ret = ERR_PTR(-EINVAL);
> +		goto put_fd;
> +	}

a bunch of TODOs do not inspire confidence.

> +	if (check_access) {
> +		/*
> +		* must be allowed to access attributes from this file to then
> +		* be able to compare an inode to its map entry
> +		*/
> +		deny = security_inode_getattr(&f.file->f_path);
> +		if (deny) {
> +			ret = ERR_PTR(deny);
> +			goto put_fd;
> +		}
> +	}
> +	ret = file_inode(f.file);
> +	ihold(ret);
> +
> +put_fd:
> +	fdput(f);
> +	return ret;
> +}
> +
> +/*
> + * The key is a FD when called from a syscall, but an inode address when called
> + * from an eBPF program.
> + */
> +
> +/* called from syscall */
> +int bpf_inode_fd_htab_map_lookup_elem(struct bpf_map *map, int *key, void *value)
> +{
> +	void *ptr;
> +	struct inode *inode;
> +	int ret;
> +
> +	/* check inode access */
> +	inode = inode_from_fd(*key, true);
> +	if (IS_ERR(inode))
> +		return PTR_ERR(inode);
> +
> +	rcu_read_lock();
> +	ptr = htab_map_lookup_elem(map, &inode);
> +	iput(inode);
> +	if (IS_ERR(ptr)) {
> +		ret = PTR_ERR(ptr);
> +	} else if (!ptr) {
> +		ret = -ENOENT;
> +	} else {
> +		ret = 0;
> +		copy_map_value(map, value, ptr);
> +	}
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +/* called from kernel */

wrong comment?
kernel side cannot call it, right?

> +int bpf_inode_ptr_locked_htab_map_delete_elem(struct bpf_map *map,
> +		struct inode **key, bool remove_in_inode)
> +{
> +	if (remove_in_inode)
> +		landlock_inode_remove_map(*key, map);
> +	return htab_map_delete_elem(map, key);
> +}
> +
> +/* called from syscall */
> +int bpf_inode_fd_htab_map_delete_elem(struct bpf_map *map, int *key)
> +{
> +	struct inode *inode;
> +	int ret;
> +
> +	/* do not check inode access (similar to directory check) */
> +	inode = inode_from_fd(*key, false);
> +	if (IS_ERR(inode))
> +		return PTR_ERR(inode);
> +	ret = bpf_inode_ptr_locked_htab_map_delete_elem(map, &inode, true);
> +	iput(inode);
> +	return ret;
> +}
> +
> +/* called from syscall */
> +int bpf_inode_fd_htab_map_update_elem(struct bpf_map *map, int *key, void *value,
> +		u64 map_flags)
> +{
> +	struct inode *inode;
> +	int ret;
> +
> +	WARN_ON_ONCE(!rcu_read_lock_held());
> +
> +	/* check inode access */
> +	inode = inode_from_fd(*key, true);
> +	if (IS_ERR(inode))
> +		return PTR_ERR(inode);
> +	ret = htab_map_update_elem(map, &inode, value, map_flags);
> +	if (!ret)
> +		ret = landlock_inode_add_map(inode, map);
> +	iput(inode);
> +	return ret;
> +}
> +
> +static void inode_htab_map_free(struct bpf_map *map)
> +{
> +	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> +	struct hlist_nulls_node *n;
> +	struct hlist_nulls_head *head;
> +	struct htab_elem *l;
> +	int i;
> +
> +	for (i = 0; i < htab->n_buckets; i++) {
> +		head = select_bucket(htab, i);
> +		hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
> +			landlock_inode_remove_map(*((struct inode **)l->key), map);
> +		}
> +	}
> +	htab_map_free(map);
> +}

user space can delete the map.
that will trigger inode_htab_map_free() which will call
landlock_inode_remove_map().
which will simply iterate the list and delete from the list.

While in parallel the inode can be destroyed and hook_inode_free_security()
will be called.
I think nothing protects from this race.

> +
> +/*
> + * We need a dedicated helper to deal with inode maps because the key is a
> + * pointer to an opaque data, only provided by the kernel.  This really act
> + * like a (physical or cryptographic) key, which is why it is also not allowed
> + * to get the next key with map_get_next_key().

inode pointer is like cryptographic key? :)

> + */
> +BPF_CALL_2(bpf_inode_map_lookup_elem, struct bpf_map *, map, void *, key)
> +{
> +	WARN_ON_ONCE(!rcu_read_lock_held());
> +	return (unsigned long)htab_map_lookup_elem(map, &key);
> +}
> +
> +const struct bpf_func_proto bpf_inode_map_lookup_elem_proto = {
> +	.func		= bpf_inode_map_lookup_elem,
> +	.gpl_only	= false,
> +	.pkt_access	= true,

pkt_access ? :)

> +	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
> +	.arg1_type	= ARG_CONST_MAP_PTR,
> +	.arg2_type	= ARG_PTR_TO_INODE,
> +};
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index b2a8cb14f28e..e46441c42b68 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -801,6 +801,8 @@ static int map_lookup_elem(union bpf_attr *attr)
>  	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
>  		   map->map_type == BPF_MAP_TYPE_STACK) {
>  		err = map->ops->map_peek_elem(map, value);
> +	} else if (map->map_type == BPF_MAP_TYPE_INODE) {
> +		err = bpf_inode_fd_htab_map_lookup_elem(map, key, value);
>  	} else {
>  		rcu_read_lock();
>  		if (map->ops->map_lookup_elem_sys_only)
> @@ -951,6 +953,10 @@ static int map_update_elem(union bpf_attr *attr)
>  	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
>  		   map->map_type == BPF_MAP_TYPE_STACK) {
>  		err = map->ops->map_push_elem(map, value, attr->flags);
> +	} else if (map->map_type == BPF_MAP_TYPE_INODE) {
> +		rcu_read_lock();
> +		err = bpf_inode_fd_htab_map_update_elem(map, key, value, attr->flags);
> +		rcu_read_unlock();
>  	} else {
>  		rcu_read_lock();
>  		err = map->ops->map_update_elem(map, key, value, attr->flags);
> @@ -1006,7 +1012,10 @@ static int map_delete_elem(union bpf_attr *attr)
>  	preempt_disable();
>  	__this_cpu_inc(bpf_prog_active);
>  	rcu_read_lock();
> -	err = map->ops->map_delete_elem(map, key);
> +	if (map->map_type == BPF_MAP_TYPE_INODE)
> +		err = bpf_inode_fd_htab_map_delete_elem(map, key);
> +	else
> +		err = map->ops->map_delete_elem(map, key);
>  	rcu_read_unlock();
>  	__this_cpu_dec(bpf_prog_active);
>  	preempt_enable();
> @@ -1018,6 +1027,22 @@ static int map_delete_elem(union bpf_attr *attr)
>  	return err;
>  }
>  
> +int bpf_inode_ptr_unlocked_htab_map_delete_elem(struct bpf_map *map,
> +						struct inode **key, bool remove_in_inode)
> +{
> +	int err;
> +
> +	preempt_disable();
> +	__this_cpu_inc(bpf_prog_active);
> +	rcu_read_lock();
> +	err = bpf_inode_ptr_locked_htab_map_delete_elem(map, key, remove_in_inode);
> +	rcu_read_unlock();
> +	__this_cpu_dec(bpf_prog_active);
> +	preempt_enable();
> +	maybe_wait_bpf_programs(map);

if that function was actually doing synchronize_rcu() the consequences
would have been unpleasant. Fortunately it's a nop in this case.
Please read the code carefully before copy-paste.
Also what do you think is the reason for bpf_prog_active above?
What is the reason for rcu_read_lock above?

I think the patch set needs to shrink at least in half to be reviewable.
The way you tie seccomp and lsm is probably a bigger obstacle
than any of the bugs above.
Can you drop seccomp ? and do it as normal lsm ?


^ permalink raw reply

* Re: [PATCH v2] net: dsa: qca8k: enable port flow control
From: xiaofeis @ 2019-07-27  1:15 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: vkoul, netdev
In-Reply-To: <20190726132919.GB18223@lunn.ch>

On 2019-07-26 21:29, Andrew Lunn wrote:
>> I didn't compile it on this tree, same code is just compiled and 
>> tested on
>> kernel v4.14.
> 
> For kernel development work, v4.14 is dead. It died 12th November
> 2017. It gets backports of bug fixes, but kernel developers otherwise
> don't touch it.
> 
>> We are working on one google project, all the change is
>> required to upstream by Google.
>> But if I do the change based on the new type for kernel 5.3, then the 
>> commit
>> can't be used directly for Google's project.
> 
> So you will need to backport the change. In this case, you will have a
> very different patch in v4.14 than in mainline, due to changes like
> this. That is part of the pain in using such an old kernel.
> 
> You should use the function
> 
> void phy_support_asym_pause(struct phy_device *phydev);
> 
> to indicate the MAC supports asym pause.
> 
>    Andrew

Hi Andrew

Thanks a lot, you are correct. phy_support_asym_pause is the API to do 
this.

I very much appreciate all your patient explanations and good suggestions.

Thanks
Xiaofeis

^ permalink raw reply

* Re: [PATCH net-next v3 1/3] flow_offload: move tc indirect block to flow offload
From: Jakub Kicinski @ 2019-07-27  0:56 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, fw, netfilter-devel, netdev
In-Reply-To: <1564148047-6428-2-git-send-email-wenxu@ucloud.cn>

On Fri, 26 Jul 2019 21:34:05 +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> move tc indirect block to flow_offload and rename
> it to flow indirect block.The nf_tables can use the
> indr block architecture.
> 
> Signed-off-by: wenxu <wenxu@ucloud.cn>

> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 00b9aab..66f89bc 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -4,6 +4,7 @@
>  #include <linux/kernel.h>
>  #include <linux/list.h>
>  #include <net/flow_dissector.h>
> +#include <linux/rhashtable.h>
>  
>  struct flow_match {
>  	struct flow_dissector	*dissector;
> @@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
>  	INIT_LIST_HEAD(&flow_block->cb_list);
>  }
>  
> +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
> +				      enum tc_setup_type type, void *type_data);
> +
> +struct flow_indr_block_cb {
> +	struct list_head list;
> +	void *cb_priv;
> +	flow_indr_block_bind_cb_t *cb;
> +	void *cb_ident;
> +};
> +
> +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
> +				       struct flow_block *flow_block,
> +				       struct flow_indr_block_cb *indr_block_cb,
> +				       enum flow_block_command command);
> +
> +struct flow_indr_block_dev {
> +	struct rhash_head ht_node;
> +	struct net_device *dev;
> +	unsigned int refcnt;
> +	struct list_head cb_list;
> +	flow_indr_block_ing_cmd_t *ing_cmd_cb;
> +	struct flow_block *flow_block;

TC can only have one block per device. Now with nftables offload we can
have multiple blocks. Could you elaborate how this is solved?

> +};

^ permalink raw reply

* Re: [PATCH net-next v3 2/3] flow_offload: support get tcf block immediately
From: Jakub Kicinski @ 2019-07-27  0:52 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, fw, netfilter-devel, netdev
In-Reply-To: <1564148047-6428-3-git-send-email-wenxu@ucloud.cn>

On Fri, 26 Jul 2019 21:34:06 +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> Because the new flow-indr-block can't get the tcf_block
> directly.
> It provide a callback to find the tcf block immediately
> when the device register and contain a ingress block.
> 
> Signed-off-by: wenxu <wenxu@ucloud.cn>

Please CC people who gave you feedback on your subsequent submissions.

> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 66f89bc..3b2e848 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -391,6 +391,10 @@ struct flow_indr_block_dev {
>  	struct flow_block *flow_block;
>  };
>  
> +typedef void flow_indr_get_default_block_t(struct flow_indr_block_dev *indr_dev);
> +
> +void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb);
> +
>  struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
>  
>  int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
> diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
> index 9f1ae67..db8469d 100644
> --- a/net/core/flow_offload.c
> +++ b/net/core/flow_offload.c
> @@ -298,6 +298,14 @@ struct flow_indr_block_dev *
>  }
>  EXPORT_SYMBOL(flow_indr_block_dev_lookup);
>  
> +static flow_indr_get_default_block_t *flow_indr_get_default_block;

This static variable which can only be set to the TC's callback really
is not a great API design :/

> +void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb)
> +{
> +	flow_indr_get_default_block = cb;
> +}
> +EXPORT_SYMBOL(flow_indr_set_default_block_cb);
> +
>  static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *dev)
>  {
>  	struct flow_indr_block_dev *indr_dev;
> @@ -312,6 +320,10 @@ static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *de
>  
>  	INIT_LIST_HEAD(&indr_dev->cb_list);
>  	indr_dev->dev = dev;
> +
> +	if (flow_indr_get_default_block)
> +		flow_indr_get_default_block(indr_dev);
> +
>  	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
>  				   flow_indr_setup_block_ht_params)) {
>  		kfree(indr_dev);


^ permalink raw reply

* Re: [PATCH] iplink: document 'change' option to ip link
From: Matteo Croce @ 2019-07-27  0:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20190726215959.6312-1-stephen@networkplumber.org>

On Sat, Jul 27, 2019 at 12:00 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> Add the command alias "change" to man page.
> Don't show it on usage, since it is not commonly used.
>
> Reported-off-by: Matteo Croce <mcroce@redhat.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  man/man8/ip-link.8.in | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
> index 883d88077d66..a8ae72d2b097 100644
> --- a/man/man8/ip-link.8.in
> +++ b/man/man8/ip-link.8.in
> @@ -1815,6 +1815,11 @@ can move the system to an unpredictable state. The solution
>  is to avoid changing several parameters with one
>  .B ip link set
>  call.
> +The modifier
> +.B change
> +is equivalent to
> +.BR "set" .
> +
>
>  .TP
>  .BI dev " DEVICE "
> --
> 2.20.1
>

Acked-by: Matteo Croce <mcroce@redhat.com>

-- 
Matteo Croce
per aspera ad upstream

^ permalink raw reply

* Re: [PATCH bpf-next 6/9] selftests/bpf: abstract away test log output
From: Alexei Starovoitov @ 2019-07-27  0:34 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrii Nakryiko, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team
In-Reply-To: <20190726222652.GG24397@mini-arch>

On Fri, Jul 26, 2019 at 03:26:52PM -0700, Stanislav Fomichev wrote:
> On 07/26, Andrii Nakryiko wrote:
> > On Fri, Jul 26, 2019 at 2:31 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 07/26, Andrii Nakryiko wrote:
> > > > This patch changes how test output is printed out. By default, if test
> > > > had no errors, the only output will be a single line with test number,
> > > > name, and verdict at the end, e.g.:
> > > >
> > > >   #31 xdp:OK
> > > >
> > > > If test had any errors, all log output captured during test execution
> > > > will be output after test completes.
> > > >
> > > > It's possible to force output of log with `-v` (`--verbose`) option, in
> > > > which case output won't be buffered and will be output immediately.
> > > >
> > > > To support this, individual tests are required to use helper methods for
> > > > logging: `test__printf()` and `test__vprintf()`.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > ---
> > > >  .../selftests/bpf/prog_tests/bpf_obj_id.c     |   6 +-
> > > >  .../bpf/prog_tests/bpf_verif_scale.c          |  31 ++--
> > > >  .../bpf/prog_tests/get_stack_raw_tp.c         |   4 +-
> > > >  .../selftests/bpf/prog_tests/l4lb_all.c       |   2 +-
> > > >  .../selftests/bpf/prog_tests/map_lock.c       |  10 +-
> > > >  .../selftests/bpf/prog_tests/send_signal.c    |   8 +-
> > > >  .../selftests/bpf/prog_tests/spinlock.c       |   2 +-
> > > >  .../bpf/prog_tests/stacktrace_build_id.c      |   4 +-
> > > >  .../bpf/prog_tests/stacktrace_build_id_nmi.c  |   4 +-
> > > >  .../selftests/bpf/prog_tests/xdp_noinline.c   |   3 +-
> > > >  tools/testing/selftests/bpf/test_progs.c      | 135 +++++++++++++-----
> > > >  tools/testing/selftests/bpf/test_progs.h      |  37 ++++-
> > > >  12 files changed, 173 insertions(+), 73 deletions(-)
> > > >
> > 
> > [...]
> > 
> > > >               error_cnt++;
> > > > -             printf("test_l4lb:FAIL:stats %lld %lld\n", bytes, pkts);
> > > > +             test__printf("test_l4lb:FAIL:stats %lld %lld\n", bytes, pkts);
> > > #define printf(...) test__printf(...) in tests.h?
> > >
> > > A bit ugly, but no need to retrain everyone to use new printf wrappers.
> > 
> > I try to reduce amount of magic and surprising things, not add new
> > ones :) I also led by example and converted all current instances of
> > printf usage to test__printf, so anyone new will just copy/paste good
> > example, hopefully. Even if not, this non-buffered output will be
> > immediately obvious to anyone who just runs `sudo ./test_progs`.
> 
> [..]
> > And
> > author of new test with this problem should hopefully be the first and
> > the only one to catch and fix this.
> Yeah, that is my only concern, that regular printfs will eventually
> creep in. It's already confusing to go to/from printf/printk.
> 
> 2c:
> 
> I'm coming from a perspective of tools/testing/selftests/kselftest.h
> which is supposed to be a generic framework with custom
> printf variants (ksft_print_msg), but I still see a bunch of tests
> calling printf :-/
> 
> 	grep -ril ksft_exit_fail_msg selftests/ | xargs -n1 grep -w printf
> 
> Since we don't expect regular buffered io from the tests anyway
> it might be easier just to add a bit of magic and call it a day.

I think #define printf() is not good style in general.
glibc functions should never be #define-d.
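
For illustration only, a minimal sketch of the explicitly named wrapper
approach discussed above: output is buffered and dumped only when the
test fails, or printed immediately in verbose mode. All names here
(test_log_printf, test_log_finish, test_log_set_verbose,
test_log_pending) and the fixed 4096-byte buffer are hypothetical and
are not the helpers from the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size capture buffer; a real runner may differ. */
static char log_buf[4096];
static size_t log_len;
static int verbose;

/* Switch between buffered (0) and immediate (non-zero) output. */
void test_log_set_verbose(int on)
{
	verbose = on;
}

/* Explicitly named wrapper: tests call this instead of printf(). */
int test_log_printf(const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	if (verbose) {
		n = vprintf(fmt, args);
	} else {
		size_t room;

		/* Invariant: one byte is always left for the NUL. */
		assert(log_len < sizeof(log_buf));
		room = sizeof(log_buf) - log_len;
		n = vsnprintf(log_buf + log_len, room, fmt, args);
		/* On truncation only room - 1 characters were stored. */
		if (n > 0)
			log_len += (size_t)n < room ? (size_t)n : room - 1;
	}
	va_end(args);
	return n;
}

/* Read-only view of whatever is currently buffered. */
const char *test_log_pending(void)
{
	return log_buf;
}

/* Dump the buffered log only on failure, print the verdict, reset. */
void test_log_finish(const char *name, int failed)
{
	if (failed && log_len)
		fputs(log_buf, stdout);
	printf("%s:%s\n", name, failed ? "FAIL" : "OK");
	log_len = 0;
	log_buf[0] = '\0';
}
```

A test would call test_log_printf() wherever it would otherwise call
printf(), and the runner would call test_log_finish() once per test.
Nothing from libc is redefined, so a stray printf() still compiles and
simply shows up unbuffered.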


^ permalink raw reply

* Re: [PATCH bpf-next 4/9] libbpf: add libbpf_swap_print to get previous print func
From: Alexei Starovoitov @ 2019-07-27  0:30 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team
In-Reply-To: <CAEf4BzYoiL7XAXFdLaf5TDDas42u+jUTy2WydgmJT7WiniqOqQ@mail.gmail.com>

On Fri, Jul 26, 2019 at 02:47:28PM -0700, Andrii Nakryiko wrote:
> On Fri, Jul 26, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 07/26, Andrii Nakryiko wrote:
> > > libbpf_swap_print allows to restore previously set print function.
> > > This is useful when running many independent test with one default print
> > > function, but overriding log verbosity for particular subset of tests.
> > Can we change the return type of libbpf_set_print instead and return
> > the old function from it? Will it break ABI?
> 
> Yeah, thought about that, but I wasn't sure about ABI breakage. It
> seems like it shouldn't, so I'll just change libbpf_set_print
> signature instead.

I think it's ok to change the return value of libbpf_set_print() from
void to a useful pointer.
This function is not marked as __attribute__((__warn_unused_result__)),
so there should be no ABI issues.

Please double check by compiling perf with different gcc-s, as Arnaldo's setup does.
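
For illustration only, a minimal sketch of the set-and-return-previous
pattern discussed above. The names (print_fn_t, set_print, log_msg,
run_quiet_section) are hypothetical stand-ins and are not libbpf's
actual declarations; the sketch shows only how a caller can restore
whatever callback was installed before.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical callback type; not libbpf's real typedef. */
typedef int (*print_fn_t)(const char *fmt, va_list args);

/* Default callback: forward everything to stderr. */
int default_print(const char *fmt, va_list args)
{
	return vfprintf(stderr, fmt, args);
}

/* Quiet callback: drop the message and report zero bytes written. */
int quiet_print(const char *fmt, va_list args)
{
	(void)fmt;
	(void)args;
	return 0;
}

static print_fn_t current_print = default_print;

/* Install fn and hand back whatever was installed before. */
print_fn_t set_print(print_fn_t fn)
{
	print_fn_t old = current_print;

	current_print = fn;
	return old;
}

/* Emit a message through the currently installed callback. */
int log_msg(const char *fmt, ...)
{
	va_list args;
	int ret;

	if (!current_print)
		return 0;
	va_start(args, fmt);
	ret = current_print(fmt, args);
	va_end(args);
	return ret;
}

/* Silence output for one section, then restore the caller's setting. */
int run_quiet_section(void)
{
	print_fn_t old = set_print(quiet_print);
	int ret = log_msg("suppressed: %d\n", 42);

	set_print(old);
	assert(current_print == old);
	return ret;
}
```

Callers that ignore the return value keep working unchanged at the
source level, while callers such as run_quiet_section() can save and
restore the previous callback without a separate swap function.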


^ permalink raw reply
