Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next 5/6] net: hns3: prevent unnecessary MAC TNL interrupt
From: Huazhong Tan @ 2019-08-16  8:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Huazhong Tan
In-Reply-To: <1565942982-12105-1-git-send-email-tanhuazhong@huawei.com>

MAC TNL interrupt is used to collect statistic info about
link status changing suddenly when netdev is running.

But when stopping netdev, the enabled MAC TNL interrupt is
unnecessary, and may add some noises to the statistic info.
So this patch disables it before stopping MAC.

Fixes: a63457878b12 ("net: hns3: Add handling of MAC tunnel interruption")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 24b59f0..9d64c43 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -6380,6 +6380,8 @@ static void hclge_ae_stop(struct hnae3_handle *handle)
 	for (i = 0; i < handle->kinfo.num_tqps; i++)
 		hclge_reset_tqp(handle, i);
 
+	hclge_config_mac_tnl_int(hdev, false);
+
 	/* Mac disable */
 	hclge_cfg_mac_mode(hdev, false);
 
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next 1/6] net: hns3: add or modify comments
From: Huazhong Tan @ 2019-08-16  8:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Guojia Liao, Zhongzhu Liu, Weihang Li,
	Guangbin Huang, Huazhong Tan
In-Reply-To: <1565942982-12105-1-git-send-email-tanhuazhong@huawei.com>

From: Guojia Liao <liaoguojia@huawei.com>

To explain some code, this patch adds some comments, and modifies or
merges some comments to make them more neat.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h                | 12 ++++++------
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c            |  8 ++++----
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h            |  8 ++++----
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c         |  2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c |  3 +++
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c    |  8 ++++----
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c     |  2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  |  9 ++++-----
 8 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 6c9fd58..3e21533 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -85,10 +85,10 @@ struct hnae3_queue {
 	void __iomem *io_base;
 	struct hnae3_ae_algo *ae_algo;
 	struct hnae3_handle *handle;
-	int tqp_index;	/* index in a handle */
-	u32 buf_size;	/* size for hnae_desc->addr, preset by AE */
-	u16 tx_desc_num;/* total number of tx desc */
-	u16 rx_desc_num;/* total number of rx desc */
+	int tqp_index;		/* index in a handle */
+	u32 buf_size;		/* size for hnae_desc->addr, preset by AE */
+	u16 tx_desc_num;	/* total number of tx desc */
+	u16 rx_desc_num;	/* total number of rx desc */
 };
 
 struct hns3_mac_stats {
@@ -96,7 +96,7 @@ struct hns3_mac_stats {
 	u64 rx_pause_cnt;
 };
 
-/*hnae3 loop mode*/
+/* hnae3 loop mode */
 enum hnae3_loop {
 	HNAE3_LOOP_APP,
 	HNAE3_LOOP_SERIAL_SERDES,
@@ -621,7 +621,7 @@ struct hnae3_handle {
 	struct pci_dev *pdev;
 	void *priv;
 	struct hnae3_ae_algo *ae_algo;  /* the class who provides this handle */
-	u64 flags; /* Indicate the capabilities for this handle*/
+	u64 flags; /* Indicate the capabilities for this handle */
 
 	union {
 		struct net_device *netdev; /* first member */
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 1750f80..a11d514 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -229,9 +229,9 @@ static void hns3_vector_gl_rl_init(struct hns3_enet_tqp_vector *tqp_vector,
 	/* initialize the configuration for interrupt coalescing.
 	 * 1. GL (Interrupt Gap Limiter)
 	 * 2. RL (Interrupt Rate Limiter)
+	 *
+	 * Default: enable interrupt coalescing self-adaptive and GL
 	 */
-
-	/* Default: enable interrupt coalescing self-adaptive and GL */
 	tqp_vector->tx_group.coal.gl_adapt_enable = 1;
 	tqp_vector->rx_group.coal.gl_adapt_enable = 1;
 
@@ -4207,8 +4207,8 @@ int hns3_nic_reset_all_ring(struct hnae3_handle *h)
 static void hns3_store_coal(struct hns3_nic_priv *priv)
 {
 	/* ethtool only support setting and querying one coal
-	 * configuation for now, so save the vector 0' coal
-	 * configuation here in order to restore it.
+	 * configuration for now, so save the vector 0' coal
+	 * configuration here in order to restore it.
 	 */
 	memcpy(&priv->tx_coal, &priv->tqp_vector[0].tx_group.coal,
 	       sizeof(struct hns3_enet_coalesce));
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 5b0ee1f..e37e64e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -302,7 +302,7 @@ struct hns3_desc_cb {
 	dma_addr_t dma; /* dma address of this desc */
 	void *buf;      /* cpu addr for a desc */
 
-	/* priv data for the desc, e.g. skb when use with ip stack*/
+	/* priv data for the desc, e.g. skb when use with ip stack */
 	void *priv;
 	u32 page_offset;
 	u32 length;     /* length of the buffer */
@@ -325,11 +325,11 @@ enum hns3_pkt_l3type {
 	HNS3_L3_TYPE_MAC_PAUSE,
 	HNS3_L3_TYPE_PFC_PAUSE,/* 0x9*/
 
-	/* reserved for 0xA~0xB*/
+	/* reserved for 0xA~0xB */
 
 	HNS3_L3_TYPE_CNM = 0xc,
 
-	/* reserved for 0xD~0xE*/
+	/* reserved for 0xD~0xE */
 
 	HNS3_L3_TYPE_PARSE_FAIL	= 0xf /* must be last */
 };
@@ -354,7 +354,7 @@ enum hns3_pkt_ol3type {
 	HNS3_OL3_TYPE_IPV4_OPT = 4,
 	HNS3_OL3_TYPE_IPV6_EXT,
 
-	/* reserved for 0x6~0xE*/
+	/* reserved for 0x6~0xE */
 
 	HNS3_OL3_TYPE_PARSE_FAIL = 0xf	/* must be last */
 };
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 185ff32..677bfe06 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -635,7 +635,7 @@ static void hns3_get_ksettings(struct hnae3_handle *h,
 					     &cmd->base.speed,
 					     &cmd->base.duplex);
 
-	/* 2.get link mode*/
+	/* 2.get link mode */
 	if (ops->get_link_mode)
 		ops->get_link_mode(h,
 				   cmd->link_modes.supported,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
index f0295d1..025184a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
@@ -897,14 +897,17 @@ static void hclge_dbg_fd_tcam_read(struct hclge_dev *hdev, u8 stage,
 	dev_info(&hdev->pdev->dev, " read result tcam key %s(%u):\n",
 		 sel_x ? "x" : "y", loc);
 
+	/* tcam_data0 ~ tcam_data1 */
 	req = (u32 *)req1->tcam_data;
 	for (i = 0; i < 2; i++)
 		dev_info(&hdev->pdev->dev, "%08x\n", *req++);
 
+	/* tcam_data2 ~ tcam_data7 */
 	req = (u32 *)req2->tcam_data;
 	for (i = 0; i < 6; i++)
 		dev_info(&hdev->pdev->dev, "%08x\n", *req++);
 
+	/* tcam_data8 ~ tcam_data12 */
 	req = (u32 *)req3->tcam_data;
 	for (i = 0; i < 5; i++)
 		dev_info(&hdev->pdev->dev, "%08x\n", *req++);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index a3ca0e6..1d8dee1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2810,9 +2810,9 @@ static u32 hclge_check_event_cause(struct hclge_dev *hdev, u32 *clearval)
 	 * defer the processing of the mailbox events. Since, we would have not
 	 * cleared RX CMDQ event this time we would receive again another
 	 * interrupt from H/W just for the mailbox.
+	 *
+	 * check for vector0 reset event sources
 	 */
-
-	/* check for vector0 reset event sources */
 	if (BIT(HCLGE_VECTOR0_IMPRESET_INT_B) & rst_src_reg) {
 		dev_info(&hdev->pdev->dev, "IMP reset interrupt\n");
 		set_bit(HNAE3_IMP_RESET, &hdev->reset_pending);
@@ -8000,7 +8000,7 @@ int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
 		return -EBUSY;
 	}
 
-	/* When port base vlan enabled, we use port base vlan as the vlan
+	/* when port base vlan enabled, we use port base vlan as the vlan
 	 * filter entry. In this case, we don't update vlan filter table
 	 * when user add new vlan or remove exist vlan, just update the vport
 	 * vlan list. The vlan id in vlan list will be writen in vlan filter
@@ -8019,7 +8019,7 @@ int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
 			hclge_add_vport_vlan_table(vport, vlan_id,
 						   writen_to_tbl);
 	} else if (is_kill) {
-		/* When remove hw vlan filter failed, record the vlan id,
+		/* when remove hw vlan filter failed, record the vlan id,
 		 * and try to remove it from hw later, to be consistence
 		 * with stack
 		 */
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index 5a7221e..f5da28a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -479,7 +479,7 @@ static void hclge_mbx_reset_vf_queue(struct hclge_vport *vport,
 
 	hclge_reset_vf_queue(vport, queue_id);
 
-	/* send response msg to VF after queue reset complete*/
+	/* send response msg to VF after queue reset complete */
 	hclge_gen_resp_to_vf(vport, mbx_req, 0, NULL, 0);
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index d8b8281..defc905 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1269,7 +1269,7 @@ static int hclgevf_set_vlan_filter(struct hnae3_handle *handle,
 				   HCLGE_MBX_VLAN_FILTER, msg_data,
 				   HCLGEVF_VLAN_MBX_MSG_LEN, false, NULL, 0);
 
-	/* When remove hw vlan filter failed, record the vlan id,
+	/* when remove hw vlan filter failed, record the vlan id,
 	 * and try to remove it from hw later, to be consistence
 	 * with stack.
 	 */
@@ -1561,7 +1561,7 @@ static int hclgevf_reset(struct hclgevf_dev *hdev)
 
 	rtnl_lock();
 
-	/* now, re-initialize the nic client and ae device*/
+	/* now, re-initialize the nic client and ae device */
 	ret = hclgevf_reset_stack(hdev);
 	if (ret) {
 		dev_err(&hdev->pdev->dev, "failed to reset VF stack\n");
@@ -1784,9 +1784,8 @@ static void hclgevf_reset_service_task(struct work_struct *work)
 		 * 1b and 2. cases but we will not get any intimation about 1a
 		 * from PF as cmdq would be in unreliable state i.e. mailbox
 		 * communication between PF and VF would be broken.
-		 */
-
-		/* if we are never geting into pending state it means either:
+		 *
+		 * if we are never geting into pending state it means either:
 		 * 1. PF is not receiving our request which could be due to IMP
 		 *    reset
 		 * 2. PF is screwed
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next 2/6] net: hns3: modify redundant initialization of variable
From: Huazhong Tan @ 2019-08-16  8:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Guojia Liao, Guangbin Huang, Huzhong Tan
In-Reply-To: <1565942982-12105-1-git-send-email-tanhuazhong@huawei.com>

From: Guojia Liao <liaoguojia@huawei.com>

Some temporary variables do not need to be initialized that
they will be set before used, so this patch deletes the
initialization value of these temporary variables.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huzhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.c               | 7 +++----
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c        | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c    | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c     | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c  | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 2 +-
 6 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
index 908d4f4..6bbba15 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
@@ -104,7 +104,6 @@ int hnae3_register_client(struct hnae3_client *client)
 {
 	struct hnae3_client *client_tmp;
 	struct hnae3_ae_dev *ae_dev;
-	int ret = 0;
 
 	if (!client)
 		return -ENODEV;
@@ -123,7 +122,7 @@ int hnae3_register_client(struct hnae3_client *client)
 		/* if the client could not be initialized on current port, for
 		 * any error reasons, move on to next available port
 		 */
-		ret = hnae3_init_client_instance(client, ae_dev);
+		int ret = hnae3_init_client_instance(client, ae_dev);
 		if (ret)
 			dev_err(&ae_dev->pdev->dev,
 				"match and instantiation failed for port, ret = %d\n",
@@ -164,7 +163,7 @@ void hnae3_register_ae_algo(struct hnae3_ae_algo *ae_algo)
 	const struct pci_device_id *id;
 	struct hnae3_ae_dev *ae_dev;
 	struct hnae3_client *client;
-	int ret = 0;
+	int ret;
 
 	if (!ae_algo)
 		return;
@@ -258,7 +257,7 @@ int hnae3_register_ae_dev(struct hnae3_ae_dev *ae_dev)
 	const struct pci_device_id *id;
 	struct hnae3_ae_algo *ae_algo;
 	struct hnae3_client *client;
-	int ret = 0;
+	int ret;
 
 	if (!ae_dev)
 		return -ENODEV;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 677bfe06..4260653 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -749,7 +749,7 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 {
 	struct hnae3_handle *handle = hns3_get_handle(netdev);
 	const struct hnae3_ae_ops *ops = handle->ae_algo->ops;
-	int ret = 0;
+	int ret;
 
 	/* Chip don't support this mode. */
 	if (cmd->base.speed == SPEED_1000 && cmd->base.duplex == DUPLEX_HALF)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 05a4cdb..5ce9a8a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -1557,8 +1557,8 @@ int hclge_config_rocee_ras_interrupt(struct hclge_dev *hdev, bool en)
 
 static void hclge_handle_rocee_ras_error(struct hnae3_ae_dev *ae_dev)
 {
-	enum hnae3_reset_type reset_type = HNAE3_NONE_RESET;
 	struct hclge_dev *hdev = ae_dev->priv;
+	enum hnae3_reset_type reset_type;
 
 	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
 	    hdev->pdev->revision < 0x21)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index f30d112..e829101 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -404,8 +404,8 @@ static int hclge_tm_port_shaper_cfg(struct hclge_dev *hdev)
 {
 	struct hclge_port_shapping_cmd *shap_cfg_cmd;
 	struct hclge_desc desc;
-	u32 shapping_para = 0;
 	u8 ir_u, ir_b, ir_s;
+	u32 shapping_para;
 	int ret;
 
 	ret = hclge_shaper_para_calc(hdev->hw.mac.speed,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
index 55d3c78..4c2c945 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
@@ -43,7 +43,7 @@ static int hclgevf_cmd_csq_clean(struct hclgevf_hw *hw)
 {
 	struct hclgevf_dev *hdev = container_of(hw, struct hclgevf_dev, hw);
 	struct hclgevf_cmq_ring *csq = &hw->cmq.csq;
-	int clean = 0;
+	int clean;
 	u32 head;
 
 	head = hclgevf_read_dev(hw, HCLGEVF_NIC_CSQ_HEAD_REG);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index defc905..594cae8 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -2302,7 +2302,7 @@ static void hclgevf_uninit_msi(struct hclgevf_dev *hdev)
 
 static int hclgevf_misc_irq_init(struct hclgevf_dev *hdev)
 {
-	int ret = 0;
+	int ret;
 
 	hclgevf_get_misc_vector(hdev);
 
-- 
2.7.4


^ permalink raw reply related

* RE: [PATCH net-next] r8152: divide the tx and rx bottom functions
From: Hayes Wang @ 2019-08-16  8:10 UTC (permalink / raw)
  To: Eric Dumazet, netdev@vger.kernel.org
  Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <9749764d-7815-b673-0fc4-22475601efec@gmail.com>

Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Friday, August 16, 2019 2:40 PM
[...]
> tasklet and NAPI are scheduled on the same core (the current
> cpu calling napi_schedule() or tasklet_schedule())
> 
> I would rather not add this dubious tasklet, and instead try to understand
> what is wrong in this driver ;)
> 
> The various napi_schedule() calls are suspect IMO.

The original method as following.

static int r8152_poll(struct napi_struct *napi, int budget)
{
	struct r8152 *tp = container_of(napi, struct r8152, napi);
	int work_done;

	work_done = rx_bottom(tp, budget); <-- RX
	bottom_half(tp); <-- Tx (tx_bottom)
	[...]

The rx_bottom and tx_bottom would only be called in r8152_poll.
That is, tx_bottom wouldn't be run unless rx_bottom is finished.
And, rx_bottom would be called if tx_bottom is running.

If the traffic is busy. rx_bottom or tx_bottom may take a lot
of time to deal with the packets. And the one would increase
the latency time for the other one.

Therefore, when I separate the tx_bottom and rx_bottom to
different tasklet and napi, the callback functions of tx and
rx may schedule the tasklet and napi to different cpu. Then,
the rx_bottom and tx_bottom may be run at the same time.

Take our arm platform for example. There are five cpus to
handle the interrupt of USB host controller. When the rx is
completed, cpu #1 may handle the interrupt and napi would
be scheduled. When the tx is finished, cpu #2 may handle
the interrupt and the tasklet is scheduled. Then, napi is
run on cpu #1, and tasklet is run on cpu #2.

> Also rtl8152_start_xmit() uses skb_queue_tail(&tp->tx_queue, skb);
> 
> But I see nothing really kicking the transmit if tx_free is empty ?

Tx callback function "write_bulk_callback" would deal with it.
The callback function would check if there are packets waiting
to be sent.

Best Regards,
Hayes

^ permalink raw reply

* Re: [PATCH net] batman-adv: fix uninit-value in batadv_netlink_get_ifindex()
From: Simon Wunderlich @ 2019-08-16  8:07 UTC (permalink / raw)
  To: David Miller; +Cc: edumazet, netdev, eric.dumazet, syzkaller, mareklindner, a
In-Reply-To: <20190814.123625.170482147366456100.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 605 bytes --]

Hi David,

yes, you will get it through me. I'll schedule a pull request next week.

Cheers,
      Simon

On Wednesday, August 14, 2019 6:36:25 PM CEST David Miller wrote:
> From: Eric Dumazet <edumazet@google.com>
> Date: Mon, 12 Aug 2019 04:57:27 -0700
> 
> > batadv_netlink_get_ifindex() needs to make sure user passed
> > a correct u32 attribute.
> 
>  ...
> 
> > Fixes: b60620cf567b ("batman-adv: netlink: hardif query")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: syzbot <syzkaller@googlegroups.com>
> 
> Simon, I assume I will get this ultimately from you.
> 
> Thanks.


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* RE: [PATCH] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Tony Chuang @ 2019-08-16  8:06 UTC (permalink / raw)
  To: Tony Chuang, Jian-Hong Pan, Kalle Valo, David S . Miller
  Cc: linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux@endlessm.com
In-Reply-To: <F7CD281DE3E379468C6D07993EA72F84D18929BF@RTITMBSVM04.realtek.com.tw>

Hi,

A few more questions below

> > From: Jian-Hong Pan [mailto:jian-hong@endlessm.com]
> >
> > There is a mass of jobs between spin lock and unlock in the hardware
> > IRQ which will occupy much time originally. To make system work more
> > efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
> > reduce the time in hardware IRQ.
> >
> > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> > ---
> >  drivers/net/wireless/realtek/rtw88/pci.c | 36 +++++++++++++++++++-----
> >  1 file changed, 29 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
> > b/drivers/net/wireless/realtek/rtw88/pci.c
> > index 00ef229552d5..355606b167c6 100644
> > --- a/drivers/net/wireless/realtek/rtw88/pci.c
> > +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> > @@ -866,12 +866,29 @@ static irqreturn_t rtw_pci_interrupt_handler(int
> irq,
> > void *dev)
> >  {
> >  	struct rtw_dev *rtwdev = dev;
> >  	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> > -	u32 irq_status[4];
> > +	unsigned long flags;
> >
> > -	spin_lock(&rtwpci->irq_lock);
> > +	spin_lock_irqsave(&rtwpci->irq_lock, flags);

I think you can use 'spin_lock()' here as it's in IRQ context?

> >  	if (!rtwpci->irq_enabled)
> >  		goto out;
> >
> > +	/* disable RTW PCI interrupt to avoid more interrupts before the end of
> > +	 * thread function
> > +	 */
> > +	rtw_pci_disable_interrupt(rtwdev, rtwpci);


Why do we need rtw_pci_disable_interrupt() here.
Have you done any experiment and decided to add this.
If you have can you share your results to me?


> > +out:
> > +	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);

spin_unlock()

> > +
> > +	return IRQ_WAKE_THREAD;
> > +}
> > +
> > +static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
> > +{
> > +	struct rtw_dev *rtwdev = dev;
> > +	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> > +	unsigned long flags;
> > +	u32 irq_status[4];
> > +
> >  	rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
> >
> >  	if (irq_status[0] & IMR_MGNTDOK)
> > @@ -891,8 +908,11 @@ static irqreturn_t rtw_pci_interrupt_handler(int
> irq,
> > void *dev)
> >  	if (irq_status[0] & IMR_ROK)
> >  		rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
> >
> > -out:
> > -	spin_unlock(&rtwpci->irq_lock);
> > +	/* all of the jobs for this interrupt have been done */
> > +	spin_lock_irqsave(&rtwpci->irq_lock, flags);
> 
> Shouldn't we protect the ISRs above?
> 
> This patch could actually reduce the time of IRQ.
> But I think I need to further test it with PCI MSI interrupt.
> https://patchwork.kernel.org/patch/11081539/
> 
> Maybe we could drop the "rtw_pci_[enable/disable]_interrupt" when MSI
> Is enabled with this patch.
> 
> > +	if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))
> > +		rtw_pci_enable_interrupt(rtwdev, rtwpci);
> > +	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
> >
> >  	return IRQ_HANDLED;
> >  }
> > @@ -1152,8 +1172,10 @@ static int rtw_pci_probe(struct pci_dev *pdev,
> >  		goto err_destroy_pci;
> >  	}
> >
> > -	ret = request_irq(pdev->irq, &rtw_pci_interrupt_handler,
> > -			  IRQF_SHARED, KBUILD_MODNAME, rtwdev);
> > +	ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
> > +					rtw_pci_interrupt_handler,
> > +					rtw_pci_interrupt_threadfn,
> > +					IRQF_SHARED, KBUILD_MODNAME, rtwdev);
> >  	if (ret) {
> >  		ieee80211_unregister_hw(hw);
> >  		goto err_destroy_pci;
> > @@ -1192,7 +1214,7 @@ static void rtw_pci_remove(struct pci_dev
> *pdev)
> >  	rtw_pci_disable_interrupt(rtwdev, rtwpci);
> >  	rtw_pci_destroy(rtwdev, pdev);
> >  	rtw_pci_declaim(rtwdev, pdev);
> > -	free_irq(rtwpci->pdev->irq, rtwdev);
> > +	devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
> >  	rtw_core_deinit(rtwdev);
> >  	ieee80211_free_hw(hw);
> >  }
> > --
> > 2.20.1
> 
> Yan-Hsuan
> 

Thanks
Yan-Hsuan


^ permalink raw reply

* RE: [PATCH] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Tony Chuang @ 2019-08-16  7:54 UTC (permalink / raw)
  To: Jian-Hong Pan, Kalle Valo, David S . Miller
  Cc: linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux@endlessm.com
In-Reply-To: <20190816063109.4699-1-jian-hong@endlessm.com>

> From: Jian-Hong Pan [mailto:jian-hong@endlessm.com]
> 
> There is a mass of jobs between spin lock and unlock in the hardware
> IRQ which will occupy much time originally. To make system work more
> efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
> reduce the time in hardware IRQ.
> 
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> ---
>  drivers/net/wireless/realtek/rtw88/pci.c | 36 +++++++++++++++++++-----
>  1 file changed, 29 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
> b/drivers/net/wireless/realtek/rtw88/pci.c
> index 00ef229552d5..355606b167c6 100644
> --- a/drivers/net/wireless/realtek/rtw88/pci.c
> +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> @@ -866,12 +866,29 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> void *dev)
>  {
>  	struct rtw_dev *rtwdev = dev;
>  	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> -	u32 irq_status[4];
> +	unsigned long flags;
> 
> -	spin_lock(&rtwpci->irq_lock);
> +	spin_lock_irqsave(&rtwpci->irq_lock, flags);
>  	if (!rtwpci->irq_enabled)
>  		goto out;
> 
> +	/* disable RTW PCI interrupt to avoid more interrupts before the end of
> +	 * thread function
> +	 */
> +	rtw_pci_disable_interrupt(rtwdev, rtwpci);
> +out:
> +	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
> +
> +	return IRQ_WAKE_THREAD;
> +}
> +
> +static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
> +{
> +	struct rtw_dev *rtwdev = dev;
> +	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> +	unsigned long flags;
> +	u32 irq_status[4];
> +
>  	rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
> 
>  	if (irq_status[0] & IMR_MGNTDOK)
> @@ -891,8 +908,11 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> void *dev)
>  	if (irq_status[0] & IMR_ROK)
>  		rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
> 
> -out:
> -	spin_unlock(&rtwpci->irq_lock);
> +	/* all of the jobs for this interrupt have been done */
> +	spin_lock_irqsave(&rtwpci->irq_lock, flags);

Shouldn't we protect the ISRs above?

This patch could actually reduce the time of IRQ.
But I think I need to further test it with PCI MSI interrupt.
https://patchwork.kernel.org/patch/11081539/

Maybe we could drop the "rtw_pci_[enable/disable]_interrupt" when MSI
Is enabled with this patch.

> +	if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))
> +		rtw_pci_enable_interrupt(rtwdev, rtwpci);
> +	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
> 
>  	return IRQ_HANDLED;
>  }
> @@ -1152,8 +1172,10 @@ static int rtw_pci_probe(struct pci_dev *pdev,
>  		goto err_destroy_pci;
>  	}
> 
> -	ret = request_irq(pdev->irq, &rtw_pci_interrupt_handler,
> -			  IRQF_SHARED, KBUILD_MODNAME, rtwdev);
> +	ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
> +					rtw_pci_interrupt_handler,
> +					rtw_pci_interrupt_threadfn,
> +					IRQF_SHARED, KBUILD_MODNAME, rtwdev);
>  	if (ret) {
>  		ieee80211_unregister_hw(hw);
>  		goto err_destroy_pci;
> @@ -1192,7 +1214,7 @@ static void rtw_pci_remove(struct pci_dev *pdev)
>  	rtw_pci_disable_interrupt(rtwdev, rtwpci);
>  	rtw_pci_destroy(rtwdev, pdev);
>  	rtw_pci_declaim(rtwdev, pdev);
> -	free_irq(rtwpci->pdev->irq, rtwdev);
> +	devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
>  	rtw_core_deinit(rtwdev);
>  	ieee80211_free_hw(hw);
>  }
> --
> 2.20.1

Yan-Hsuan

^ permalink raw reply

* [PATCH] be2net: eliminate enable field from be_aic_obj
From: Ivan Vecera @ 2019-08-16  7:15 UTC (permalink / raw)
  To: netdev; +Cc: sathya.perla, poros, sriharsha.basavapatna

Adaptive coalescing is managed per adapter not per event queue so it
does not needed to store 'enable' flag for each event queue.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/emulex/benet/be.h         | 2 +-
 drivers/net/ethernet/emulex/benet/be_ethtool.c | 7 ++++---
 drivers/net/ethernet/emulex/benet/be_main.c    | 7 ++++---
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index f287b5da5546..cf3e6f2892ff 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -192,7 +192,6 @@ struct be_eq_obj {
 } ____cacheline_aligned_in_smp;
 
 struct be_aic_obj {		/* Adaptive interrupt coalescing (AIC) info */
-	bool enable;
 	u32 min_eqd;		/* in usecs */
 	u32 max_eqd;		/* in usecs */
 	u32 prev_eqd;		/* in usecs */
@@ -589,6 +588,7 @@ struct be_adapter {
 
 	struct be_drv_stats drv_stats;
 	struct be_aic_obj aic_obj[MAX_EVT_QS];
+	bool aic_enabled;
 	u8 vlan_prio_bmap;	/* Available Priority BitMap */
 	u16 recommended_prio_bits;/* Recommended Priority bits in vlan tag */
 	struct be_dma_mem rx_filter; /* Cmd DMA mem for rx-filter */
diff --git a/drivers/net/ethernet/emulex/benet/be_ethtool.c b/drivers/net/ethernet/emulex/benet/be_ethtool.c
index 492f8769ac12..5bb5abf99588 100644
--- a/drivers/net/ethernet/emulex/benet/be_ethtool.c
+++ b/drivers/net/ethernet/emulex/benet/be_ethtool.c
@@ -329,8 +329,8 @@ static int be_get_coalesce(struct net_device *netdev,
 	et->tx_coalesce_usecs_high = aic->max_eqd;
 	et->tx_coalesce_usecs_low = aic->min_eqd;
 
-	et->use_adaptive_rx_coalesce = aic->enable;
-	et->use_adaptive_tx_coalesce = aic->enable;
+	et->use_adaptive_rx_coalesce = adapter->aic_enabled;
+	et->use_adaptive_tx_coalesce = adapter->aic_enabled;
 
 	return 0;
 }
@@ -346,8 +346,9 @@ static int be_set_coalesce(struct net_device *netdev,
 	struct be_eq_obj *eqo;
 	int i;
 
+	adapter->aic_enabled = et->use_adaptive_rx_coalesce;
+
 	for_all_evt_queues(adapter, eqo, i) {
-		aic->enable = et->use_adaptive_rx_coalesce;
 		aic->max_eqd = min(et->rx_coalesce_usecs_high, BE_MAX_EQD);
 		aic->min_eqd = min(et->rx_coalesce_usecs_low, aic->max_eqd);
 		aic->et_eqd = min(et->rx_coalesce_usecs, aic->max_eqd);
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 314e9868b861..39eb7d525043 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2147,7 +2147,7 @@ static int be_get_new_eqd(struct be_eq_obj *eqo)
 	int i;
 
 	aic = &adapter->aic_obj[eqo->idx];
-	if (!aic->enable) {
+	if (!adapter->aic_enabled) {
 		if (aic->jiffies)
 			aic->jiffies = 0;
 		eqd = aic->et_eqd;
@@ -2204,7 +2204,7 @@ static u32 be_get_eq_delay_mult_enc(struct be_eq_obj *eqo)
 	int eqd;
 	u32 mult_enc;
 
-	if (!aic->enable)
+	if (!adapter->aic_enabled)
 		return 0;
 
 	if (jiffies_to_msecs(now - aic->jiffies) < 1)
@@ -2959,6 +2959,8 @@ static int be_evt_queues_create(struct be_adapter *adapter)
 				    max(adapter->cfg_num_rx_irqs,
 					adapter->cfg_num_tx_irqs));
 
+	adapter->aic_enabled = true;
+
 	for_all_evt_queues(adapter, eqo, i) {
 		int numa_node = dev_to_node(&adapter->pdev->dev);
 
@@ -2966,7 +2968,6 @@ static int be_evt_queues_create(struct be_adapter *adapter)
 		eqo->adapter = adapter;
 		eqo->idx = i;
 		aic->max_eqd = BE_MAX_EQD;
-		aic->enable = true;
 
 		eq = &eqo->q;
 		rc = be_queue_alloc(adapter, eq, EVNT_Q_LEN,
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH] can: rcar_can: Remove unused platform data support
From: Marc Kleine-Budde @ 2019-08-16  7:06 UTC (permalink / raw)
  To: Geert Uytterhoeven, Wolfgang Grandegger, Sergei Shtylyov
  Cc: David S . Miller, Wolfram Sang, linux-can, linux-renesas-soc,
	netdev
In-Reply-To: <20190814092221.12959-1-geert+renesas@glider.be>


[-- Attachment #1.1: Type: text/plain, Size: 714 bytes --]

On 8/14/19 11:22 AM, Geert Uytterhoeven wrote:
> All R-Car platforms use DT for describing CAN controllers.
> R-Car CAN platform data support was never used in any upstream kernel.
> 
> Move the Clock Select Register settings enum into the driver, and remove
> platform data support and the corresponding header file.
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Applied to can-next-testing.

tnx,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH] can: flexcan: add LPSR mode support for i.MX7D
From: Marc Kleine-Budde @ 2019-08-16  7:05 UTC (permalink / raw)
  To: Joakim Zhang, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, netdev@vger.kernel.org, dl-linux-imx
In-Reply-To: <DB7PR04MB46188FBD9DC87E620BBA8838E6AF0@DB7PR04MB4618.eurprd04.prod.outlook.com>


[-- Attachment #1.1: Type: text/plain, Size: 442 bytes --]

On 8/16/19 8:47 AM, Joakim Zhang wrote:
> Kindly Ping...

Can you please rebase both PM related patches to linux-can/testing and
post them in a series.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* RE: [PATCH] can: flexcan: add LPSR mode support for i.MX7D
From: Joakim Zhang @ 2019-08-16  6:47 UTC (permalink / raw)
  To: mkl@pengutronix.de, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, netdev@vger.kernel.org, dl-linux-imx
In-Reply-To: <20190731055401.15454-1-qiangqing.zhang@nxp.com>


Kindly Ping...

Best Regards,
Joakim Zhang

> -----Original Message-----
> From: Joakim Zhang
> Sent: 2019年7月31日 13:57
> To: mkl@pengutronix.de; linux-can@vger.kernel.org
> Cc: wg@grandegger.com; netdev@vger.kernel.org; dl-linux-imx
> <linux-imx@nxp.com>; Joakim Zhang <qiangqing.zhang@nxp.com>
> Subject: [PATCH] can: flexcan: add LPSR mode support for i.MX7D
> 
> For i.MX7D LPSR mode, the controller will lost power and got the configuration
> state lost after system resume back. (coming i.MX8QM/QXP will also
> completely power off the domain, the controller state will be lost and needs
> restore).
> So we need to set pinctrl state again and re-start chip to do re-configuration
> after resume.
> 
> For wakeup case, it should not set pinctrl to sleep state by
> pinctrl_pm_select_sleep_state.
> For interface is not up before suspend case, we don't need re-configure as it
> will be configured by user later by interface up.
> 
> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> ---
>  drivers/net/can/flexcan.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index
> c21b3507123e..228d07e84ddc 100644
> --- a/drivers/net/can/flexcan.c
> +++ b/drivers/net/can/flexcan.c
> @@ -26,6 +26,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/regulator/consumer.h>
> +#include <linux/pinctrl/consumer.h>
>  #include <linux/regmap.h>
> 
>  #define DRV_NAME			"flexcan"
> @@ -1889,7 +1890,7 @@ static int __maybe_unused flexcan_suspend(struct
> device *device)  {
>  	struct net_device *dev = dev_get_drvdata(device);
>  	struct flexcan_priv *priv = netdev_priv(dev);
> -	int err = 0;
> +	int err;
> 
>  	if (netif_running(dev)) {
>  		/* if wakeup is enabled, enter stop mode @@ -1899,25 +1900,27
> @@ static int __maybe_unused flexcan_suspend(struct device *device)
>  			enable_irq_wake(dev->irq);
>  			flexcan_enter_stop_mode(priv);
>  		} else {
> -			err = flexcan_chip_disable(priv);
> +			flexcan_chip_stop(dev);
> +
> +			err = pm_runtime_force_suspend(device);
>  			if (err)
>  				return err;
> 
> -			err = pm_runtime_force_suspend(device);
> +			pinctrl_pm_select_sleep_state(device);
>  		}
>  		netif_stop_queue(dev);
>  		netif_device_detach(dev);
>  	}
>  	priv->can.state = CAN_STATE_SLEEPING;
> 
> -	return err;
> +	return 0;
>  }
> 
>  static int __maybe_unused flexcan_resume(struct device *device)  {
>  	struct net_device *dev = dev_get_drvdata(device);
>  	struct flexcan_priv *priv = netdev_priv(dev);
> -	int err = 0;
> +	int err;
> 
>  	priv->can.state = CAN_STATE_ERROR_ACTIVE;
>  	if (netif_running(dev)) {
> @@ -1926,15 +1929,19 @@ static int __maybe_unused
> flexcan_resume(struct device *device)
>  		if (device_may_wakeup(device)) {
>  			disable_irq_wake(dev->irq);
>  		} else {
> +			pinctrl_pm_select_default_state(device);
> +
>  			err = pm_runtime_force_resume(device);
>  			if (err)
>  				return err;
> 
> -			err = flexcan_chip_enable(priv);
> +			err = flexcan_chip_start(dev);
> +			if (err)
> +				return err;
>  		}
>  	}
> 
> -	return err;
> +	return 0;
>  }
> 
>  static int __maybe_unused flexcan_runtime_suspend(struct device *device)
> --
> 2.17.1


^ permalink raw reply

* RE: [PATCH V2] can: flexcan: fix deadlock when using self wakeup
From: Joakim Zhang @ 2019-08-16  6:47 UTC (permalink / raw)
  To: mkl@pengutronix.de, linux-can@vger.kernel.org, sean@geanix.com
  Cc: wg@grandegger.com, netdev@vger.kernel.org, dl-linux-imx
In-Reply-To: <20190730024834.31182-1-qiangqing.zhang@nxp.com>


Kindly Ping...

Best Regards,
Joakim Zhang

> -----Original Message-----
> From: Joakim Zhang
> Sent: 2019年7月30日 10:52
> To: mkl@pengutronix.de; linux-can@vger.kernel.org; sean@geanix.com
> Cc: wg@grandegger.com; netdev@vger.kernel.org; dl-linux-imx
> <linux-imx@nxp.com>; Joakim Zhang <qiangqing.zhang@nxp.com>
> Subject: [PATCH V2] can: flexcan: fix deadlock when using self wakeup
> 
> As reproted by Sean Nyekjaer below:
> When suspending, when there is still can traffic on the interfaces the flexcan
> immediately wakes the platform again. As it should :-). But it throws this error
> msg:
> [ 3169.378661] PM: noirq suspend of devices failed
> 
> On the way down to suspend the interface that throws the error message does
> call flexcan_suspend but fails to call flexcan_noirq_suspend. That means the
> flexcan_enter_stop_mode is called, but on the way out of suspend the driver
> only calls flexcan_resume and skips flexcan_noirq_resume, thus it doesn't call
> flexcan_exit_stop_mode. This leaves the flexcan in stop mode, and with the
> current driver it can't recover from this even with a soft reboot, it requires a
> hard reboot.
> 
> The best way to exit stop mode is in Wake Up interrupt context, and then
> suspend() and resume() functions can be symmetric. However, stop mode
> request and ack will be controlled by SCU(System Control Unit)
> firmware(manage clock,power,stop mode, etc. by Cortex-M4 core) in coming
> i.MX8(QM/QXP). And SCU firmware interface can't be available in interrupt
> context.
> 
> For compatibillity, the wake up mechanism can't be symmetric, so we need
> in_stop_mode hack.
> 
> Fixes: de3578c198c6 ("can: flexcan: add self wakeup support")
> Reported-by: Sean Nyekjaer <sean@geanix.com>
> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> 
> Changelog:
> V1->V2:
> 	* add Reported-by tag.
> 	* rebase on patch: can:flexcan:fix stop mode acknowledgment.
> ---
>  drivers/net/can/flexcan.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index
> fcec8bcb53d6..1dbec868d3ea 100644
> --- a/drivers/net/can/flexcan.c
> +++ b/drivers/net/can/flexcan.c
> @@ -282,6 +282,7 @@ struct flexcan_priv {
>  	const struct flexcan_devtype_data *devtype_data;
>  	struct regulator *reg_xceiver;
>  	struct flexcan_stop_mode stm;
> +	bool in_stop_mode;
> 
>  	/* Read and Write APIs */
>  	u32 (*read)(void __iomem *addr);
> @@ -1635,6 +1636,8 @@ static int __maybe_unused flexcan_suspend(struct
> device *device)
>  			err = flexcan_enter_stop_mode(priv);
>  			if (err)
>  				return err;
> +
> +			priv->in_stop_mode = true;
>  		} else {
>  			err = flexcan_chip_disable(priv);
>  			if (err)
> @@ -1659,6 +1662,15 @@ static int __maybe_unused flexcan_resume(struct
> device *device)
>  		netif_device_attach(dev);
>  		netif_start_queue(dev);
>  		if (device_may_wakeup(device)) {
> +			if (priv->in_stop_mode) {
> +				flexcan_enable_wakeup_irq(priv, false);
> +				err = flexcan_exit_stop_mode(priv);
> +				if (err)
> +					return  err;
> +
> +				priv->in_stop_mode = false;
> +			}
> +
>  			disable_irq_wake(dev->irq);
>  		} else {
>  			err = flexcan_chip_enable(priv);
> @@ -1674,6 +1686,11 @@ static int __maybe_unused
> flexcan_noirq_suspend(struct device *device)
>  	struct net_device *dev = dev_get_drvdata(device);
>  	struct flexcan_priv *priv = netdev_priv(dev);
> 
> +	/* Need to enable wakeup interrupt in noirq suspend stage. Otherwise,
> +	 * it will trigger continuously wakeup interrupt if the wakeup event
> +	 * comes before noirq suspend stage, and simultaneously it has enter
> +	 * the stop mode.
> +	 */
>  	if (netif_running(dev) && device_may_wakeup(device))
>  		flexcan_enable_wakeup_irq(priv, true);
> 
> @@ -1686,11 +1703,17 @@ static int __maybe_unused
> flexcan_noirq_resume(struct device *device)
>  	struct flexcan_priv *priv = netdev_priv(dev);
>  	int err;
> 
> +	/* Need to exit stop mode in noirq resume stage. Otherwise, it will
> +	 * trigger continuously wakeup interrupt if the wakeup event comes,
> +	 * and simultaneously it has still in stop mode.
> +	 */
>  	if (netif_running(dev) && device_may_wakeup(device)) {
>  		flexcan_enable_wakeup_irq(priv, false);
>  		err = flexcan_exit_stop_mode(priv);
>  		if (err)
>  			return err;
> +
> +		priv->in_stop_mode = false;
>  	}
> 
>  	return 0;
> --
> 2.17.1


^ permalink raw reply

* Re: [PATCH] can: flexcan: update hardware feature information for i.MX8QM
From: Marc Kleine-Budde @ 2019-08-16  6:45 UTC (permalink / raw)
  To: Joakim Zhang, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, netdev@vger.kernel.org, dl-linux-imx
In-Reply-To: <20190815075638.23148-1-qiangqing.zhang@nxp.com>


[-- Attachment #1.1: Type: text/plain, Size: 497 bytes --]

On 8/15/19 9:58 AM, Joakim Zhang wrote:
> Update hardware feature information for i.MX8QM.
> 
> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>

Squashed into "can: flexcan: add imx8qm support"

Tnx,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2 4/5] r8152: support skb_add_rx_frag
From: Eric Dumazet @ 2019-08-16  6:42 UTC (permalink / raw)
  To: Hayes Wang, netdev; +Cc: nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-299-albertk@realtek.com>



On 8/13/19 5:42 AM, Hayes Wang wrote:
> Use skb_add_rx_frag() to reduce the memory copy for rx data.
> 
> Use a new list of rx_used to store the rx buffer which couldn't be
> reused yet.
> 
> Besides, the total number of rx buffer may be increased or decreased
> dynamically. And it is limited by RTL8152_MAX_RX_AGG.
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>
>

...

>  			skb->protocol = eth_type_trans(skb, netdev);
>  			rtl_rx_vlan_tag(rx_desc, skb);
>  			if (work_done < budget) {
>  				napi_gro_receive(napi, skb);
>  				work_done++;
>  				stats->rx_packets++;
> -				stats->rx_bytes += pkt_len;
> +				stats->rx_bytes += skb->len;

use-after-free. skb is no longer in your hands after napi_gro_receive()


^ permalink raw reply

* Re: [PATCH net-next] r8152: divide the tx and rx bottom functions
From: Eric Dumazet @ 2019-08-16  6:39 UTC (permalink / raw)
  To: Hayes Wang, netdev; +Cc: nic_swsd, linux-kernel
In-Reply-To: <1394712342-15778-301-Taiwan-albertk@realtek.com>



On 8/14/19 10:30 AM, Hayes Wang wrote:
> Move the tx bottom function from NAPI to a new tasklet. Then, for
> multi-cores, the bottom functions of tx and rx may be run at same
> time with different cores. This is used to improve performance.
> 
>  

tasklet and NAPI are scheduled on the same core (the current
cpu calling napi_schedule() or tasklet_schedule())

I would rather not add this dubious tasklet, and instead try to understand
what is wrong in this driver ;)

The various napi_schedule() calls are suspect IMO.

Also rtl8152_start_xmit() uses skb_queue_tail(&tp->tx_queue, skb);

But I see nothing really kicking the transmit if tx_free is empty ?

tx_bottom() is only called from bottom_half() (called from r8152_poll())



^ permalink raw reply

* Re: [PATCH] can: flexcan: disable completely the ECC mechanism
From: Marc Kleine-Budde @ 2019-08-16  6:37 UTC (permalink / raw)
  To: Joakim Zhang, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, netdev@vger.kernel.org, stefan@agner.ch,
	dl-linux-imx
In-Reply-To: <20190815075806.23212-1-qiangqing.zhang@nxp.com>


[-- Attachment #1.1: Type: text/plain, Size: 799 bytes --]

On 8/15/19 10:00 AM, Joakim Zhang wrote:
> The ECC(memory error detection and correction) mechanism can be
> activated or not, controlled by the ECCDIS bit in CAN_MECR. When
> disabled, updates on indications and reporting registers are stopped.
> So if want to disable ECC completely, had better assert ECCDIS bit,
> not just mask the related interrupts.
> 
> Fixes:cdce844865be("can: flexcan: add vf610 support for FlexCAN")
> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>

Added to can.

tnx
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Jian-Hong Pan @ 2019-08-16  6:31 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller
  Cc: linux-wireless, netdev, linux-kernel, linux, Jian-Hong Pan

There is a mass of jobs between spin lock and unlock in the hardware
IRQ which will occupy much time originally. To make system work more
efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
reduce the time in hardware IRQ.

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 36 +++++++++++++++++++-----
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index 00ef229552d5..355606b167c6 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -866,12 +866,29 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq, void *dev)
 {
 	struct rtw_dev *rtwdev = dev;
 	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
-	u32 irq_status[4];
+	unsigned long flags;
 
-	spin_lock(&rtwpci->irq_lock);
+	spin_lock_irqsave(&rtwpci->irq_lock, flags);
 	if (!rtwpci->irq_enabled)
 		goto out;
 
+	/* disable RTW PCI interrupt to avoid more interrupts before the end of
+	 * thread function
+	 */
+	rtw_pci_disable_interrupt(rtwdev, rtwpci);
+out:
+	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
+
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
+{
+	struct rtw_dev *rtwdev = dev;
+	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
+	unsigned long flags;
+	u32 irq_status[4];
+
 	rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
 
 	if (irq_status[0] & IMR_MGNTDOK)
@@ -891,8 +908,11 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq, void *dev)
 	if (irq_status[0] & IMR_ROK)
 		rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
 
-out:
-	spin_unlock(&rtwpci->irq_lock);
+	/* all of the jobs for this interrupt have been done */
+	spin_lock_irqsave(&rtwpci->irq_lock, flags);
+	if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))
+		rtw_pci_enable_interrupt(rtwdev, rtwpci);
+	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
 
 	return IRQ_HANDLED;
 }
@@ -1152,8 +1172,10 @@ static int rtw_pci_probe(struct pci_dev *pdev,
 		goto err_destroy_pci;
 	}
 
-	ret = request_irq(pdev->irq, &rtw_pci_interrupt_handler,
-			  IRQF_SHARED, KBUILD_MODNAME, rtwdev);
+	ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
+					rtw_pci_interrupt_handler,
+					rtw_pci_interrupt_threadfn,
+					IRQF_SHARED, KBUILD_MODNAME, rtwdev);
 	if (ret) {
 		ieee80211_unregister_hw(hw);
 		goto err_destroy_pci;
@@ -1192,7 +1214,7 @@ static void rtw_pci_remove(struct pci_dev *pdev)
 	rtw_pci_disable_interrupt(rtwdev, rtwpci);
 	rtw_pci_destroy(rtwdev, pdev);
 	rtw_pci_declaim(rtwdev, pdev);
-	free_irq(rtwpci->pdev->irq, rtwdev);
+	devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
 	rtw_core_deinit(rtwdev);
 	ieee80211_free_hw(hw);
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Samudrala, Sridhar @ 2019-08-16  6:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: magnus.karlsson, bjorn.topel, netdev, bpf, intel-wired-lan,
	maciej.fijalkowski, tom.herbert
In-Reply-To: <20190815122844.52eeda08@cakuba.netronome.com>



On 8/15/2019 12:28 PM, Jakub Kicinski wrote:
> On Wed, 14 Aug 2019 20:46:18 -0700, Sridhar Samudrala wrote:
>> This patch series introduces XDP_SKIP_BPF flag that can be specified
>> during the bind() call of an AF_XDP socket to skip calling the BPF
>> program in the receive path and pass the buffer directly to the socket.
>>
>> When a single AF_XDP socket is associated with a queue and a HW
>> filter is used to redirect the packets and the app is interested in
>> receiving all the packets on that queue, we don't need an additional
>> BPF program to do further filtering or lookup/redirect to a socket.
>>
>> Here are some performance numbers collected on
>>    - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>>    - Intel 40Gb Ethernet NIC (i40e)
>>
>> All tests use 2 cores and the results are in Mpps.
>>
>> turbo on (default)
>> ---------------------------------------------	
>>                        no-skip-bpf    skip-bpf
>> ---------------------------------------------	
>> rxdrop zerocopy           21.9         38.5
>> l2fwd  zerocopy           17.0         20.5
>> rxdrop copy               11.1         13.3
>> l2fwd  copy                1.9          2.0
>>
>> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>> ---------------------------------------------	
>>                        no-skip-bpf    skip-bpf
>> ---------------------------------------------	
>> rxdrop zerocopy           15.4         29.0
>> l2fwd  zerocopy           11.8         18.2
>> rxdrop copy                8.2         10.5
>> l2fwd  copy                1.7          1.7
>> ---------------------------------------------	
> 
> Could you include a third column here - namely the in-XDP performance?
> AFAIU the way to achieve better performance with AF_XDP is to move the
> fast path into the kernel's XDP program..

The in-xdp drop that can be measured with xdp1 is lower than rxdrop
zerocopy with skip-bpf although in-xdp drop uses only 1 core. af-xdp 
1-core performance would improve with need-wakeup or busypoll patches 
and based on early experiments so far af-xdp with need-wakeup/busypoll + 
skip-bpf perf is higher than in-xdp drop.

Will include in-xdp drop data too in the next revision.

> 
> Maciej's work on batching XDP program's execution should lower the
> retpoline overhead, without leaning close to the bypass model.
> 

^ permalink raw reply

* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Samudrala, Sridhar @ 2019-08-16  6:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, magnus.karlsson, bjorn.topel,
	netdev, bpf, intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <87ftm2wdzk.fsf@toke.dk>



On 8/15/2019 10:11 AM, Toke Høiland-Jørgensen wrote:
> "Samudrala, Sridhar" <sridhar.samudrala@intel.com> writes:
> 
>> On 8/15/2019 4:12 AM, Toke Høiland-Jørgensen wrote:
>>> Sridhar Samudrala <sridhar.samudrala@intel.com> writes:
>>>
>>>> This patch series introduces XDP_SKIP_BPF flag that can be specified
>>>> during the bind() call of an AF_XDP socket to skip calling the BPF
>>>> program in the receive path and pass the buffer directly to the socket.
>>>>
>>>> When a single AF_XDP socket is associated with a queue and a HW
>>>> filter is used to redirect the packets and the app is interested in
>>>> receiving all the packets on that queue, we don't need an additional
>>>> BPF program to do further filtering or lookup/redirect to a socket.
>>>>
>>>> Here are some performance numbers collected on
>>>>     - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>>>>     - Intel 40Gb Ethernet NIC (i40e)
>>>>
>>>> All tests use 2 cores and the results are in Mpps.
>>>>
>>>> turbo on (default)
>>>> ---------------------------------------------	
>>>>                         no-skip-bpf    skip-bpf
>>>> ---------------------------------------------	
>>>> rxdrop zerocopy           21.9         38.5
>>>> l2fwd  zerocopy           17.0         20.5
>>>> rxdrop copy               11.1         13.3
>>>> l2fwd  copy                1.9          2.0
>>>>
>>>> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>>>> ---------------------------------------------	
>>>>                         no-skip-bpf    skip-bpf
>>>> ---------------------------------------------	
>>>> rxdrop zerocopy           15.4         29.0
>>>> l2fwd  zerocopy           11.8         18.2
>>>> rxdrop copy                8.2         10.5
>>>> l2fwd  copy                1.7          1.7
>>>> ---------------------------------------------
>>>
>>> You're getting this performance boost by adding more code in the fast
>>> path for every XDP program; so what's the performance impact of that for
>>> cases where we do run an eBPF program?
>>
>> The no-skip-bpf results are pretty close to what i see before the
>> patches are applied. As umem is cached in rx_ring for zerocopy the
>> overhead is much smaller compared to the copy scenario where i am
>> currently calling xdp_get_umem_from_qid().
> 
> I meant more for other XDP programs; what is the performance impact of
> XDP_DROP, for instance?

Will run xdp1 with and without the patches and include that data with 
the next revision.


^ permalink raw reply

* Re: [PATCH v4 11/14] net: phy: adin: implement Energy Detect Powerdown mode
From: Ardelean, Alexandru @ 2019-08-16  6:09 UTC (permalink / raw)
  To: devicetree@vger.kernel.org, netdev@vger.kernel.org,
	f.fainelli@gmail.com, linux-kernel@vger.kernel.org
  Cc: andrew@lunn.ch, davem@davemloft.net, mark.rutland@arm.com,
	robh+dt@kernel.org, hkallweit1@gmail.com
In-Reply-To: <f13feaee-0bad-a774-5527-296b6f74c91b@gmail.com>

On Wed, 2019-08-14 at 10:57 -0700, Florian Fainelli wrote:
> [External]
> 
> 
> 
> On 8/12/2019 4:23 AM, Alexandru Ardelean wrote:
> > The ADIN PHYs support Energy Detect Powerdown mode, which puts the PHY into
> > a low power mode when there is no signal on the wire (typically cable
> > unplugged).
> > This behavior is enabled by default, but can be disabled via device
> > property.
> 
> We could consider adding a PHY tunable, having this as a Device Tree
> property amounts to putting a policy inside DT, which is frowned upon.

That would be interesting actually, and I would also prefer it over static DT.
Maybe for this patch, I'll just enable EDPD by default and see about a tuna option.

> 
> > Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
> 
> Other than that, the code looks fine:
> 
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

^ permalink raw reply

* Re: linux-next: manual merge of the net-next tree with the kbuild tree
From: Stephen Rothwell @ 2019-08-16  6:01 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David Miller, Networking, Masahiro Yamada,
	Linux Next Mailing List, Linux Kernel Mailing List, Kees Cook,
	Andrii Nakryiko, Daniel Borkmann
In-Reply-To: <CAEf4BzY9dDZF-DBDmuQQz0Rcx3DNGvQn_GLr0Uar1PAbAf2iig@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 426 bytes --]

Hi Andrii,

On Thu, 15 Aug 2019 22:21:29 -0700 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> Thanks, Stephen! Looks good except one minor issue below.

Thanks for checking.

> >   vmlinux_link()
> >   {
> >  +      info LD ${2}  
> 
> This needs to be ${1}.

At least its only an information message and doesn't affect the build.
I will fix my resolution for Monday.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Song Liu @ 2019-08-16  5:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, Andy Lutomirski, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <B0364660-AD6A-4E5C-B04F-3B6DA78B4BBE@amacapital.net>



> On Aug 15, 2019, at 5:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> 
> 
> 
>> On Aug 15, 2019, at 4:46 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> 
>>> 
>>> I'm not sure why you draw the line for VMs -- they're just as buggy
>>> as anything else. Regardless, I reject this line of thinking: yes,
>>> all software is buggy, but that isn't a reason to give up.
>> 
>> hmm. are you saying you want kernel community to work towards
>> making containers (namespaces) being able to run arbitrary code
>> downloaded from the internet?
> 
> Yes.
> 
> As an example, Sandstorm uses a combination of namespaces (user, network, mount, ipc) and a moderately permissive seccomp policy to run arbitrary code. Not just little snippets, either — node.js, Mongo, MySQL, Meteor, and other fairly heavyweight stacks can all run under Sandstorm, with the whole stack (database engine binaries, etc) supplied by entirely untrusted customers.  During the time Sandstorm was under active development, I can recall *one* bug that would have allowed a sandbox escape. That’s a pretty good track record.  (Also, Meltdown and Spectre, sigh.)
> 
> To be clear, Sandstorm did not allow creation of a userns by the untrusted code, and Sandstorm would have heavily restricted bpf(), but that should only be necessary because of the possibility of kernel bugs, not because of the overall design.
> 
> Alexei, I’m trying to encourage you to aim for something even better than you have now. Right now, if you grant a user various very strong capabilities, that user’s systemd can use bpf network filters.  Your proposal would allow this with a different, but still very strong, set of capabilities. There’s nothing wrong with this per se, but I think you can aim much higher:
> 
> CAP_NET_ADMIN and your CAP_BPF both effectively allow the holder to take over the system, *by design*.  I’m suggesting that you engage the security community (Kees, myself, Aleksa, Jann, Serge, Christian, etc) to aim for something better: make it so that a normal Linux distro would be willing to relax its settings enough so that normal users can use bpf filtering in the systemd units and maybe eventually use even more bpf() capabilities. And let’s make is to that mainstream container managers (that use userns!) will be willing (as an option) to delegate bpf() to their containers. We’re happy to help design, review, and even write code, but we need you to be willing to work with us to make a design that seems like it will work and then to wait long enough to merge it for us to think about it, try to poke holes in it, and convince ourselves and each other that it has a good chance of being sound.
> 
> Obviously there will be many cases where an unprivileged program should *not* be able to use bpf() IP filtering, but let’s make it so that enabling these advanced features does not automatically give away the keys to the kingdom.
> 
> (Sandstorm still exists but is no longer as actively developed, sadly.)

I am trying to understand different perspectives here. 

Disclaimer: Alexei and I both work for Facebook. But he may disagree 
with everything I am about to say below, because we haven't sync'ed 
about this for a while. :)

I think there are two types of use cases here: 

    1. CAP_BPF_ADMIN: one big key to all sys_bpf(). 
    2. CAP_BPF: subset of sys_bpf() that is safe for containers.

IIUC, currently, CAP_BPF_ADMIN is (almost) same as CAP_SYS_ADMIN. 
And there aren't many real world use cases for CAP_BPF. 

The /dev/bpf patch tries to separate CAP_BPF_ADMIN from CAP_SYS_ADMIN.
On the other hand, Andy would like to introduce CAP_BPF and build
amazing use cases around it (chicken-egg problem). 

Did I misunderstand anything?

If not, I think these two use cases do not really conflict with each
other, and we probably need both of them. Then, the next question is 
do we really need both/either of them. Maybe having two separate 
discussions would make it easier?


The following are some questions I am trying to understand for 
the two cases. 

For CAP_BPF_ADMIN (or /dev/bpf):
Can we just use CAP_NET_ADMIN? It is safer than CAP_SYS_ADMIN, and
reuse existing CAP_ should be easier than introducing a new one? 

For CAP_BPF: 
Do we really need it for the containers? Is it possible to implement 
all container use cases with SUID? At this moment, I think SUID is 
the right way to go for this use case, because this is likely to 
start with a small set of functionalities. We can introduce CAP_BPF
when the container use case is too complicated for SUID. 


I hope some of these questions/thoughts would make some sense?

Thanks,
Song

^ permalink raw reply

* Re: [net-next v2 1/1] tipc: clean up skb list lock handling on send path
From: Xin Long @ 2019-08-16  5:29 UTC (permalink / raw)
  To: Jon Maloy
  Cc: davem, network dev, tung.q.nguyen, hoang.h.le, Long Xin, shuali,
	Ying Xue, Eric Dumazet, tipc-discussion
In-Reply-To: <1565880170-19548-1-git-send-email-jon.maloy@ericsson.com>

On Thu, Aug 15, 2019 at 11:36 PM Jon Maloy <jon.maloy@ericsson.com> wrote:
>
> The policy for handling the skb list locks on the send and receive paths
> is simple.
>
> - On the send path we never need to grab the lock on the 'xmitq' list
>   when the destination is an exernal node.
>
> - On the receive path we always need to grab the lock on the 'inputq'
>   list, irrespective of source node.
>
> However, when transmitting node local messages those will eventually
> end up on the receive path of a local socket, meaning that the argument
> 'xmitq' in tipc_node_xmit() will become the 'ínputq' argument in  the
> function tipc_sk_rcv(). This has been handled by always initializing
> the spinlock of the 'xmitq' list at message creation, just in case it
> may end up on the receive path later, and despite knowing that the lock
> in most cases never will be used.
>
> This approach is inaccurate and confusing, and has also concealed the
> fact that the stated 'no lock grabbing' policy for the send path is
> violated in some cases.
>
> We now clean up this by never initializing the lock at message creation,
> instead doing this at the moment we find that the message actually will
> enter the receive path. At the same time we fix the four locations
> where we incorrectly access the spinlock on the send/error path.
>
> This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock
> is initialised") which has now become redundant.
>
> CC: Eric Dumazet <edumazet@google.com>
> Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> Acked-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
>
> ---
> v2: removed more unnecessary lock initializations after feedback
>     from Xin Long.
> ---
>  net/tipc/bcast.c      | 10 +++++-----
>  net/tipc/group.c      |  4 ++--
>  net/tipc/link.c       | 14 +++++++-------
>  net/tipc/name_distr.c |  2 +-
>  net/tipc/node.c       |  7 ++++---
>  net/tipc/socket.c     | 14 +++++++-------
>  6 files changed, 26 insertions(+), 25 deletions(-)
>
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index 34f3e56..6ef1abd 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -185,7 +185,7 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
>         }
>
>         /* We have to transmit across all bearers */
> -       skb_queue_head_init(&_xmitq);
> +       __skb_queue_head_init(&_xmitq);
>         for (bearer_id = 0; bearer_id < MAX_BEARERS; bearer_id++) {
>                 if (!bb->dests[bearer_id])
>                         continue;
> @@ -256,7 +256,7 @@ static int tipc_bcast_xmit(struct net *net, struct sk_buff_head *pkts,
>         struct sk_buff_head xmitq;
>         int rc = 0;
>
> -       skb_queue_head_init(&xmitq);
> +       __skb_queue_head_init(&xmitq);
>         tipc_bcast_lock(net);
>         if (tipc_link_bc_peers(l))
>                 rc = tipc_link_xmit(l, pkts, &xmitq);
> @@ -286,7 +286,7 @@ static int tipc_rcast_xmit(struct net *net, struct sk_buff_head *pkts,
>         u32 dnode, selector;
>
>         selector = msg_link_selector(buf_msg(skb_peek(pkts)));
> -       skb_queue_head_init(&_pkts);
> +       __skb_queue_head_init(&_pkts);
>
>         list_for_each_entry_safe(dst, tmp, &dests->list, list) {
>                 dnode = dst->node;
> @@ -344,7 +344,7 @@ static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
>         msg_set_size(_hdr, MCAST_H_SIZE);
>         msg_set_is_rcast(_hdr, !msg_is_rcast(hdr));
>
> -       skb_queue_head_init(&tmpq);
> +       __skb_queue_head_init(&tmpq);
>         __skb_queue_tail(&tmpq, _skb);
>         if (method->rcast)
>                 tipc_bcast_xmit(net, &tmpq, cong_link_cnt);
> @@ -378,7 +378,7 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts,
>         int rc = 0;
>
>         skb_queue_head_init(&inputq);
> -       skb_queue_head_init(&localq);
> +       __skb_queue_head_init(&localq);
>
>         /* Clone packets before they are consumed by next call */
>         if (dests->local && !tipc_msg_reassemble(pkts, &localq)) {
> diff --git a/net/tipc/group.c b/net/tipc/group.c
> index 5f98d38..89257e2 100644
> --- a/net/tipc/group.c
> +++ b/net/tipc/group.c
> @@ -199,7 +199,7 @@ void tipc_group_join(struct net *net, struct tipc_group *grp, int *sk_rcvbuf)
>         struct tipc_member *m, *tmp;
>         struct sk_buff_head xmitq;
>
> -       skb_queue_head_init(&xmitq);
> +       __skb_queue_head_init(&xmitq);
>         rbtree_postorder_for_each_entry_safe(m, tmp, tree, tree_node) {
>                 tipc_group_proto_xmit(grp, m, GRP_JOIN_MSG, &xmitq);
>                 tipc_group_update_member(m, 0);
> @@ -435,7 +435,7 @@ bool tipc_group_cong(struct tipc_group *grp, u32 dnode, u32 dport,
>                 return true;
>         if (state == MBR_PENDING && adv == ADV_IDLE)
>                 return true;
> -       skb_queue_head_init(&xmitq);
> +       __skb_queue_head_init(&xmitq);
>         tipc_group_proto_xmit(grp, m, GRP_ADV_MSG, &xmitq);
>         tipc_node_distr_xmit(grp->net, &xmitq);
>         return true;
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index dd3155b..289e848 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -959,7 +959,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>                 pr_warn("Too large msg, purging xmit list %d %d %d %d %d!\n",
>                         skb_queue_len(list), msg_user(hdr),
>                         msg_type(hdr), msg_size(hdr), mtu);
> -               skb_queue_purge(list);
> +               __skb_queue_purge(list);
>                 return -EMSGSIZE;
>         }
>
> @@ -988,7 +988,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>                 if (likely(skb_queue_len(transmq) < maxwin)) {
>                         _skb = skb_clone(skb, GFP_ATOMIC);
>                         if (!_skb) {
> -                               skb_queue_purge(list);
> +                               __skb_queue_purge(list);
>                                 return -ENOBUFS;
>                         }
>                         __skb_dequeue(list);
> @@ -1668,7 +1668,7 @@ void tipc_link_create_dummy_tnl_msg(struct tipc_link *l,
>         struct sk_buff *skb;
>         u32 dnode = l->addr;
>
> -       skb_queue_head_init(&tnlq);
> +       __skb_queue_head_init(&tnlq);
>         skb = tipc_msg_create(TUNNEL_PROTOCOL, FAILOVER_MSG,
>                               INT_H_SIZE, BASIC_H_SIZE,
>                               dnode, onode, 0, 0, 0);
> @@ -1708,9 +1708,9 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
>         if (!tnl)
>                 return;
>
> -       skb_queue_head_init(&tnlq);
> -       skb_queue_head_init(&tmpxq);
> -       skb_queue_head_init(&frags);
> +       __skb_queue_head_init(&tnlq);
> +       __skb_queue_head_init(&tmpxq);
> +       __skb_queue_head_init(&frags);
>
>         /* At least one packet required for safe algorithm => add dummy */
>         skb = tipc_msg_create(TIPC_LOW_IMPORTANCE, TIPC_DIRECT_MSG,
> @@ -1720,7 +1720,7 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
>                 pr_warn("%sunable to create tunnel packet\n", link_co_err);
>                 return;
>         }
> -       skb_queue_tail(&tnlq, skb);
> +       __skb_queue_tail(&tnlq, skb);
>         tipc_link_xmit(l, &tnlq, &tmpxq);
>         __skb_queue_purge(&tmpxq);
>
> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
> index 44abc8e..61219f0 100644
> --- a/net/tipc/name_distr.c
> +++ b/net/tipc/name_distr.c
> @@ -190,7 +190,7 @@ void tipc_named_node_up(struct net *net, u32 dnode)
>         struct name_table *nt = tipc_name_table(net);
>         struct sk_buff_head head;
>
> -       skb_queue_head_init(&head);
> +       __skb_queue_head_init(&head);
>
>         read_lock_bh(&nt->cluster_scope_lock);
>         named_distribute(net, &head, dnode, &nt->cluster_scope);
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index 1bdcf0f..c8f6177 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -1444,13 +1444,14 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list,
>
>         if (in_own_node(net, dnode)) {
>                 tipc_loopback_trace(net, list);
> +               spin_lock_init(&list->lock);
>                 tipc_sk_rcv(net, list);
>                 return 0;
>         }
>
>         n = tipc_node_find(net, dnode);
>         if (unlikely(!n)) {
> -               skb_queue_purge(list);
> +               __skb_queue_purge(list);
>                 return -EHOSTUNREACH;
>         }
>
> @@ -1459,7 +1460,7 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list,
>         if (unlikely(bearer_id == INVALID_BEARER_ID)) {
>                 tipc_node_read_unlock(n);
>                 tipc_node_put(n);
> -               skb_queue_purge(list);
> +               __skb_queue_purge(list);
>                 return -EHOSTUNREACH;
>         }
>
> @@ -1491,7 +1492,7 @@ int tipc_node_xmit_skb(struct net *net, struct sk_buff *skb, u32 dnode,
>  {
>         struct sk_buff_head head;
>
> -       skb_queue_head_init(&head);
> +       __skb_queue_head_init(&head);
>         __skb_queue_tail(&head, skb);
>         tipc_node_xmit(net, &head, dnode, selector);
>         return 0;
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 83ae41d..3b9f8cc 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -809,7 +809,7 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
>         msg_set_nameupper(hdr, seq->upper);
>
>         /* Build message as chain of buffers */
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>         rc = tipc_msg_build(hdr, msg, 0, dlen, mtu, &pkts);
>
>         /* Send message if build was successful */
> @@ -853,7 +853,7 @@ static int tipc_send_group_msg(struct net *net, struct tipc_sock *tsk,
>         msg_set_grp_bc_seqno(hdr, bc_snd_nxt);
>
>         /* Build message as chain of buffers */
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>         mtu = tipc_node_get_mtu(net, dnode, tsk->portid);
>         rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
>         if (unlikely(rc != dlen))
> @@ -1058,7 +1058,7 @@ static int tipc_send_group_bcast(struct socket *sock, struct msghdr *m,
>         msg_set_grp_bc_ack_req(hdr, ack);
>
>         /* Build message as chain of buffers */
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>         rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
>         if (unlikely(rc != dlen))
>                 return rc;
> @@ -1387,7 +1387,7 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen)
>         if (unlikely(rc))
>                 return rc;
>
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>         mtu = tipc_node_get_mtu(net, dnode, tsk->portid);
>         rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
>         if (unlikely(rc != dlen))
> @@ -1445,7 +1445,7 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen)
>         int send, sent = 0;
>         int rc = 0;
>
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>
>         if (unlikely(dlen > INT_MAX))
>                 return -EMSGSIZE;
> @@ -1805,7 +1805,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m,
>
>         /* Send group flow control advertisement when applicable */
>         if (tsk->group && msg_in_group(hdr) && !grp_evt) {
> -               skb_queue_head_init(&xmitq);
> +               __skb_queue_head_init(&xmitq);
>                 tipc_group_update_rcv_win(tsk->group, tsk_blocks(hlen + dlen),
>                                           msg_orignode(hdr), msg_origport(hdr),
>                                           &xmitq);
> @@ -2674,7 +2674,7 @@ static void tipc_sk_timeout(struct timer_list *t)
>         struct sk_buff_head list;
>         int rc = 0;
>
> -       skb_queue_head_init(&list);
> +       __skb_queue_head_init(&list);
>         bh_lock_sock(sk);
>
>         /* Try again later if socket is busy */
> --
> 2.1.4
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>

^ permalink raw reply

* Re: [PATCH -next v2] btf: fix return value check in btf_vmlinux_init()
From: Alexei Starovoitov @ 2019-08-16  5:24 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, Network Development, bpf,
	kernel-janitors
In-Reply-To: <20190816024044.139761-1-weiyongjun1@huawei.com>

On Thu, Aug 15, 2019 at 7:36 PM Wei Yongjun <weiyongjun1@huawei.com> wrote:
>
> In case of error, the function kobject_create_and_add() returns NULL
> pointer not ERR_PTR(). The IS_ERR() test in the return value check
> should be replaced with NULL test.
>
> Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> Acked-by: Andrii Nakryiko <andriin@fb.com>

Applied. Thanks.

Please spell out [PATCH v2 bpf-next] in the subject next time.

^ permalink raw reply

* Re: [PATCH bpf-next 2/4] selftests/bpf: test_progs: test__skip
From: Andrii Nakryiko @ 2019-08-16  5:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, Stanislav Fomichev, Networking, bpf,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
In-Reply-To: <CAADnVQ+Bz6R17bassdr3xOR7rhbuw-HbdXYu-hHkxE8S2WiNrA@mail.gmail.com>

On Thu, Aug 15, 2019 at 10:16 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 14, 2019 at 1:01 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > >
> > > Let me know if you see a value in highlighting test vs subtest skip.
> > >
> > > Other related question is: should we do verbose output in case
> > > of a skip? Right now we don't do it.
> >
> > It might be useful, I guess, especially if it's not too common. But
> > Alexei is way more picky about stuff like that, so I'd defer to him. I
> > have no problem with a clean "SKIPPED: <test>/<subtest> (maybe some
> > reason for skipping here)" message.
>
> Since test_progs prints single number for FAILED tests then single number
> for SKIPPED tests is fine as well.

I'm fine with single number, but it should count number of subtests
skipped, if there are subtests within test, same as for FAILED.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox