Netdev List
 help / color / mirror / Atom feed
* [PATCH V2 net-next 0/7] net: hns3: add a feature & bugfixes & cleanups
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski

This patch-set includes a VF feature, bugfixes and cleanups for the HNS3
ethernet controller driver.

[patch 01/07] adds ethtool_ops.set_channels support for HNS3 VF driver

[patch 02/07] adds a recovery for setting channel fail.

[patch 03/07] fixes an error related to shaper parameter algorithm.

[patch 04/07] fixes an error related to ksetting.

[patch 05/07] adds cleanups for some log pinting.

[patch 06/07] adds a NULL pointer check before function calling.

[patch 07/07] adds some debugging information for reset issue.

Change log:
V1->V2: fixes comment from David Miller.

Guangbin Huang (4):
  net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
  net: hns3: fix port setting handle for fibre port
  net: hns3: modify some logs format
  net: hns3: check NULL pointer before use

Huazhong Tan (1):
  net: hns3: add some DFX info for reset issue

Peng Li (1):
  net: hns3: revert to old channel when setting new channel num fail

Yonglong Liu (1):
  net: hns3: fix shaper parameter algorithm

 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c |  7 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 54 ++++++++++----
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 16 +++++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c |  2 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 32 ++++++---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 13 ++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  | 11 ++-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 83 ++++++++++++++++++++--
 9 files changed, 174 insertions(+), 46 deletions(-)

-- 
2.7.4


^ permalink raw reply

* [PATCH V2 net-next 4/7] net: hns3: fix port setting handle for fibre port
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

For hardware doesn't support use specified speed and duplex
to negotiate, it's unnecessary to check and modify the port
speed and duplex for fibre port when autoneg is on.

Fixes: 22f48e24a23d ("net: hns3: add autoneg and change speed support for fibre port")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index f5a681d..680c350 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -726,6 +726,12 @@ static int hns3_check_ksettings_param(const struct net_device *netdev,
 	u8 duplex;
 	int ret;
 
+	/* hw doesn't support use specified speed and duplex to negotiate,
+	 * unnecessary to check them when autoneg on.
+	 */
+	if (cmd->base.autoneg)
+		return 0;
+
 	if (ops->get_ksettings_an_result) {
 		ops->get_ksettings_an_result(handle, &autoneg, &speed, &duplex);
 		if (cmd->base.autoneg == autoneg && cmd->base.speed == speed &&
@@ -787,6 +793,15 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 			return ret;
 	}
 
+	/* hw doesn't support use specified speed and duplex to negotiate,
+	 * ignore them when autoneg on.
+	 */
+	if (cmd->base.autoneg) {
+		netdev_info(netdev,
+			    "autoneg is on, ignore the speed and duplex\n");
+		return 0;
+	}
+
 	if (ops->cfg_mac_speed_dup_h)
 		ret = ops->cfg_mac_speed_dup_h(handle, cmd->base.speed,
 					       cmd->base.duplex);
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 7/7] net: hns3: add some DFX info for reset issue
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

This patch adds more information for reset DFX. Also, adds some
cleanups to reset info, move reset_fail_cnt into struct
hclge_rst_stats, and modifies some print formats.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 32 ++++++++++++++++------
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 11 ++++----
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  2 +-
 3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
index 6dcce48..d0128d7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
@@ -931,22 +931,36 @@ static void hclge_dbg_fd_tcam(struct hclge_dev *hdev)
 
 static void hclge_dbg_dump_rst_info(struct hclge_dev *hdev)
 {
-	dev_info(&hdev->pdev->dev, "PF reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "PF reset count: %u\n",
 		 hdev->rst_stats.pf_rst_cnt);
-	dev_info(&hdev->pdev->dev, "FLR reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "FLR reset count: %u\n",
 		 hdev->rst_stats.flr_rst_cnt);
-	dev_info(&hdev->pdev->dev, "CORE reset count: %d\n",
-		 hdev->rst_stats.core_rst_cnt);
-	dev_info(&hdev->pdev->dev, "GLOBAL reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "GLOBAL reset count: %u\n",
 		 hdev->rst_stats.global_rst_cnt);
-	dev_info(&hdev->pdev->dev, "IMP reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "IMP reset count: %u\n",
 		 hdev->rst_stats.imp_rst_cnt);
-	dev_info(&hdev->pdev->dev, "reset done count: %d\n",
+	dev_info(&hdev->pdev->dev, "reset done count: %u\n",
 		 hdev->rst_stats.reset_done_cnt);
-	dev_info(&hdev->pdev->dev, "HW reset done count: %d\n",
+	dev_info(&hdev->pdev->dev, "HW reset done count: %u\n",
 		 hdev->rst_stats.hw_reset_done_cnt);
-	dev_info(&hdev->pdev->dev, "reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "reset count: %u\n",
 		 hdev->rst_stats.reset_cnt);
+	dev_info(&hdev->pdev->dev, "reset count: %u\n",
+		 hdev->rst_stats.reset_cnt);
+	dev_info(&hdev->pdev->dev, "reset fail count: %u\n",
+		 hdev->rst_stats.reset_fail_cnt);
+	dev_info(&hdev->pdev->dev, "vector0 interrupt enable status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_MISC_VECTOR_REG_BASE));
+	dev_info(&hdev->pdev->dev, "reset interrupt source: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_MISC_RESET_STS_REG));
+	dev_info(&hdev->pdev->dev, "reset interrupt status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_MISC_VECTOR_INT_STS));
+	dev_info(&hdev->pdev->dev, "hardware reset status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_GLOBAL_RESET_REG));
+	dev_info(&hdev->pdev->dev, "handshake status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_NIC_CSQ_DEPTH_REG));
+	dev_info(&hdev->pdev->dev, "function reset status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_FUN_RST_ING));
 }
 
 static void hclge_dbg_get_m7_stats_info(struct hclge_dev *hdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index bc5bad3..fd7f943 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3547,12 +3547,12 @@ static bool hclge_reset_err_handle(struct hclge_dev *hdev)
 			 "reset failed because new reset interrupt\n");
 		hclge_clear_reset_cause(hdev);
 		return false;
-	} else if (hdev->reset_fail_cnt < MAX_RESET_FAIL_CNT) {
-		hdev->reset_fail_cnt++;
+	} else if (hdev->rst_stats.reset_fail_cnt < MAX_RESET_FAIL_CNT) {
+		hdev->rst_stats.reset_fail_cnt++;
 		set_bit(hdev->reset_type, &hdev->reset_pending);
 		dev_info(&hdev->pdev->dev,
 			 "re-schedule reset task(%d)\n",
-			 hdev->reset_fail_cnt);
+			 hdev->rst_stats.reset_fail_cnt);
 		return true;
 	}
 
@@ -3679,7 +3679,8 @@ static void hclge_reset(struct hclge_dev *hdev)
 	/* ignore RoCE notify error if it fails HCLGE_RESET_MAX_FAIL_CNT - 1
 	 * times
 	 */
-	if (ret && hdev->reset_fail_cnt < HCLGE_RESET_MAX_FAIL_CNT - 1)
+	if (ret &&
+	    hdev->rst_stats.reset_fail_cnt < HCLGE_RESET_MAX_FAIL_CNT - 1)
 		goto err_reset;
 
 	rtnl_lock();
@@ -3695,7 +3696,7 @@ static void hclge_reset(struct hclge_dev *hdev)
 		goto err_reset;
 
 	hdev->last_reset_time = jiffies;
-	hdev->reset_fail_cnt = 0;
+	hdev->rst_stats.reset_fail_cnt = 0;
 	hdev->rst_stats.reset_done_cnt++;
 	ae_dev->reset_type = HNAE3_NONE_RESET;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 870550f..3e9574a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -659,6 +659,7 @@ struct hclge_rst_stats {
 	u32 global_rst_cnt;	/* the number of GLOBAL */
 	u32 imp_rst_cnt;	/* the number of IMP reset */
 	u32 reset_cnt;		/* the number of reset */
+	u32 reset_fail_cnt;	/* the number of reset fail */
 };
 
 /* time and register status when mac tunnel interruption occur */
@@ -725,7 +726,6 @@ struct hclge_dev {
 	unsigned long reset_request;	/* reset has been requested */
 	unsigned long reset_pending;	/* client rst is pending to be served */
 	struct hclge_rst_stats rst_stats;
-	u32 reset_fail_cnt;
 	u32 fw_version;
 	u16 num_vmdq_vport;		/* Num vmdq vport this PF has set up */
 	u16 num_tqps;			/* Num task queue pairs of this PF */
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 6/7] net: hns3: check NULL pointer before use
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

This patch checks ops->set_default_reset_request whether is NULL
before using it in function hns3_slot_reset.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 8dbaf36..616cad0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2006,7 +2006,8 @@ static pci_ers_result_t hns3_slot_reset(struct pci_dev *pdev)
 
 	ops = ae_dev->ops;
 	/* request the reset */
-	if (ops->reset_event && ops->get_reset_level) {
+	if (ops->reset_event && ops->get_reset_level &&
+	    ops->set_default_reset_request) {
 		if (ae_dev->hw_err_reset_req) {
 			reset_type = ops->get_reset_level(ae_dev,
 						&ae_dev->hw_err_reset_req);
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 5/7] net: hns3: modify some logs format
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

The pfc_en and pfc_map need to be displayed in hexadecimal notation,
printing dma address should use %pad, and the end of printed string
needs to be add "\n".

This patch modifies them.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c      | 7 +++++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c  | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
index 5cf4c1e..28961a6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
@@ -166,6 +166,7 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
 	struct hns3_enet_ring *ring;
 	u32 tx_index, rx_index;
 	u32 q_num, value;
+	dma_addr_t addr;
 	int cnt;
 
 	cnt = sscanf(&cmd_buf[8], "%u %u", &q_num, &tx_index);
@@ -194,8 +195,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
 	}
 
 	tx_desc = &ring->desc[tx_index];
+	addr = le64_to_cpu(tx_desc->addr);
 	dev_info(dev, "TX Queue Num: %u, BD Index: %u\n", q_num, tx_index);
-	dev_info(dev, "(TX)addr: 0x%llx\n", tx_desc->addr);
+	dev_info(dev, "(TX)addr: %pad\n", &addr);
 	dev_info(dev, "(TX)vlan_tag: %u\n", tx_desc->tx.vlan_tag);
 	dev_info(dev, "(TX)send_size: %u\n", tx_desc->tx.send_size);
 	dev_info(dev, "(TX)vlan_tso: %u\n", tx_desc->tx.type_cs_vlan_tso);
@@ -217,8 +219,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
 	rx_index = (cnt == 1) ? value : tx_index;
 	rx_desc	 = &ring->desc[rx_index];
 
+	addr = le64_to_cpu(rx_desc->addr);
 	dev_info(dev, "RX Queue Num: %u, BD Index: %u\n", q_num, rx_index);
-	dev_info(dev, "(RX)addr: 0x%llx\n", rx_desc->addr);
+	dev_info(dev, "(RX)addr: %pad\n", &addr);
 	dev_info(dev, "(RX)l234_info: %u\n", rx_desc->rx.l234_info);
 	dev_info(dev, "(RX)pkt_len: %u\n", rx_desc->rx.pkt_len);
 	dev_info(dev, "(RX)size: %u\n", rx_desc->rx.size);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index 816f920..c063301 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -342,7 +342,7 @@ static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 	hdev->tm_info.pfc_en = pfc->pfc_en;
 
 	netif_dbg(h, drv, netdev,
-		  "set pfc: pfc_en=%u, pfc_map=%u, num_tc=%u\n",
+		  "set pfc: pfc_en=%x, pfc_map=%x, num_tc=%u\n",
 		  pfc->pfc_en, pfc_map, hdev->tm_info.num_tc);
 
 	hclge_tm_pfc_info_update(hdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 8d4dc1b..bc5bad3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3751,7 +3751,7 @@ static void hclge_reset_event(struct pci_dev *pdev, struct hnae3_handle *handle)
 	else if (time_after(jiffies, (hdev->last_reset_time + 4 * 5 * HZ)))
 		hdev->reset_level = HNAE3_FUNC_RESET;
 
-	dev_info(&hdev->pdev->dev, "received reset event , reset type is %d",
+	dev_info(&hdev->pdev->dev, "received reset event, reset type is %d\n",
 		 hdev->reset_level);
 
 	/* request reset & schedule reset task */
-- 
2.7.4


^ permalink raw reply related

* Re: [RFC PATCH 3/4] virtio: introudce a mdev based transport
From: Jason Wang @ 2019-09-11  2:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, kwankhede,
	alex.williamson, cohuck, tiwei.bie, maxime.coquelin,
	cunming.liang, zhihong.wang, rob.miller, idos, xiao.w.wang,
	haotian.wang
In-Reply-To: <20190910094807-mutt-send-email-mst@kernel.org>


On 2019/9/10 下午9:52, Michael S. Tsirkin wrote:
> On Tue, Sep 10, 2019 at 09:13:02PM +0800, Jason Wang wrote:
>> On 2019/9/10 下午6:01, Michael S. Tsirkin wrote:
>>>> +#ifndef _LINUX_VIRTIO_MDEV_H
>>>> +#define _LINUX_VIRTIO_MDEV_H
>>>> +
>>>> +#include <linux/interrupt.h>
>>>> +#include <linux/vringh.h>
>>>> +#include <uapi/linux/virtio_net.h>
>>>> +
>>>> +/*
>>>> + * Ioctls
>>>> + */
>>> Pls add a bit more content here. It's redundant to state these
>>> are ioctls. Much better to document what does each one do.
>>
>> Ok.
>>
>>
>>>> +
>>>> +struct virtio_mdev_callback {
>>>> +	irqreturn_t (*callback)(void *);
>>>> +	void *private;
>>>> +};
>>>> +
>>>> +#define VIRTIO_MDEV 0xAF
>>>> +#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
>>>> +					 struct virtio_mdev_callback)
>>>> +#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
>>>> +					struct virtio_mdev_callback)
>>> Function pointer in an ioctl parameter? How does this ever make sense?
>>
>> I admit this is hacky (casting).
>>
>>
>>> And can't we use a couple of registers for this, and avoid ioctls?
>>
>> Yes, how about something like interrupt numbers for each virtqueue and
>> config?
> Should we just reuse VIRTIO_PCI_COMMON_Q_XXX then?


You mean something like VIRTIO_PCI_COMMON_Q_MSIX? Then it becomes a PCI 
transport in fact. And using either MSIX or irq number is actually 
another layer of indirection. So I think we can just write callback 
function and parameter through registers.


>
>
>>>> +
>>>> +#define VIRTIO_MDEV_DEVICE_API_STRING		"virtio-mdev"
>>>> +
>>>> +/*
>>>> + * Control registers
>>>> + */
>>>> +
>>>> +/* Magic value ("virt" string) - Read Only */
>>>> +#define VIRTIO_MDEV_MAGIC_VALUE		0x000
>>>> +
>>>> +/* Virtio device version - Read Only */
>>>> +#define VIRTIO_MDEV_VERSION		0x004
>>>> +
>>>> +/* Virtio device ID - Read Only */
>>>> +#define VIRTIO_MDEV_DEVICE_ID		0x008
>>>> +
>>>> +/* Virtio vendor ID - Read Only */
>>>> +#define VIRTIO_MDEV_VENDOR_ID		0x00c
>>>> +
>>>> +/* Bitmask of the features supported by the device (host)
>>>> + * (32 bits per set) - Read Only */
>>>> +#define VIRTIO_MDEV_DEVICE_FEATURES	0x010
>>>> +
>>>> +/* Device (host) features set selector - Write Only */
>>>> +#define VIRTIO_MDEV_DEVICE_FEATURES_SEL	0x014
>>>> +
>>>> +/* Bitmask of features activated by the driver (guest)
>>>> + * (32 bits per set) - Write Only */
>>>> +#define VIRTIO_MDEV_DRIVER_FEATURES	0x020
>>>> +
>>>> +/* Activated features set selector - Write Only */
>>>> +#define VIRTIO_MDEV_DRIVER_FEATURES_SEL	0x024
>>>> +
>>>> +/* Queue selector - Write Only */
>>>> +#define VIRTIO_MDEV_QUEUE_SEL		0x030
>>>> +
>>>> +/* Maximum size of the currently selected queue - Read Only */
>>>> +#define VIRTIO_MDEV_QUEUE_NUM_MAX	0x034
>>>> +
>>>> +/* Queue size for the currently selected queue - Write Only */
>>>> +#define VIRTIO_MDEV_QUEUE_NUM		0x038
>>>> +
>>>> +/* Ready bit for the currently selected queue - Read Write */
>>>> +#define VIRTIO_MDEV_QUEUE_READY		0x044
>>> Is this same as started?
>>
>> Do you mean "status"?
> I really meant "enabled", didn't remember the correct name.
> As in:  VIRTIO_PCI_COMMON_Q_ENABLE


Yes, it's the same.

Thanks


>
>>>> +
>>>> +/* Alignment of virtqueue - Read Only */
>>>> +#define VIRTIO_MDEV_QUEUE_ALIGN		0x048
>>>> +
>>>> +/* Queue notifier - Write Only */
>>>> +#define VIRTIO_MDEV_QUEUE_NOTIFY	0x050
>>>> +
>>>> +/* Device status register - Read Write */
>>>> +#define VIRTIO_MDEV_STATUS		0x060
>>>> +
>>>> +/* Selected queue's Descriptor Table address, 64 bits in two halves */
>>>> +#define VIRTIO_MDEV_QUEUE_DESC_LOW	0x080
>>>> +#define VIRTIO_MDEV_QUEUE_DESC_HIGH	0x084
>>>> +
>>>> +/* Selected queue's Available Ring address, 64 bits in two halves */
>>>> +#define VIRTIO_MDEV_QUEUE_AVAIL_LOW	0x090
>>>> +#define VIRTIO_MDEV_QUEUE_AVAIL_HIGH	0x094
>>>> +
>>>> +/* Selected queue's Used Ring address, 64 bits in two halves */
>>>> +#define VIRTIO_MDEV_QUEUE_USED_LOW	0x0a0
>>>> +#define VIRTIO_MDEV_QUEUE_USED_HIGH	0x0a4
>>>> +
>>>> +/* Configuration atomicity value */
>>>> +#define VIRTIO_MDEV_CONFIG_GENERATION	0x0fc
>>>> +
>>>> +/* The config space is defined by each driver as
>>>> + * the per-driver configuration space - Read Write */
>>>> +#define VIRTIO_MDEV_CONFIG		0x100
>>> Mixing device and generic config space is what virtio pci did,
>>> caused lots of problems with extensions.
>>> It would be better to reserve much more space.
>>
>> I see, will do this.
>>
>> Thanks
>>
>>
>>>
>>>> +
>>>> +#endif
>>>> +
>>>> +
>>>> +/* Ready bit for the currently selected queue - Read Write */
>>>> -- 
>>>> 2.19.1

^ permalink raw reply

* Re: [PATCH net-next 1/7] net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
From: tanhuazhong @ 2019-09-11  2:01 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, huangguangbin2
In-Reply-To: <20190910.192516.1686418457520996592.davem@davemloft.net>



On 2019/9/11 1:25, David Miller wrote:
> From: Huazhong Tan <tanhuazhong@huawei.com>
> Date: Tue, 10 Sep 2019 16:58:22 +0800
> 
>> +	/* Set to user value, no larger than max_rss_size. */
>> +	if (kinfo->req_rss_size != kinfo->rss_size && kinfo->req_rss_size &&
>> +	    kinfo->req_rss_size <= max_rss_size) {
>> +		dev_info(&hdev->pdev->dev, "rss changes from %u to %u\n",
>> +			 kinfo->rss_size, kinfo->req_rss_size);
>> +		kinfo->rss_size = kinfo->req_rss_size;
> 
> Please do not emit kernel log messages for normal operations.
> 

Will remove this log in V2.
Thanks.

> .
> 


^ permalink raw reply

* Re: [RFC PATCH 3/4] virtio: introudce a mdev based transport
From: Tiwei Bie @ 2019-09-11  1:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, kvm, virtualization, netdev, linux-kernel, kwankhede,
	alex.williamson, cohuck, maxime.coquelin, cunming.liang,
	zhihong.wang, rob.miller, idos, xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-4-jasowang@redhat.com>

On Tue, Sep 10, 2019 at 04:19:34PM +0800, Jason Wang wrote:
> This path introduces a new mdev transport for virtio. This is used to
> use kernel virtio driver to drive the mediated device that is capable
> of populating virtqueue directly.
> 
> A new virtio-mdev driver will be registered to the mdev bus, when a
> new virtio-mdev device is probed, it will register the device with
> mdev based config ops. This means, unlike the exist hardware
> transport, this is a software transport between mdev driver and mdev
> device. The transport was implemented through:
> 
> - configuration access was implemented through parent_ops->read()/write()
> - vq/config callback was implemented through parent_ops->ioctl()
> 
> This transport is derived from virtio MMIO protocol and was wrote for
> kernel driver. But for the transport itself, but the design goal is to
> be generic enough to support userspace driver (this part will be added
> in the future).
> 
> Note:
> - current mdev assume all the parameter of parent_ops was from
>   userspace. This prevents us from implementing the kernel mdev
>   driver. For a quick POC, this patch just abuse those parameter and
>   assume the mdev device implementation will treat them as kernel
>   pointer. This should be addressed in the formal series by extending
>   mdev_parent_ops.
> - for a quick POC, I just drive the transport from MMIO, I'm pretty
>   there's lot of optimization space for this.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vfio/mdev/Kconfig        |   7 +
>  drivers/vfio/mdev/Makefile       |   1 +
>  drivers/vfio/mdev/virtio_mdev.c  | 500 +++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_mdev.h | 131 ++++++++
>  4 files changed, 639 insertions(+)
>  create mode 100644 drivers/vfio/mdev/virtio_mdev.c
>  create mode 100644 include/uapi/linux/virtio_mdev.h
> 
[...]
> diff --git a/include/uapi/linux/virtio_mdev.h b/include/uapi/linux/virtio_mdev.h
> new file mode 100644
> index 000000000000..8040de6b960a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_mdev.h
> @@ -0,0 +1,131 @@
> +/*
> + * Virtio mediated device driver
> + *
> + * Copyright 2019, Red Hat Corp.
> + *
> + * Based on Virtio MMIO driver by ARM Ltd, copyright ARM Ltd. 2011
> + *
> + * This header is BSD licensed so anyone can use the definitions to implement
> + * compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *    may be used to endorse or promote products derived from this software
> + *    without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#ifndef _LINUX_VIRTIO_MDEV_H
> +#define _LINUX_VIRTIO_MDEV_H
> +
> +#include <linux/interrupt.h>
> +#include <linux/vringh.h>
> +#include <uapi/linux/virtio_net.h>
> +
> +/*
> + * Ioctls
> + */
> +
> +struct virtio_mdev_callback {
> +	irqreturn_t (*callback)(void *);
> +	void *private;
> +};
> +
> +#define VIRTIO_MDEV 0xAF
> +#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
> +					 struct virtio_mdev_callback)
> +#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
> +					struct virtio_mdev_callback)
> +
> +#define VIRTIO_MDEV_DEVICE_API_STRING		"virtio-mdev"
> +
> +/*
> + * Control registers
> + */
> +
> +/* Magic value ("virt" string) - Read Only */
> +#define VIRTIO_MDEV_MAGIC_VALUE		0x000
> +
> +/* Virtio device version - Read Only */
> +#define VIRTIO_MDEV_VERSION		0x004
> +
> +/* Virtio device ID - Read Only */
> +#define VIRTIO_MDEV_DEVICE_ID		0x008
> +
> +/* Virtio vendor ID - Read Only */
> +#define VIRTIO_MDEV_VENDOR_ID		0x00c
> +
> +/* Bitmask of the features supported by the device (host)
> + * (32 bits per set) - Read Only */
> +#define VIRTIO_MDEV_DEVICE_FEATURES	0x010
> +
> +/* Device (host) features set selector - Write Only */
> +#define VIRTIO_MDEV_DEVICE_FEATURES_SEL	0x014
> +
> +/* Bitmask of features activated by the driver (guest)
> + * (32 bits per set) - Write Only */
> +#define VIRTIO_MDEV_DRIVER_FEATURES	0x020
> +
> +/* Activated features set selector - Write Only */
> +#define VIRTIO_MDEV_DRIVER_FEATURES_SEL	0x024
> +
> +/* Queue selector - Write Only */
> +#define VIRTIO_MDEV_QUEUE_SEL		0x030
> +
> +/* Maximum size of the currently selected queue - Read Only */
> +#define VIRTIO_MDEV_QUEUE_NUM_MAX	0x034
> +
> +/* Queue size for the currently selected queue - Write Only */
> +#define VIRTIO_MDEV_QUEUE_NUM		0x038
> +
> +/* Ready bit for the currently selected queue - Read Write */
> +#define VIRTIO_MDEV_QUEUE_READY		0x044
> +
> +/* Alignment of virtqueue - Read Only */
> +#define VIRTIO_MDEV_QUEUE_ALIGN		0x048
> +
> +/* Queue notifier - Write Only */
> +#define VIRTIO_MDEV_QUEUE_NOTIFY	0x050
> +
> +/* Device status register - Read Write */
> +#define VIRTIO_MDEV_STATUS		0x060
> +
> +/* Selected queue's Descriptor Table address, 64 bits in two halves */
> +#define VIRTIO_MDEV_QUEUE_DESC_LOW	0x080
> +#define VIRTIO_MDEV_QUEUE_DESC_HIGH	0x084
> +
> +/* Selected queue's Available Ring address, 64 bits in two halves */
> +#define VIRTIO_MDEV_QUEUE_AVAIL_LOW	0x090
> +#define VIRTIO_MDEV_QUEUE_AVAIL_HIGH	0x094
> +
> +/* Selected queue's Used Ring address, 64 bits in two halves */
> +#define VIRTIO_MDEV_QUEUE_USED_LOW	0x0a0
> +#define VIRTIO_MDEV_QUEUE_USED_HIGH	0x0a4
> +
> +/* Configuration atomicity value */
> +#define VIRTIO_MDEV_CONFIG_GENERATION	0x0fc
> +
> +/* The config space is defined by each driver as
> + * the per-driver configuration space - Read Write */
> +#define VIRTIO_MDEV_CONFIG		0x100

IIUC, we can use above registers with virtio-mdev parent's
read()/write() to access the mdev device from kernel driver.
As you suggested, it's a choice to build vhost-mdev on top
of this abstraction as well. But virtio is the frontend
device which lacks some vhost backend features, e.g. get
vring base, set vring base, negotiate vhost features, etc.
So I'm wondering, does it make sense to reserve some space
for vhost-mdev in kernel to do vhost backend specific setups?
Or do you have any other thoughts?

Besides, I'm also wondering, what's the purpose of making
above registers part of UAPI? And if we make them part
of UAPI, do we also need to make them part of virtio spec?

Thanks!
Tiwei

> +
> +#endif
> +
> +
> +/* Ready bit for the currently selected queue - Read Write */
> -- 
> 2.19.1
> 

^ permalink raw reply

* Re: [PATCH 0/3] rtlwifi: use generic rtl_evm_db_to_percentage
From: Pkshih @ 2019-09-11  1:31 UTC (permalink / raw)
  To: kvalo@codeaurora.org, straube.linux@gmail.com
  Cc: linux-wireless@vger.kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190910190422.63378-1-straube.linux@gmail.com>

On Tue, 2019-09-10 at 21:04 +0200, Michael Straube wrote:
> Functions _rtl92{c,d}_evm_db_to_percentage are functionally identical
> to the generic version rtl_evm_db_to percentage. This series converts
> rtl8192ce, rtl8192cu and rtl8192de to use the generic version.
> 
> Michael Straube (3):
>   rtlwifi: rtl8192ce: replace _rtl92c_evm_db_to_percentage with generic
>     version
>   rtlwifi: rtl8192cu: replace _rtl92c_evm_db_to_percentage with generic
>     version
>   rtlwifi: rtl8192de: replace _rtl92d_evm_db_to_percentage with generic
>     version
> 
>  .../wireless/realtek/rtlwifi/rtl8192ce/trx.c  | 23 +------------------
>  .../wireless/realtek/rtlwifi/rtl8192cu/mac.c  | 18 +--------------
>  .../wireless/realtek/rtlwifi/rtl8192de/trx.c  | 18 ++-------------
>  3 files changed, 4 insertions(+), 55 deletions(-)
> 

I checked the generic version and removed functions, and they are indeed
identical. Thanks for your patches.

Acked-by: Ping-Ke Shih <pkshih@realtek.com>



^ permalink raw reply

* Re: [PATCH net 1/2] sctp: remove redundant assignment when call sctp_get_port_local
From: maowenan @ 2019-09-11  1:30 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: vyasevich, nhorman, marcelo.leitner, davem, linux-sctp, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <20190910192207.GE20699@kadam>



On 2019/9/11 3:22, Dan Carpenter wrote:
> On Tue, Sep 10, 2019 at 09:57:10PM +0300, Dan Carpenter wrote:
>> On Tue, Sep 10, 2019 at 03:13:42PM +0800, Mao Wenan wrote:
>>> There are more parentheses in if clause when call sctp_get_port_local
>>> in sctp_do_bind, and redundant assignment to 'ret'. This patch is to
>>> do cleanup.
>>>
>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>> ---
>>>  net/sctp/socket.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index 9d1f83b10c0a..766b68b55ebe 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -399,9 +399,8 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
>>>  	 * detection.
>>>  	 */
>>>  	addr->v4.sin_port = htons(snum);
>>> -	if ((ret = sctp_get_port_local(sk, addr))) {
>>> +	if (sctp_get_port_local(sk, addr))
>>>  		return -EADDRINUSE;
>>
>> sctp_get_port_local() returns a long which is either 0,1 or a pointer
>> casted to long.  It's not documented what it means and neither of the
>> callers use the return since commit 62208f12451f ("net: sctp: simplify
>> sctp_get_port").
> 
> Actually it was commit 4e54064e0a13 ("sctp: Allow only 1 listening
> socket with SO_REUSEADDR") from 11 years ago.  That patch fixed a bug,
> because before the code assumed that a pointer casted to an int was the
> same as a pointer casted to a long.

commit 4e54064e0a13 treated non-zero return value as unexpected, so the current
cleanup is ok?

> 
> regards,
> dan carpenter
> 
> 
> .
> 


^ permalink raw reply

* [PATCH v2 net] net: sonic: replace dev_kfree_skb in sonic_send_packet
From: Mao Wenan @ 2019-09-11  1:36 UTC (permalink / raw)
  To: tsbogend, davem; +Cc: netdev, linux-kernel, kernel-janitors, Mao Wenan
In-Reply-To: <a48a6690-eeb4-91d2-bed8-439d14b63e2f@cogentembedded.com>

sonic_send_packet will be processed in irq or non-irq 
context, so it would better use dev_kfree_skb_any
instead of dev_kfree_skb.

Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
 v2: change 'none irq' to 'non-irq'.
 drivers/net/ethernet/natsemi/sonic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/natsemi/sonic.c b/drivers/net/ethernet/natsemi/sonic.c
index 18fd62fbfb64..b339125b2f09 100644
--- a/drivers/net/ethernet/natsemi/sonic.c
+++ b/drivers/net/ethernet/natsemi/sonic.c
@@ -233,7 +233,7 @@ static int sonic_send_packet(struct sk_buff *skb, struct net_device *dev)
 	laddr = dma_map_single(lp->device, skb->data, length, DMA_TO_DEVICE);
 	if (!laddr) {
 		pr_err_ratelimited("%s: failed to map tx DMA buffer.\n", dev->name);
-		dev_kfree_skb(skb);
+		dev_kfree_skb_any(skb);
 		return NETDEV_TX_OK;
 	}
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next 11/11] net: aquantia: add support for PIN funcs
From: kbuild test robot @ 2019-09-10  2:45 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: kbuild-all, netdev@vger.kernel.org, richardcochran@gmail.com,
	davem@davemloft.net, Egor Pomozov, Sergey Samoilenko,
	Dmitry Bezrukov, Igor Russkikh, Nikita Danilov
In-Reply-To: <b3eeb5dd7d38dab79ded5f4b7cca2a84505c8fac.1568034880.git.igor.russkikh@aquantia.com>

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Igor-Russkikh/net-aquantia-PTP-support-for-AQC-devices/20190909-233041
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: drivers/net/ethernet/aquantia/atlantic/aq_ptp.o: in function `aq_ptp_gpio_feature_enable':
>> aq_ptp.c:(.text+0x7fe): undefined reference to `__umoddi3'
   ld: drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.o: in function `hw_atl_b0_adj_clock_freq':
   hw_atl_b0.c:(.text+0xdd): undefined reference to `__udivdi3'
   ld: hw_atl_b0.c:(.text+0xf2): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x14b): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x15b): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x195): undefined reference to `__udivdi3'
   ld: hw_atl_b0.c:(.text+0x1a7): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x1fa): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x20a): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x277): undefined reference to `__divdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 68904 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 03/11] net: aquantia: add basic ptp_clock callbacks
From: kbuild test robot @ 2019-09-10  2:45 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: kbuild-all, netdev@vger.kernel.org, richardcochran@gmail.com,
	davem@davemloft.net, Egor Pomozov, Sergey Samoilenko,
	Dmitry Bezrukov, Igor Russkikh
In-Reply-To: <8868449ec5508f498131ee141399149bf801ea94.1568034880.git.igor.russkikh@aquantia.com>

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Igor-Russkikh/net-aquantia-PTP-support-for-AQC-devices/20190909-233041
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ERROR: "aq_ptp_clock_init" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!
>> ERROR: "__udivdi3" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!
>> ERROR: "__divdi3" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69589 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 03/11] net: aquantia: add basic ptp_clock callbacks
From: kbuild test robot @ 2019-09-09 23:27 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: kbuild-all, netdev@vger.kernel.org, richardcochran@gmail.com,
	davem@davemloft.net, Egor Pomozov, Sergey Samoilenko,
	Dmitry Bezrukov, Igor Russkikh
In-Reply-To: <8868449ec5508f498131ee141399149bf801ea94.1568034880.git.igor.russkikh@aquantia.com>

[-- Attachment #1: Type: text/plain, Size: 1734 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Igor-Russkikh/net-aquantia-PTP-support-for-AQC-devices/20190909-233041
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: drivers/net/ethernet/aquantia/atlantic/aq_nic.o: in function `aq_nic_update_link_status':
   aq_nic.c:(.text+0xe1): undefined reference to `aq_ptp_clock_init'
   ld: drivers/net/ethernet/aquantia/atlantic/aq_ptp.o: in function `aq_ptp_init':
   aq_ptp.c:(.text+0x367): undefined reference to `aq_ptp_clock_init'
   ld: drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.o: in function `hw_atl_b0_adj_clock_freq':
>> hw_atl_b0.c:(.text+0xad): undefined reference to `__udivdi3'
>> ld: hw_atl_b0.c:(.text+0xc2): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x11b): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x12b): undefined reference to `__divdi3'
>> ld: hw_atl_b0.c:(.text+0x165): undefined reference to `__udivdi3'
   ld: hw_atl_b0.c:(.text+0x177): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x1ca): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x1da): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x247): undefined reference to `__divdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 68904 bytes --]

^ permalink raw reply

* Re: Strange routing with VRF and 5.2.7+
From: Ben Greear @ 2019-09-11  1:08 UTC (permalink / raw)
  To: netdev
In-Reply-To: <91749b17-7800-44c0-d137-5242b8ceb819@candelatech.com>

On 9/10/19 3:17 PM, Ben Greear wrote:
> Today we were testing creating 200 virtual station vdevs on ath9k, and using
> VRF for the routing.

Looks like the same issue happens w/out VRF, but there I have oodles of routing
rules, so it is an area ripe for failure.

Will upgrade to 5.2.14+ and retest, and try 4.20 as well....

Thanks,
Ben

> 
> This really slows down the machine in question.
> 
> During the minutes that it takes to bring these up and configure them,
> we loose network connectivity on the management port.
> 
> If I do 'ip route show', it just shows the default route out of eth0, and
> the subnet route.  But, if I try to ping the gateway, I get an ICMP error
> coming back from the gateway of one of the virtual stations (which should be
> safely using VRFs and so not in use when I do a plain 'ping' from the shell).
> 
> I tried running tshark on eth0 in the background and running ping, and it captures
> no packets leaving eth0.
> 
> After some time (and during this time, my various scripts will be (re)configuring
> vrfs and stations and related vrf routing tables and such,
> but should *not* be messing with the main routing table, then suddenly
> things start working again.
> 
> I am curious if anyone has seen anything similar or has suggestions for more
> ways to debug this.  It seems reproducible, but it is a pain to
> debug.
> 
> Thanks,
> Ben
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH net-next] tcp: force a PSH flag on TSO packets
From: Neal Cardwell @ 2019-09-11  0:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, Soheil Hassas Yeganeh,
	Yuchung Cheng, Daniel Borkmann, Tariq Toukan
In-Reply-To: <20190910214928.220727-1-edumazet@google.com>

On Tue, Sep 10, 2019 at 5:49 PM Eric Dumazet <edumazet@google.com> wrote:
>
> When tcp sends a TSO packet, adding a PSH flag on it
> reduces the sojourn time of GRO packet in GRO receivers.
>
> This is particularly the case under pressure, since RX queues
> receive packets for many concurrent flows.
>
> A sender can give a hint to GRO engines when it is
> appropriate to flush a super-packet, especially when pacing
> is in the picture, since next packet is probably delayed by
> one ms.
>
> Having less packets in GRO engine reduces chance
> of LRU eviction or inflated RTT, and reduces GRO cost.
>
> We found recently that we must not set the PSH flag on
> individual full-size MSS segments [1] :
>
>  Under pressure (CWR state), we better let the packet sit
>  for a small delay (depending on NAPI logic) so that the
>  ACK packet is delayed, and thus next packet we send is
>  also delayed a bit. Eventually the bottleneck queue can
>  be drained. DCTCP flows with CWND=1 have demonstrated
>  the issue.
>
> This patch allows to slowdown the aggregate traffic without
> involving high resolution timers on senders and/or
> receivers.
>
> It has been used at Google for about four years,
> and has been discussed at various networking conferences.
>
> [1] segments smaller than MSS already have PSH flag set
>     by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE
>     has been requested by the user.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>  net/ipv4/tcp_output.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 42abc9bd687a5fea627cbc7cfa750d022f393d84..fec6d67bfd146dc78f0f25173fd71b8b8cc752fe 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1050,11 +1050,22 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
>         tcb = TCP_SKB_CB(skb);
>         memset(&opts, 0, sizeof(opts));
>
> -       if (unlikely(tcb->tcp_flags & TCPHDR_SYN))
> +       if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) {
>                 tcp_options_size = tcp_syn_options(sk, skb, &opts, &md5);
> -       else
> +       } else {
>                 tcp_options_size = tcp_established_options(sk, skb, &opts,
>                                                            &md5);
> +               /* Force a PSH flag on all (GSO) packets to expedite GRO flush
> +                * at receiver : This slightly improve GRO performance.
> +                * Note that we do not force the PSH flag for non GSO packets,
> +                * because they might be sent under high congestion events,
> +                * and in this case it is better to delay the delivery of 1-MSS
> +                * packets and thus the corresponding ACK packet that would
> +                * release the following packet.
> +                */
> +               if (tcp_skb_pcount(skb) > 1)
> +                       tcb->tcp_flags |= TCPHDR_PSH;
> +       }
>         tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
>
>         /* if no packet is in qdisc/device queue, then allow XPS to select

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal

^ permalink raw reply

* RE: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
From: Gomes, Vinicius @ 2019-09-11  0:45 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: David Miller, f.fainelli@gmail.com, vivien.didelot@gmail.com,
	andrew@lunn.ch, Patel, Vedang, richardcochran@gmail.com,
	Voon, Weifeng, jiri@mellanox.com, m-karicheri2@ti.com,
	Jose.Abreu@synopsys.com, ilias.apalodimas@linaro.org,
	jhs@mojatatu.com, xiyou.wangcong@gmail.com,
	kurt.kanzenbach@linutronix.de, netdev@vger.kernel.org
In-Reply-To: <CA+h21hoQ-DaFGzALVmGo2mDJancUp5Fndc=o0f4LfD_9yaNi0g@mail.gmail.com>

Hi Vladimir,

[...]

> 
> I'll make sure this subtlety is more clearly formulated in the next version of the
> patch.
> 

Ack.

> Actually let me ask you a few questions as well:
> 
> - I'm trying to understand what is the correct use of the tc-mqprio "queues"
> argument. I've only tested it with "1@0 1@1 1@2 1@3 1@4 1@5
> 1@6 1@7", which I believe is equivalent to not specifying it at all? I believe it
> should be interpreted as: "allocate this many netdev queues for each traffic
> class", where "traffic class" means a group of queues having the same priority
> (equal to the traffic class's number), but engaged in a strict priority scheme with
> other groups of queues (traffic classes). Right?

Specifying the "queues" is mandatory, IIRC. Yeah, your reading of those arguments
for you example matches mine.

So you mean, that you only tested situations when only one queue is "open" at a time?
I think this is another good thing to test.

> 
> - DSA can only formally support multi-queue, because its connection to the Linux
> host is through an Ethernet MAC (FIFO). Even if the DSA master netdevice may
> be multi-queue, allocating and separating those queues for each front-panel
> switch port is a task best left to the user/administrator. This means that DSA
> should reject all other "queues" mappings except the trivial one I pointed to
> above?
> 
> - I'm looking at the "tc_mask_to_queue_mask" function that I'm carrying along
> from your initial offload RFC. Are you sure this is the right approach? I don't feel
> a need to translate from traffic class to netdev queues, considering that in the
> general case, a traffic class is a group of queues, and 802.1Qbv doesn't really
> specify that you can gate individual queues from a traffic class. In the software
> implementation you are only looking at netdev_get_prio_tc_map, which is not
> equivalent as far as my understanding goes, but saner.
> Actually 802.1Q-2018 does not really clarify this either. It looks to me like they
> use the term "queue" and "traffic class" interchangeably.
> See two examples below (emphasis mine):

I spent quite a long time thinking about this, still not sure that I got it right. Let me begin
with the objective for that "translation". Scheduled traffic only makes sense when
the whole network shares the same schedule, so, I wanted a way so I minimize the
amount of information of each schedule that's controller dependent, Linux already 
does most of it with the separation of traffic classes and queues (you are right that 
802.1Q is confusing on this), the idea is that the only thing that needs to change from 
one node to another in the network is the "queues" parameter. Because each node might 
have different number of queues, or assign different priorities to different queues.  

So, that's the idea of doing that intermediate "transformation" step: taprio knows about
traffic classes and HW queues, but the driver only knows about HW queues. And unless I made
a mistake, tc_mask_to_queue_mask() should be equivalent to:  

netdev_get_prio_tc_map() + scanning the gatemask for BIT(tc).

(Thinking more about this, I am having a few ideas about ways to simplify software mode :-)

> 
> Q.2 Using gate operations to create protected windows The enhancements for
> scheduled traffic described in 8.6.8.4 allow transmission to be switched on and
> off on a timed basis for each _traffic class_ that is implemented on a port. This
> switching is achieved by means of individual on/off transmission gates
> associated with each _traffic class_ and a list of gate operations that control the
> gates; an individual SetGateStates operation has a time delay parameter that
> indicates the delay after the gate operation is executed until the next operation
> is to occur, and a GateState parameter that defines a vector of up to eight state
> values (open or
> closed) that is to be applied to each gate when the operation is executed. The
> gate operations allow any combination of open/closed states to be defined, and
> the mechanism makes no assumptions about which _traffic classes_ are being
> “protected” and which are “unprotected”; any such assumptions are left to the
> designer of the sequence of gate operations.
> 
> Table 8-7—Gate operations
> The GateState parameter indicates a value, open or closed, for each of the
> Port’s _queues_.
> 
> - What happens with the "clockid" argument now that hardware offload is
> possible? Do we allow "/dev/ptp0" to be specified as input?
> Actually this question is relevant to your txtime-assist mode as well:
> doesn't it assume that there is an implicit phc2sys instance running to keep the
> system time in sync with the PHC?

That's a very interesting question. I think, for now, allowing specifying /dev/ptp* clocks
won't work "always": if the driver or something needs to add a timer to be able to run 
the schedule, it won't be able to use /dev/ptp* clocks (hrtimers and ptp clocks don’t mix).
But for "full" offloads, it should work.

So, you are right, taprio and txtime-assisted (and ETF) require the system clock and phc 
clock to be synchronized, via something like phc2sys.

Hope I got all your questions.

Cheers,
--
Vinicius


^ permalink raw reply

* [PATCH] net: qrtr: fix memort leak in qrtr_tun_write_iter
From: Navid Emamdoost @ 2019-09-11  0:37 UTC (permalink / raw)
  Cc: emamd001, smccaman, kjlu, Navid Emamdoost, David S. Miller,
	netdev, linux-kernel

In qrtr_tun_write_iter the allocated kbuf should be release in case of
error happening.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
---
 net/qrtr/tun.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/qrtr/tun.c b/net/qrtr/tun.c
index ccff1e544c21..1dba8b92560e 100644
--- a/net/qrtr/tun.c
+++ b/net/qrtr/tun.c
@@ -84,12 +84,18 @@ static ssize_t qrtr_tun_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (!kbuf)
 		return -ENOMEM;
 
-	if (!copy_from_iter_full(kbuf, len, from))
+	if (!copy_from_iter_full(kbuf, len, from)) {
+		kfree(kbuf);
 		return -EFAULT;
+	}
 
 	ret = qrtr_endpoint_post(&tun->ep, kbuf, len);
+	if (ret < 0) {
+		kfree(kbuf);
+		return ret;
+	}
 
-	return ret < 0 ? ret : len;
+	return len;
 }
 
 static __poll_t qrtr_tun_poll(struct file *filp, poll_table *wait)
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] net: Remove the source address setting in connect() for UDP
From: Enke Chen (enkechen) @ 2019-09-10 23:55 UTC (permalink / raw)
  To: David Miller
  Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	xe-linux-external(mailer list), Enke Chen (enkechen)
In-Reply-To: <1DCD31CA-E94F-4127-876F-8DD355E6CF9A@cisco.com>

Hi, David:

Do you still have concerns about backward compatibility of the fix?

I really do not see how existing, working applications would be negatively impacted
by the fix.

Thanks.   -- Enke

-----Original Message-----
From: "Enke Chen (enkechen)" <enkechen@cisco.com>
Date: Friday, September 6, 2019 at 12:23 AM
To: David Miller <davem@davemloft.net>
Cc: "kuznet@ms2.inr.ac.ru" <kuznet@ms2.inr.ac.ru>, "yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "xe-linux-external(mailer list)" <xe-linux-external@cisco.com>, "Enke Chen (enkechen)" <enkechen@cisco.com>
Subject: Re: [PATCH] net: Remove the source address setting in connect() for UDP

Hi, David:

Yes, I understand the code has been there for a long time.  But the issues are real, and it's really nasty when
You run into them.  As I described in the patch log, there is no backward compatibility Issue for fixing it.

---
There is no backward compatibility issue here as the source address setting
in connect() is not needed anyway.

  - No impact on the source address selection when the source address
    is explicitly specified by "bind()", or by the "IP_PKTINFO" option.

  - In the case that the source address is not explicitly specified,
    the selection of the source address would be more accurate and
    reliable based on the up-to-date routing table.
---

Thanks.  -- Enke

-----Original Message-----
From: <linux-kernel-owner@vger.kernel.org> on behalf of David Miller <davem@davemloft.net>
Date: Friday, September 6, 2019 at 12:14 AM
To: "Enke Chen (enkechen)" <enkechen@cisco.com>
Cc: "kuznet@ms2.inr.ac.ru" <kuznet@ms2.inr.ac.ru>, "yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "xe-linux-external(mailer list)" <xe-linux-external@cisco.com>
Subject: Re: [PATCH] net: Remove the source address setting in connect() for UDP

From: Enke Chen <enkechen@cisco.com>
Date: Thu,  5 Sep 2019 19:54:37 -0700

> The connect() system call for a UDP socket is for setting the destination
> address and port. But the current code mistakenly sets the source address
> for the socket as well. Remove the source address setting in connect() for
> UDP in this patch.

Do you have any idea how many decades of precedence this behavior has and
therefore how much you potentially will break userspace?

This boat has sailed a long time ago I'm afraid.



^ permalink raw reply

* Re: KASAN: use-after-free Read in rxrpc_send_keepalive
From: syzbot @ 2019-09-10 23:35 UTC (permalink / raw)
  To: MAILER_DAEMON, davem, dhowells, linux-afs, linux-kernel, netdev,
	syzkaller-bugs
In-Reply-To: <000000000000e695c1058fb26925@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    3120b9a6 Merge tag 'ipc-fixes' of git://git.kernel.org/pub..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=107d1ca5600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ed2b148cd67382ec
dashboard link: https://syzkaller.appspot.com/bug?extid=d850c266e3df14da1d31
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17347095600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143bcca5600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d850c266e3df14da1d31@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in rxrpc_send_keepalive+0xe2/0x3c0  
net/rxrpc/output.c:634
Read of size 8 at addr ffff8880a859a058 by task kworker/0:2/3016

CPU: 0 PID: 3016 Comm: kworker/0:2 Not tainted 5.3.0-rc8+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: krxrpcd rxrpc_peer_keepalive_worker
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
  print_address_description+0x75/0x5b0 mm/kasan/report.c:351
  __kasan_report+0x14b/0x1c0 mm/kasan/report.c:482
  kasan_report+0x26/0x50 mm/kasan/common.c:618
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
  rxrpc_send_keepalive+0xe2/0x3c0 net/rxrpc/output.c:634
  rxrpc_peer_keepalive_dispatch net/rxrpc/peer_event.c:369 [inline]
  rxrpc_peer_keepalive_worker+0x76e/0xb40 net/rxrpc/peer_event.c:430
  process_one_work+0x7ef/0x10e0 kernel/workqueue.c:2269
  worker_thread+0xc01/0x1630 kernel/workqueue.c:2415
  kthread+0x332/0x350 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 9378:
  save_stack mm/kasan/common.c:69 [inline]
  set_track mm/kasan/common.c:77 [inline]
  __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:493
  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:507
  kmem_cache_alloc_trace+0x221/0x2f0 mm/slab.c:3550
  kmalloc include/linux/slab.h:552 [inline]
  kzalloc include/linux/slab.h:748 [inline]
  rxrpc_alloc_connection+0x79/0x490 net/rxrpc/conn_object.c:41
  rxrpc_alloc_client_connection net/rxrpc/conn_client.c:176 [inline]
  rxrpc_get_client_conn net/rxrpc/conn_client.c:339 [inline]
  rxrpc_connect_call+0xb30/0x2c40 net/rxrpc/conn_client.c:697
  rxrpc_new_client_call+0x6d5/0xb60 net/rxrpc/call_object.c:289
  rxrpc_new_client_call_for_sendmsg net/rxrpc/sendmsg.c:595 [inline]
  rxrpc_do_sendmsg+0xf2b/0x19b0 net/rxrpc/sendmsg.c:652
  rxrpc_sendmsg+0x5eb/0x8b0 net/rxrpc/af_rxrpc.c:585
  sock_sendmsg_nosec net/socket.c:637 [inline]
  sock_sendmsg net/socket.c:657 [inline]
  ___sys_sendmsg+0x60d/0x910 net/socket.c:2311
  __sys_sendmmsg+0x239/0x470 net/socket.c:2413
  __do_sys_sendmmsg net/socket.c:2442 [inline]
  __se_sys_sendmmsg net/socket.c:2439 [inline]
  __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2439
  do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 16:
  save_stack mm/kasan/common.c:69 [inline]
  set_track mm/kasan/common.c:77 [inline]
  __kasan_slab_free+0x12a/0x1e0 mm/kasan/common.c:455
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:463
  __cache_free mm/slab.c:3425 [inline]
  kfree+0x115/0x200 mm/slab.c:3756
  rxrpc_destroy_connection+0x1ec/0x240 net/rxrpc/conn_object.c:372
  __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
  rcu_do_batch kernel/rcu/tree.c:2114 [inline]
  rcu_core+0x892/0xf10 kernel/rcu/tree.c:2314
  rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2323
  __do_softirq+0x333/0x7c4 arch/x86/include/asm/paravirt.h:778

The buggy address belongs to the object at ffff8880a859a040
  which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 24 bytes inside of
  1024-byte region [ffff8880a859a040, ffff8880a859a440)
The buggy address belongs to the page:
page:ffffea0002a16680 refcount:1 mapcount:0 mapping:ffff8880aa400c40  
index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea00024cc688 ffffea0002684d88 ffff8880aa400c40
raw: 0000000000000000 ffff8880a859a040 0000000100000007 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8880a8599f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8880a8599f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8880a859a000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                     ^
  ffff8880a859a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880a859a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


^ permalink raw reply

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
From: Al Viro @ 2019-09-10 23:15 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Carlos Antonio Neira Bustos, netdev@vger.kernel.org,
	ebiederm@xmission.com, brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <dadf3657-2648-14ef-35ee-e09efb2cdb3e@fb.com>

On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
> 
> Carlos,
> 
> Discussed with Eric today for what is the best way to get
> the device number for a namespace. The following patch seems
> a reasonable start although Eric would like to see
> how the helper is used in order to decide whether the
> interface looks right.
> 
> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
> Author: Yonghong Song <yhs@fb.com>
> Date:   Mon Sep 9 21:50:51 2019 -0700
> 
>      nsfs: add an interface function ns_get_inum_dev()
> 
>      This patch added an interface function
>      ns_get_inum_dev(). Given a ns_common structure,
>      the function returns the inode and device
>      numbers. The function will be used later
>      by a newly added bpf helper.
> 
>      Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index a0431642c6b5..a603c6fc3f54 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>          return ERR_PTR(-EINVAL);
>   }
> 
> +/* Get the device number for the current task pidns.
> + */
> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
> +{
> +       *inum = ns->inum;
> +       *dev = nsfs_mnt->mnt_sb->s_dev;
> +}

Umm...  Where would it get the device number once we get (hell knows
what for) multiple nsfs instances?  I still don't understand what
would that be about, TBH...  Is it really per-userns?  Or something
else entirely?  Eric, could you give some context?

^ permalink raw reply

* Re: [PATCH] ftgmac100: Disable HW checksum generation on AST2500
From: Vijay Khemka @ 2019-09-10 23:07 UTC (permalink / raw)
  To: Florian Fainelli, David S. Miller, YueHaibing, Andrew Lunn,
	Kate Stewart, Mauro Carvalho Chehab, Luis Chamberlain,
	Thomas Gleixner, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: openbmc @ lists . ozlabs . org, Sai Dasari,
	linux-aspeed@lists.ozlabs.org
In-Reply-To: <0797B1F1-883D-4129-AC16-794957ACCF1B@fb.com>



On 9/10/19, 3:50 PM, "Linux-aspeed on behalf of Vijay Khemka" <linux-aspeed-bounces+vijaykhemka=fb.com@lists.ozlabs.org on behalf of vijaykhemka@fb.com> wrote:

    
    
    On 9/10/19, 3:05 PM, "Florian Fainelli" <f.fainelli@gmail.com> wrote:
    
        On 9/10/19 2:37 PM, Vijay Khemka wrote:
        > HW checksum generation is not working for AST2500, specially with IPV6
        > over NCSI. All TCP packets with IPv6 get dropped. By disabling this
        > it works perfectly fine with IPV6.
        > 
        > Verified with IPV6 enabled and can do ssh.
        
        How about IPv4, do these packets have problem? If not, can you continue
        advertising NETIF_F_IP_CSUM but take out NETIF_F_IPV6_CSUM?
    
    I changed code from (netdev->hw_features &= ~NETIF_F_HW_CSUM) to 
    (netdev->hw_features &= ~NETIF_F_ IPV6_CSUM). And it is not working. 
    Don't know why. IPV4 works without any change but IPv6 needs HW_CSUM
    Disabled.

Now I changed to
netdev->hw_features &= (~NETIF_F_HW_CSUM) | NETIF_F_IP_CSUM;
And it works.
        
        > 
        > Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
        > ---
        >  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++--
        >  1 file changed, 3 insertions(+), 2 deletions(-)
        > 
        > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
        > index 030fed65393e..591c9725002b 100644
        > --- a/drivers/net/ethernet/faraday/ftgmac100.c
        > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
        > @@ -1839,8 +1839,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
        >  	if (priv->use_ncsi)
        >  		netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
        >  
        > -	/* AST2400  doesn't have working HW checksum generation */
        > -	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
        > +	/* AST2400  and AST2500 doesn't have working HW checksum generation */
        > +	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac") ||
        > +		   of_device_is_compatible(np, "aspeed,ast2500-mac")))
        >  		netdev->hw_features &= ~NETIF_F_HW_CSUM;
        >  	if (np && of_get_property(np, "no-hw-checksum", NULL))
        >  		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
        > 
        
        
        -- 
        Florian
        
    
    


^ permalink raw reply

* [PATCH] wimax: i2400: fix memory leak
From: Navid Emamdoost @ 2019-09-10 23:01 UTC (permalink / raw)
  Cc: emamd001, smccaman, kjlu, Navid Emamdoost, Inaky Perez-Gonzalez,
	linux-wimax, David S. Miller, netdev, linux-kernel

In i2400m_op_rfkill_sw_toggle cmd buffer should be released along with
skb response.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
---
 drivers/net/wimax/i2400m/op-rfkill.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wimax/i2400m/op-rfkill.c b/drivers/net/wimax/i2400m/op-rfkill.c
index 6642bcb27761..8efb493ceec2 100644
--- a/drivers/net/wimax/i2400m/op-rfkill.c
+++ b/drivers/net/wimax/i2400m/op-rfkill.c
@@ -127,6 +127,7 @@ int i2400m_op_rfkill_sw_toggle(struct wimax_dev *wimax_dev,
 			"%d\n", result);
 	result = 0;
 error_cmd:
+	kfree(cmd);
 	kfree_skb(ack_skb);
 error_msg_to_dev:
 error_alloc:
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH bpf-next v10 4/4] tools/testing/selftests/bpf: Add self-tests for helper bpf_get_pidns_info.
From: Yonghong Song @ 2019-09-10 22:55 UTC (permalink / raw)
  To: Carlos Neira, netdev@vger.kernel.org
  Cc: ebiederm@xmission.com, brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <20190906150952.23066-5-cneirabustos@gmail.com>



On 9/6/19 4:09 PM, Carlos Neira wrote:
> Added 2 selftest:
> 
> bpf_get_current_pidns_info helper is called in an interrupt
> context and also in a non interrupt context.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   tools/testing/selftests/bpf/Makefile               |   2 +-
>   tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
>   .../testing/selftests/bpf/progs/test_pidns_kern.c  |  52 ++++++++
>   .../selftests/bpf/progs/test_pidns_nmi_kern.c      |  52 ++++++++
>   tools/testing/selftests/bpf/test_pidns.c           | 146 +++++++++++++++++++++
>   tools/testing/selftests/bpf/test_pidns_nmi.c       | 139 ++++++++++++++++++++
>   6 files changed, 393 insertions(+), 1 deletion(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
>   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
>   create mode 100644 tools/testing/selftests/bpf/test_pidns.c
>   create mode 100644 tools/testing/selftests/bpf/test_pidns_nmi.c
> 
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 1faad0c3c3c9..8507c89141f5 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
>   	test_cgroup_storage test_select_reuseport test_section_names \
>   	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
>   	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> -	test_sockopt_multi test_sockopt_inherit test_tcp_rtt
> +	test_sockopt_multi test_sockopt_inherit test_tcp_rtt test_pidns test_pidns_nmi
>   
>   BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
>   TEST_GEN_FILES = $(BPF_OBJ_FILES)
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 6c4930bc6e2e..30a70ebe583a 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -313,6 +313,9 @@ static unsigned int (*bpf_set_hash)(void *ctx, __u32 hash) =
>   static int (*bpf_skb_adjust_room)(void *ctx, __s32 len_diff, __u32 mode,
>   				  unsigned long long flags) =
>   	(void *) BPF_FUNC_skb_adjust_room;
> +static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
> +					 unsigned int buf_size) =
> +	(void *) BPF_FUNC_get_current_pidns_info;
>   
>   /* Scan the ARCH passed in from ARCH env variable (see Makefile) */
>   #if defined(__TARGET_ARCH_x86)
> diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> new file mode 100644
> index 000000000000..a4c0bde1608b
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <linux/bpf.h>
> +#include <errno.h>
> +#include "bpf_helpers.h"
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, struct bpf_pidns_info);
> +} nsidmap SEC(".maps");
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, __u32);
> +} pidmap SEC(".maps");
> +
> +SEC("tracepoint/syscalls/sys_enter_nanosleep")
> +int trace(void *ctx)
> +{
> +	struct bpf_pidns_info nsinfo;
> +	__u32 key = 0, *expected_pid;
> +	struct bpf_pidns_info  *val;
> +
> +	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
> +		return -EINVAL;
> +
> +	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
> +	__u64 tgid = bpf_get_current_pid_tgid();
> +
> +	if (!expected_pid || *expected_pid != nsinfo.pid ||
> +			nsinfo.tgid != (__u32)tgid)
> +		return 0;

The logic in the above is not right.
bpf_get_current_pidns_info() retrieved
   dev, nsid, tgid and pid.

You should create a map, populate dev/nsid you
got from the user space. And then update maybe
another map with tgid/pid and passed back to
user space for verification.

> +
> +	val = bpf_map_lookup_elem(&nsidmap, &key);
> +	if (val)
> +		*val = nsinfo;
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;
> diff --git a/tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
> new file mode 100644
> index 000000000000..7f02e4e29021
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <linux/bpf.h>
> +#include <errno.h>
> +#include "bpf_helpers.h"
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, struct bpf_pidns_info);
> +} nsidmap SEC(".maps");
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, __u32);
> +} pidmap SEC(".maps");
> +
> +SEC("tracepoint/net/netif_receive_skb")
> +int trace(void *ctx)
> +{
> +	struct bpf_pidns_info nsinfo;
> +	__u32 key = 0, *expected_pid;
> +	struct bpf_pidns_info  *val;
> +
> +	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
> +		return -EINVAL;
> +
> +	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
> +	__u64 tgid = bpf_get_current_pid_tgid();
> +
> +	if (!expected_pid || *expected_pid != nsinfo.pid ||
> +			nsinfo.tgid != (__u32)tgid)
> +		return 0;
> +
> +	val = bpf_map_lookup_elem(&nsidmap, &key);
> +	if (val)
> +		*val = nsinfo;
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;

With the new nsfs interface function(), the file test_pidns_nmi_kern.c
is not needed any more.

Please also remove the in_interrupt() checking in the kernel helper 
implementation.

> diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c

Please restructure and put this file under 
tools/testing/selftests/bpf/prog_tests directory.

> new file mode 100644
> index 000000000000..0c579305da53
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_pidns.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "cgroup_helpers.h"
> +#include "bpf_rlimit.h"
> +
> +#define CHECK(condition, tag, format...) ({		\
> +	int __ret = !!(condition);			\
> +	if (__ret) {					\
> +		printf("%s:FAIL:%s ", __func__, tag);	\
> +		printf(format);				\
> +	} else {					\
> +		printf("%s:PASS:%s\n", __func__, tag);	\
> +	}						\
> +	__ret;						\
> +})
> +
> +static int bpf_find_map(const char *test, struct bpf_object *obj,
> +			const char *name)
> +{
> +	struct bpf_map *map;
> +
> +	map = bpf_object__find_map_by_name(obj, name);
> +	if (!map)
> +		return -1;
> +	return bpf_map__fd(map);
> +}
> +
> +
> +int main(int argc, char **argv)
> +{
> +	const char *probe_name = "syscalls/sys_enter_nanosleep";
> +	const char *file = "test_pidns_kern.o";
> +	int err, bytes, efd, prog_fd, pmu_fd;
> +	struct bpf_pidns_info knsid = {};
> +	int pidmap_fd, nsidmap_fd;
> +	struct perf_event_attr attr = {};
> +	struct bpf_object *obj;
> +	__u32 key = 0, pid;
> +	int exit_code = 1;
> +	struct stat st;
> +	char buf[256];
> +
> +	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
> +	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
> +		goto cleanup_cgroup_env;
> +
> +	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
> +	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  nsidmap_fd, errno))
> +		goto close_prog;
> +
> +	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
> +	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  pidmap_fd, errno))
> +		goto close_prog;
> +
> +	pid = getpid();
> +	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +
> +	snprintf(buf, sizeof(buf),
> +		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
> +	efd = open(buf, O_RDONLY, 0);
> +	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
> +		goto close_prog;
> +	bytes = read(efd, buf, sizeof(buf));
> +	close(efd);
> +	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
> +		  "bytes %d errno %d\n", bytes, errno))
> +		goto close_prog;
> +
> +	attr.config = strtol(buf, NULL, 0);
> +	attr.type = PERF_TYPE_TRACEPOINT;
> +	attr.sample_type = PERF_SAMPLE_RAW;
> +	attr.sample_period = 1;
> +	attr.wakeup_events = 1;
> +
> +	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
> +	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> +		  errno))
> +		goto close_prog;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	/* trigger some syscalls */
> +	sleep(1);
> +
> +	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
> +	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
> +		goto close_pmu;
> +
> +	if (stat("/proc/self/ns/pid", &st))
> +		goto close_pmu;
> +
> +	if (CHECK(knsid.nsid != (__u32) st.st_ino, "namespace id",
> +		  "bpf %u user %u\n", knsid.nsid, (__u32) st.st_ino))
> +		goto close_pmu;
> +
> +	if (CHECK(major(knsid.dev) != major(st.st_rdev), "dev major",
> +		  "bpf %u user %u\n", major(knsid.dev), major(st.st_rdev)))
> +		goto close_pmu;
> +
> +	if (CHECK(minor(knsid.dev) != minor(st.st_rdev), "dev minor",
> +		  "bpf %u user %u\n", minor(knsid.dev), minor(st.st_rdev)))
> +		goto close_pmu;
> +
> +	exit_code = 0;
> +	printf("%s:PASS\n", argv[0]);
> +
> +close_pmu:
> +	close(pmu_fd);
> +close_prog:
> +	bpf_object__close(obj);
> +cleanup_cgroup_env:
> +	return exit_code;
> +}
> diff --git a/tools/testing/selftests/bpf/test_pidns_nmi.c b/tools/testing/selftests/bpf/test_pidns_nmi.c
> new file mode 100644
> index 000000000000..bb8075bbe7d8
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_pidns_nmi.c
> @@ -0,0 +1,139 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "cgroup_helpers.h"
> +#include "bpf_rlimit.h"
> +
> +#define CHECK(condition, tag, format...) ({		\
> +	int __ret = !!(condition);			\
> +	if (__ret) {					\
> +		printf("%s:FAIL:%s ", __func__, tag);	\
> +		printf(format);				\
> +	} else {					\
> +		printf("%s:PASS:%s\n", __func__, tag);	\
> +	}						\
> +	__ret;						\
> +})
> +
> +static int bpf_find_map(const char *test, struct bpf_object *obj,
> +			const char *name)
> +{
> +	struct bpf_map *map;
> +
> +	map = bpf_object__find_map_by_name(obj, name);
> +	if (!map)
> +		return -1;
> +	return bpf_map__fd(map);
> +}
> +
> +
> +int main(int argc, char **argv)
> +{
> +	const char *probe_name = "net/netif_receive_skb";
> +	const char *file = "test_pidns_kern.o";
> +	int err, bytes, efd, prog_fd, pmu_fd;
> +	struct bpf_pidns_info knsid = {};
> +	int pidmap_fd, nsidmap_fd;
> +	struct perf_event_attr attr = {};
> +	struct bpf_object *obj;
> +	__u32 key = 0, pid;
> +	int exit_code = 1;
> +	struct stat st;
> +	char buf[256];
> +
> +	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
> +	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
> +		goto cleanup_cgroup_env;
> +
> +	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
> +	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  nsidmap_fd, errno))
> +		goto close_prog;
> +
> +	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
> +	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  pidmap_fd, errno))
> +		goto close_prog;
> +
> +	pid = getpid();
> +	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +
> +	snprintf(buf, sizeof(buf),
> +		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
> +	efd = open(buf, O_RDONLY, 0);
> +	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
> +		goto close_prog;
> +	bytes = read(efd, buf, sizeof(buf));
> +	close(efd);
> +	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
> +		  "bytes %d errno %d\n", bytes, errno))
> +		goto close_prog;
> +
> +	attr.config = strtol(buf, NULL, 0);
> +	attr.type = PERF_TYPE_TRACEPOINT;
> +	attr.sample_type = PERF_SAMPLE_RAW;
> +	attr.sample_period = 1;
> +	attr.wakeup_events = 1;
> +
> +	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
> +	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> +		  errno))
> +		goto close_prog;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	/* trigger some syscalls */
> +	sleep(1);
> +
> +	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
> +	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
> +		goto close_pmu;
> +
> +	if (stat("/proc/self/ns/pid", &st))
> +		goto close_pmu;
> +
> +	if (CHECK(knsid.nsid != (__u32) st.st_ino, "namespace_id",
> +				"Called in interrupt context bpf %u user %u\n",
> +				knsid.nsid, (__u32) st.st_ino))
> +		goto close_pmu;
> +
> +	exit_code = 0;
> +	printf("%s:PASS\n", argv[0]);
> +
> +close_pmu:
> +	close(pmu_fd);
> +close_prog:
> +	bpf_object__close(obj);
> +cleanup_cgroup_env:
> +	return exit_code;
> +}

the file test_pidns_nmi.c is not needed any more.

^ permalink raw reply

* Re: ixgbe: driver drops packets routed from an IPSec interface with a "bad sa_idx" error
From: Michael Marley @ 2019-09-10 22:53 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, Jeff Kirsher, steffen.klassert
In-Reply-To: <90dd9f8c-57fa-14c7-5d09-207b84ec3292@pensando.io>

On 9/10/19 5:43 PM, Shannon Nelson wrote:

> On 9/9/19 11:45 AM, Michael Marley wrote:
>> On 2019-09-09 14:21, Shannon Nelson wrote:
>>> On 9/6/19 11:13 AM, Michael Marley wrote:
>>>> (This is also reported at
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=204551, but it was
>>>> recommended that I send it to this list as well.)
>>>>
>>>> I have a put together a router that routes traffic from several
>>>> local subnets from a switch attached to an i82599ES card through an
>>>> IPSec VPN interface set up with StrongSwan. (The VPN is running on
>>>> an unrelated second interface with a different driver.)  Traffic
>>>> from the local interfaces to the VPN works as it should and
>>>> eventually makes it through the VPN server and out to the
>>>> Internet.  The return traffic makes it back to the router and
>>>> tcpdump shows it leaving by the i82599, but the traffic never
>>>> actually makes it onto the wire and I instead get one of
>>>>
>>>> enp1s0: ixgbe_ipsec_tx: bad sa_idx=64512 handle=0
>>>>
>>>> for each packet that should be transmitted.  (The sa_idx and handle
>>>> values are always the same.)
>>>>
>>>> I realized this was probably related to ixgbe's IPSec offloading
>>>> feature, so I tried with the motherboard's integrated e1000e device
>>>> and didn't have the problem.  I tried using ethtool to disable all
>>>> the IPSec-related offloads (tx-esp-segmentation, esp-hw-offload,
>>>> esp-tx-csum-hw-offload), but the problem persisted.  I then tried
>>>> recompiling the kernel with CONFIG_IXGBE_IPSEC=n and that worked
>>>> around the problem.
>>>>
>>>> I was also able to find another instance of the same problem
>>>> reported in Debian at
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=930443. That
>>>> person seems to be having exactly the same issue as me, down to the
>>>> sa_idx and handle values being the same.
>>>>
>>>> If there are any more details I can provide to make this easier to
>>>> track down, please let me know.
>>>>
>>>> Thanks,
>>>>
>>>> Michael Marley
>>>
>>> Hi Michael,
>>>
>>> Thanks for pointing this out.  The issue this error message is
>>> complaining about is that the handle given to the driver is a bad
>>> value.  The handle is what helps the driver find the right encryption
>>> information, and in this case is an index into an array, one array for
>>> Rx and one for Tx, each of which have up to 1024 entries.  In order to
>>> encode them into a single value, 1024 is added to the Tx values to
>>> make the handle, and 1024 is subtracted to use the handle later.  Note
>>> that the bad sa_idx is 64512, which happens to also be -1024; if the
>>> Tx handle given to ixgbe for xmit is 0, we subtract 1024 from that and
>>> get this bad sa_idx value.
>>>
>>> That handle is supposed to be an opaque value only used by the
>>> driver.  It looks to me like either (a) the driver is not setting up
>>> the handle correctly when the SA is first set up, or (b) something in
>>> the upper levels of the ipsec code is clearing the handle value. We
>>> would need to know more about all the bits in your SA set up to have a
>>> better idea what parts of the ipsec code are being exercised when this
>>> problem happens.
>>>
>>> I currently don't have access to a good ixgbe setup on which to
>>> test/debug this, and I haven't been paying much attention lately to
>>> what's happening in the upper ipsec layers, so my help will be
>>> somewhat limited.  I'm hoping the the Intel folks can add a little
>>> help, so I've copied Jeff Kirsher on this (they'll probably point back
>>> to me since I wrote this chunk :-) ).  I've also copied Stephen
>>> Klassert for his ipsec thoughts.
>>>
>>> In the meantime, can you give more details on the exact ipsec rules
>>> that are used here, and are there any error messages coming from ixgbe
>>> regarding the ipsec rule setup that might help us identify what's
>>> happening?
>>>
>>> Thanks,
>>> sln
>>
>> Hi Shannon,
>>
>> Thanks for your response!  I apologize, I am a bit of a newbie to
>> IPSec myself, so I'm not 100% sure what is the best way to provide
>> the information you need, but here is the (slightly-redacted) output
>> of swanctl --list-sas first from the server and then from the client:
>>
>> <servername>: #24, ESTABLISHED, IKEv2, 3cb75c180ee5dc68_i
>> cc7dae551b603bb7_r*
>>   local  '<serverip>' @ <serverip>[4500]
>>   remote '<clientip>' @ <clientip>[4500]
>>   AES_GCM_16-256/PRF_HMAC_SHA2_512/ECP_384
>>   established 174180s ago
>>   <servername>: #110, reqid 12, INSTALLED, TUNNEL-in-UDP,
>> ESP:AES_GCM_16-256/ECP_384
>>     installed 469s ago
>>     in  c51a0f11 (-|0x00000064), 1548864 bytes, 19575 packets, 6s ago
>>     out c3bd9741 (-|0x00000064), 23618807 bytes, 22865 packets,    
>> 7s ago
>>     local  0.0.0.0/0 ::/0
>>     remote 0.0.0.0/0 ::/0
>>
>> <clientname>: #1, ESTABLISHED, IKEv2, 3cb75c180ee5dc68_i*
>> cc7dae551b603bb7_r
>>   local  '<clientip>' @ <clientip>[4500]
>>   remote '<serverip>' @ <serverip>[4500]
>>   AES_GCM_16-256/PRF_HMAC_SHA2_512/ECP_384
>>   established 174013s ago
>>   <clientname>: #54, reqid 1, INSTALLED, TUNNEL-in-UDP,
>> ESP:AES_GCM_16-256/ECP_384
>>     installed 303s ago, rekeying in 2979s, expires in 3657s
>>     in  c3bd9741 (-|0x00000064), 23178523 bytes, 20725 packets,    
>> 0s ago
>>     out c51a0f11 (-|0x00000064), 1429124 bytes, 17719 packets, 0s ago
>>     local  0.0.0.0/0 ::/0
>>     remote 0.0.0.0/0 ::/0
>>
>> It might also be worth mentioning that I am using an xfrm interface
>> to do "regular" routing rather than the policy-based routing that
>> StrongSwan/IPSec normally uses. If there is anything else that would
>> help more, I would be happy to provide it.
>>
>> Just to be clear though, I'm not trying to run IPSec on the ixgbe
>> interface at all.  The ixgbe adapter is being used to connect the
>> router to the switch on the LAN side of the network.  IPSec is
>> running on the WAN interface without any hardware acceleration
>> (besides AES-NI).  The problem occurs when a computer on the LAN
>> tries to access the WAN.  The outgoing packets work as expected and
>> the incoming packets are routed back out through the ixgbe device
>> toward the LAN client, but the driver drops the packets with the
>> sa_idx error.
>>
>> I hope this helps.
>>
>> Thanks,
>>
>> Michael
>
> I'm not familiar with StrongSwan and its configurations, but I'm
> guessing that if you didn't expressly enable it, perhaps StrongSwan
> enabled the ipsec offload capability.  I would suggest turning it off
> to at least get you passed the immediate issue.  If there isn't an
> obvious configuration knob in StrongSwan, perhaps you can at least use
> ethtool to disable the offload, which should be off be default anyway.
>
> You can check it with "ethtool -k ethX | grep esp-hw-offload" and see
> if it is set.  You can disable it with "ethtool -K ethX esp-hw-offload
> off"
>
> Meanwhile, can you please send the output of the following commands:
> uname -a
> ip xfrm s
> ip xfrm p
> dmesg | grep ixgbe
>
> And any other /var/log/syslog or /var/log/messages that look
> suspicious and might give any more insight to what's happening.
>
> Thanks,
> sln
>
StrongSwan has hardware offload disabled by default, and I didn't enable
it explicitly.  I also already tried turning off all those switches with
ethtool and it has no effect.  This doesn't surprise me though, because
as I said, I don't actually have the IPSec connection running over the
ixgbe device.  The IPSec connection runs over another network adapter
that doesn't support IPSec offload at all.  The problem comes when
traffic received over the IPSec interface is then routed back out
(unencrypted) through the ixgbe device into the local network.

Here is the rest of the data for which you asked:

michael@soapstone:~$ uname -a
Linux soapstone 5.3.0-10-generic #11-Ubuntu SMP Mon Sep 9 15:12:17 UTC
2019 x86_64 x86_64 x86_64 GNU/Linux
michael@soapstone:~$ sudo ip xfrm s
src <srcip> dst <dstip>
        proto esp spi 0xcf6f90d3 reqid 1 mode tunnel
        replay-window 0 flag af-unspec
        aead rfc4106(gcm(aes))
0x254c6298b27ad65f61387c39e060698db777a335081d145ca6706d65b6be95770d2622b4
128
        encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
        anti-replay context: seq 0x0, oseq 0xaaac, bitmap 0x00000000
        if_id 0x64
src <dstip> dst <srcip>
        proto esp spi 0xc6e02140 reqid 1 mode tunnel
        replay-window 32 flag af-unspec
        aead rfc4106(gcm(aes))
0x05473bd76e1b7268b54825b019d19c13a360193bc9aa20137204ea566409356da47fc7d7
128
        encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
        anti-replay context: seq 0xab11, oseq 0x0, bitmap 0xffffffff
        if_id 0x64
michael@soapstone:~$ sudo ip xfrm p
src ::/0 dst ::/0
        dir out priority 399999
        tmpl src <srcip> dst <dstip>
                proto esp spi 0xcf6f90d3 reqid 1 mode tunnel
        if_id 0x64
src 0.0.0.0/0 dst 0.0.0.0/0
        dir out priority 399999
        tmpl src <srcip> dst <dstip>
                proto esp spi 0xcf6f90d3 reqid 1 mode tunnel
        if_id 0x64
src ::/0 dst ::/0
        dir fwd priority 399999
        tmpl src <dstip> dst <srcip>
                proto esp reqid 1 mode tunnel
        if_id 0x64
src ::/0 dst ::/0
        dir in priority 399999
        tmpl src <dstip> dst <srcip>
                proto esp reqid 1 mode tunnel
        if_id 0x64
src 0.0.0.0/0 dst 0.0.0.0/0
        dir fwd priority 399999
        tmpl src <dstip> dst <srcip>
                proto esp reqid 1 mode tunnel
        if_id 0x64
src 0.0.0.0/0 dst 0.0.0.0/0
        dir in priority 399999
        tmpl src <dstip> dst <srcip>
                proto esp reqid 1 mode tunnel
        if_id 0x64
src 0.0.0.0/0 dst 0.0.0.0/0
        socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
        socket out priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
        socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
        socket out priority 0
src ::/0 dst ::/0
        socket in priority 0
src ::/0 dst ::/0
        socket out priority 0
src ::/0 dst ::/0
        socket in priority 0
src ::/0 dst ::/0
        socket out priority 0
michael@soapstone:~$ dmesg | grep -i ixgbe
[    0.780400] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver -
version 5.1.0-k
[    0.781606] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[    0.954093] ixgbe 0000:01:00.0: Multiqueue Enabled: Rx Queue count =
8, Tx Queue count = 8 XDP Queue count = 0
[    0.955081] ixgbe 0000:01:00.0: 32.000 Gb/s available PCIe bandwidth
(5 GT/s x8 link)
[    0.955860] ixgbe 0000:01:00.0: MAC: 2, PHY: 14, SFP+: 3, PBA No: Unknown
[    0.956519] ixgbe 0000:01:00.0: 00:1b:21:c0:00:1e
[    0.958079] ixgbe 0000:01:00.0: Intel(R) 10 Gigabit Network Connection
[    0.958884] libphy: ixgbe-mdio: probed
[    0.960220] ixgbe 0000:01:00.0 enp1s0: renamed from eth0
[    2.788208] ixgbe 0000:01:00.0: registered PHC device on enp1s0
[    2.966290] ixgbe 0000:01:00.0 enp1s0: detected SFP+: 3
[    3.110132] ixgbe 0000:01:00.0 enp1s0: NIC Link is Up 10 Gbps, Flow
Control: RX/TX

Thanks,

Michael



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox