netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver
@ 2024-11-07 13:30 Jijie Shao
  2024-11-07 13:30 ` [PATCH RESEND net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type Jijie Shao
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Jijie Shao @ 2024-11-07 13:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, salil.mehta, liuyonglong, wangpeiyang1, shaojijie,
	chenhao418, netdev, linux-kernel

There's a series of bugfix that's been accepted:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=d80a3091308491455b6501b1c4b68698c4a7cd24

However, The series is making the driver poke into IOMMU internals instead of
implementing appropriate IOMMU workarounds. After discussion, the series was reverted:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=249cfa318fb1b77eb726c2ff4f74c9685f04e568

But only two patches are related to the IOMMU.
Other patches involve only the modification of the driver.
This series resends other patches.

Hao Lan (4):
  net: hns3: fixed reset failure issues caused by the incorrect reset
    type
  net: hns3: fix missing features due to dev->features configuration too
    early
  net: hns3: Resolved the issue that the debugfs query result is
    inconsistent.
  net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds
    issue

Jian Shen (2):
  net: hns3: don't auto enable misc vector
  net: hns3: initialize reset_timer before hclgevf_misc_irq_init()

Jie Wang (1):
  net: hns3: fix kernel crash when 1588 is sent on HIP08 devices

 .../ethernet/hisilicon/hns3/hns3_debugfs.c    |  4 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  1 -
 .../hisilicon/hns3/hns3pf/hclge_main.c        | 45 +++++++++++++++----
 .../hisilicon/hns3/hns3pf/hclge_ptp.c         |  3 ++
 .../hisilicon/hns3/hns3pf/hclge_regs.c        |  9 ++--
 .../hisilicon/hns3/hns3vf/hclgevf_main.c      | 40 ++++++++++++++---
 .../hisilicon/hns3/hns3vf/hclgevf_regs.c      |  9 ++--
 7 files changed, 85 insertions(+), 26 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH RESEND net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type
  2024-11-07 13:30 [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
@ 2024-11-07 13:30 ` Jijie Shao
  2024-11-07 13:30 ` [PATCH RESEND net 2/7] net: hns3: fix missing features due to dev->features configuration too early Jijie Shao
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Jijie Shao @ 2024-11-07 13:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, salil.mehta, liuyonglong, wangpeiyang1, shaojijie,
	chenhao418, netdev, linux-kernel

From: Hao Lan <lanhao@huawei.com>

When a reset type that is not supported by the driver is input, a reset
pending flag bit of the HNAE3_NONE_RESET type is generated in
reset_pending. The driver does not have a mechanism to clear this type
of error. As a result, the driver considers that the reset is not
complete. This patch provides a mechanism to clear the
HNAE3_NONE_RESET flag and the parameter of
hnae3_ae_ops.set_default_reset_request is verified.

The error message:
hns3 0000:39:01.0: cmd failed -16
hns3 0000:39:01.0: hclge device re-init failed, VF is disabled!
hns3 0000:39:01.0: failed to reset VF stack
hns3 0000:39:01.0: failed to reset VF(4)
hns3 0000:39:01.0: prepare reset(2) wait done
hns3 0000:39:01.0 eth4: already uninitialized

Use the crash tool to view struct hclgevf_dev:
struct hclgevf_dev {
...
	default_reset_request = 0x20,
	reset_level = HNAE3_NONE_RESET,
	reset_pending = 0x100,
	reset_type = HNAE3_NONE_RESET,
...
};

Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 .../hisilicon/hns3/hns3pf/hclge_main.c        | 33 ++++++++++++++--
 .../hisilicon/hns3/hns3vf/hclgevf_main.c      | 38 ++++++++++++++++---
 2 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index bd86efd92a5a..35c618c794be 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3584,6 +3584,17 @@ static int hclge_set_vf_link_state(struct hnae3_handle *handle, int vf,
 	return ret;
 }
 
+static void hclge_set_reset_pending(struct hclge_dev *hdev,
+				    enum hnae3_reset_type reset_type)
+{
+	/* When an incorrect reset type is executed, the get_reset_level
+	 * function generates the HNAE3_NONE_RESET flag. As a result, this
+	 * type do not need to pending.
+	 */
+	if (reset_type != HNAE3_NONE_RESET)
+		set_bit(reset_type, &hdev->reset_pending);
+}
+
 static u32 hclge_check_event_cause(struct hclge_dev *hdev, u32 *clearval)
 {
 	u32 cmdq_src_reg, msix_src_reg, hw_err_src_reg;
@@ -3604,7 +3615,7 @@ static u32 hclge_check_event_cause(struct hclge_dev *hdev, u32 *clearval)
 	 */
 	if (BIT(HCLGE_VECTOR0_IMPRESET_INT_B) & msix_src_reg) {
 		dev_info(&hdev->pdev->dev, "IMP reset interrupt\n");
-		set_bit(HNAE3_IMP_RESET, &hdev->reset_pending);
+		hclge_set_reset_pending(hdev, HNAE3_IMP_RESET);
 		set_bit(HCLGE_COMM_STATE_CMD_DISABLE, &hdev->hw.hw.comm_state);
 		*clearval = BIT(HCLGE_VECTOR0_IMPRESET_INT_B);
 		hdev->rst_stats.imp_rst_cnt++;
@@ -3614,7 +3625,7 @@ static u32 hclge_check_event_cause(struct hclge_dev *hdev, u32 *clearval)
 	if (BIT(HCLGE_VECTOR0_GLOBALRESET_INT_B) & msix_src_reg) {
 		dev_info(&hdev->pdev->dev, "global reset interrupt\n");
 		set_bit(HCLGE_COMM_STATE_CMD_DISABLE, &hdev->hw.hw.comm_state);
-		set_bit(HNAE3_GLOBAL_RESET, &hdev->reset_pending);
+		hclge_set_reset_pending(hdev, HNAE3_GLOBAL_RESET);
 		*clearval = BIT(HCLGE_VECTOR0_GLOBALRESET_INT_B);
 		hdev->rst_stats.global_rst_cnt++;
 		return HCLGE_VECTOR0_EVENT_RST;
@@ -4062,7 +4073,7 @@ static void hclge_do_reset(struct hclge_dev *hdev)
 	case HNAE3_FUNC_RESET:
 		dev_info(&pdev->dev, "PF reset requested\n");
 		/* schedule again to check later */
-		set_bit(HNAE3_FUNC_RESET, &hdev->reset_pending);
+		hclge_set_reset_pending(hdev, HNAE3_FUNC_RESET);
 		hclge_reset_task_schedule(hdev);
 		break;
 	default:
@@ -4096,6 +4107,8 @@ static enum hnae3_reset_type hclge_get_reset_level(struct hnae3_ae_dev *ae_dev,
 		clear_bit(HNAE3_FLR_RESET, addr);
 	}
 
+	clear_bit(HNAE3_NONE_RESET, addr);
+
 	if (hdev->reset_type != HNAE3_NONE_RESET &&
 	    rst_level < hdev->reset_type)
 		return HNAE3_NONE_RESET;
@@ -4237,7 +4250,7 @@ static bool hclge_reset_err_handle(struct hclge_dev *hdev)
 		return false;
 	} else if (hdev->rst_stats.reset_fail_cnt < MAX_RESET_FAIL_CNT) {
 		hdev->rst_stats.reset_fail_cnt++;
-		set_bit(hdev->reset_type, &hdev->reset_pending);
+		hclge_set_reset_pending(hdev, hdev->reset_type);
 		dev_info(&hdev->pdev->dev,
 			 "re-schedule reset task(%u)\n",
 			 hdev->rst_stats.reset_fail_cnt);
@@ -4480,8 +4493,20 @@ static void hclge_reset_event(struct pci_dev *pdev, struct hnae3_handle *handle)
 static void hclge_set_def_reset_request(struct hnae3_ae_dev *ae_dev,
 					enum hnae3_reset_type rst_type)
 {
+#define HCLGE_SUPPORT_RESET_TYPE \
+	(BIT(HNAE3_FLR_RESET) | BIT(HNAE3_FUNC_RESET) | \
+	BIT(HNAE3_GLOBAL_RESET) | BIT(HNAE3_IMP_RESET))
+
 	struct hclge_dev *hdev = ae_dev->priv;
 
+	if (!(BIT(rst_type) & HCLGE_SUPPORT_RESET_TYPE)) {
+		/* To prevent reset triggered by hclge_reset_event */
+		set_bit(HNAE3_NONE_RESET, &hdev->default_reset_request);
+		dev_warn(&hdev->pdev->dev, "unsupported reset type %d\n",
+			 rst_type);
+		return;
+	}
+
 	set_bit(rst_type, &hdev->default_reset_request);
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 094a7c7b5592..ab54e6155e93 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1395,6 +1395,17 @@ static int hclgevf_notify_roce_client(struct hclgevf_dev *hdev,
 	return ret;
 }
 
+static void hclgevf_set_reset_pending(struct hclgevf_dev *hdev,
+				      enum hnae3_reset_type reset_type)
+{
+	/* When an incorrect reset type is executed, the get_reset_level
+	 * function generates the HNAE3_NONE_RESET flag. As a result, this
+	 * type do not need to pending.
+	 */
+	if (reset_type != HNAE3_NONE_RESET)
+		set_bit(reset_type, &hdev->reset_pending);
+}
+
 static int hclgevf_reset_wait(struct hclgevf_dev *hdev)
 {
 #define HCLGEVF_RESET_WAIT_US	20000
@@ -1544,7 +1555,7 @@ static void hclgevf_reset_err_handle(struct hclgevf_dev *hdev)
 		hdev->rst_stats.rst_fail_cnt);
 
 	if (hdev->rst_stats.rst_fail_cnt < HCLGEVF_RESET_MAX_FAIL_CNT)
-		set_bit(hdev->reset_type, &hdev->reset_pending);
+		hclgevf_set_reset_pending(hdev, hdev->reset_type);
 
 	if (hclgevf_is_reset_pending(hdev)) {
 		set_bit(HCLGEVF_RESET_PENDING, &hdev->reset_state);
@@ -1664,6 +1675,8 @@ static enum hnae3_reset_type hclgevf_get_reset_level(unsigned long *addr)
 		clear_bit(HNAE3_FLR_RESET, addr);
 	}
 
+	clear_bit(HNAE3_NONE_RESET, addr);
+
 	return rst_level;
 }
 
@@ -1673,14 +1686,15 @@ static void hclgevf_reset_event(struct pci_dev *pdev,
 	struct hnae3_ae_dev *ae_dev = pci_get_drvdata(pdev);
 	struct hclgevf_dev *hdev = ae_dev->priv;
 
-	dev_info(&hdev->pdev->dev, "received reset request from VF enet\n");
-
 	if (hdev->default_reset_request)
 		hdev->reset_level =
 			hclgevf_get_reset_level(&hdev->default_reset_request);
 	else
 		hdev->reset_level = HNAE3_VF_FUNC_RESET;
 
+	dev_info(&hdev->pdev->dev, "received reset request from VF enet, reset level is %d\n",
+		 hdev->reset_level);
+
 	/* reset of this VF requested */
 	set_bit(HCLGEVF_RESET_REQUESTED, &hdev->reset_state);
 	hclgevf_reset_task_schedule(hdev);
@@ -1691,8 +1705,20 @@ static void hclgevf_reset_event(struct pci_dev *pdev,
 static void hclgevf_set_def_reset_request(struct hnae3_ae_dev *ae_dev,
 					  enum hnae3_reset_type rst_type)
 {
+#define HCLGEVF_SUPPORT_RESET_TYPE \
+	(BIT(HNAE3_VF_RESET) | BIT(HNAE3_VF_FUNC_RESET) | \
+	BIT(HNAE3_VF_PF_FUNC_RESET) | BIT(HNAE3_VF_FULL_RESET) | \
+	BIT(HNAE3_FLR_RESET) | BIT(HNAE3_VF_EXP_RESET))
+
 	struct hclgevf_dev *hdev = ae_dev->priv;
 
+	if (!(BIT(rst_type) & HCLGEVF_SUPPORT_RESET_TYPE)) {
+		/* To prevent reset triggered by hclge_reset_event */
+		set_bit(HNAE3_NONE_RESET, &hdev->default_reset_request);
+		dev_info(&hdev->pdev->dev, "unsupported reset type %d\n",
+			 rst_type);
+		return;
+	}
 	set_bit(rst_type, &hdev->default_reset_request);
 }
 
@@ -1849,14 +1875,14 @@ static void hclgevf_reset_service_task(struct hclgevf_dev *hdev)
 		 */
 		if (hdev->reset_attempts > HCLGEVF_MAX_RESET_ATTEMPTS_CNT) {
 			/* prepare for full reset of stack + pcie interface */
-			set_bit(HNAE3_VF_FULL_RESET, &hdev->reset_pending);
+			hclgevf_set_reset_pending(hdev, HNAE3_VF_FULL_RESET);
 
 			/* "defer" schedule the reset task again */
 			set_bit(HCLGEVF_RESET_PENDING, &hdev->reset_state);
 		} else {
 			hdev->reset_attempts++;
 
-			set_bit(hdev->reset_level, &hdev->reset_pending);
+			hclgevf_set_reset_pending(hdev, hdev->reset_level);
 			set_bit(HCLGEVF_RESET_PENDING, &hdev->reset_state);
 		}
 		hclgevf_reset_task_schedule(hdev);
@@ -1979,7 +2005,7 @@ static enum hclgevf_evt_cause hclgevf_check_evt_cause(struct hclgevf_dev *hdev,
 		rst_ing_reg = hclgevf_read_dev(&hdev->hw, HCLGEVF_RST_ING);
 		dev_info(&hdev->pdev->dev,
 			 "receive reset interrupt 0x%x!\n", rst_ing_reg);
-		set_bit(HNAE3_VF_RESET, &hdev->reset_pending);
+		hclgevf_set_reset_pending(hdev, HNAE3_VF_RESET);
 		set_bit(HCLGEVF_RESET_PENDING, &hdev->reset_state);
 		set_bit(HCLGE_COMM_STATE_CMD_DISABLE, &hdev->hw.hw.comm_state);
 		*clearval = ~(1U << HCLGEVF_VECTOR0_RST_INT_B);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RESEND net 2/7] net: hns3: fix missing features due to dev->features configuration too early
  2024-11-07 13:30 [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
  2024-11-07 13:30 ` [PATCH RESEND net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type Jijie Shao
@ 2024-11-07 13:30 ` Jijie Shao
  2024-11-07 13:30 ` [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent Jijie Shao
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Jijie Shao @ 2024-11-07 13:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, salil.mehta, liuyonglong, wangpeiyang1, shaojijie,
	chenhao418, netdev, linux-kernel

From: Hao Lan <lanhao@huawei.com>

Currently, the netdev->features is configured in hns3_nic_set_features.
As a result, __netdev_update_features considers that there is no feature
difference, and the procedures of the real features are missing.

Fixes: 2a7556bb2b73 ("net: hns3: implement ndo_features_check ops for hns3 driver")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 4cbc4d069a1f..73825b6bd485 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2452,7 +2452,6 @@ static int hns3_nic_set_features(struct net_device *netdev,
 			return ret;
 	}
 
-	netdev->features = features;
 	return 0;
 }
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  2024-11-07 13:30 [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
  2024-11-07 13:30 ` [PATCH RESEND net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type Jijie Shao
  2024-11-07 13:30 ` [PATCH RESEND net 2/7] net: hns3: fix missing features due to dev->features configuration too early Jijie Shao
@ 2024-11-07 13:30 ` Jijie Shao
  2024-11-12  1:25   ` Jakub Kicinski
  2024-11-07 13:30 ` [PATCH RESEND net 4/7] net: hns3: don't auto enable misc vector Jijie Shao
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Jijie Shao @ 2024-11-07 13:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, salil.mehta, liuyonglong, wangpeiyang1, shaojijie,
	chenhao418, netdev, linux-kernel

From: Hao Lan <lanhao@huawei.com>

This patch modifies the implementation of debugfs:
When the user process stops unexpectedly, not all data of the file system
is read. In this case, the save_buf pointer is not released. When the user
process is called next time, save_buf is used to copy the cached data
to the user space. As a result, the queried data is inconsistent. To solve
this problem, determine whether the function is invoked for the first time
based on the value of *ppos. If *ppos is 0, obtain the actual data.

Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangwei Zhang <zhangwangwei6@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
index 807eb3bbb11c..841e5af7b2be 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
@@ -1293,8 +1293,10 @@ static ssize_t hns3_dbg_read(struct file *filp, char __user *buffer,
 
 		/* save the buffer addr until the last read operation */
 		*save_buf = read_buf;
+	}
 
-		/* get data ready for the first time to read */
+	/* get data ready for the first time to read */
+	if (!*ppos) {
 		ret = hns3_dbg_read_cmd(dbg_data, hns3_dbg_cmd[index].cmd,
 					read_buf, hns3_dbg_cmd[index].buf_len);
 		if (ret)
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RESEND net 4/7] net: hns3: don't auto enable misc vector
  2024-11-07 13:30 [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
                   ` (2 preceding siblings ...)
  2024-11-07 13:30 ` [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent Jijie Shao
@ 2024-11-07 13:30 ` Jijie Shao
  2024-11-07 13:30 ` [PATCH RESEND net 5/7] net: hns3: initialize reset_timer before hclgevf_misc_irq_init() Jijie Shao
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Jijie Shao @ 2024-11-07 13:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, salil.mehta, liuyonglong, wangpeiyang1, shaojijie,
	chenhao418, netdev, linux-kernel

From: Jian Shen <shenjian15@huawei.com>

Currently, there is a time window between misc irq enabled
and service task inited. If an interrupte is reported at
this time, it will cause warning like below:

[   16.324639] Call trace:
[   16.324641]  __queue_delayed_work+0xb8/0xe0
[   16.324643]  mod_delayed_work_on+0x78/0xd0
[   16.324655]  hclge_errhand_task_schedule+0x58/0x90 [hclge]
[   16.324662]  hclge_misc_irq_handle+0x168/0x240 [hclge]
[   16.324666]  __handle_irq_event_percpu+0x64/0x1e0
[   16.324667]  handle_irq_event+0x80/0x170
[   16.324670]  handle_fasteoi_edge_irq+0x110/0x2bc
[   16.324671]  __handle_domain_irq+0x84/0xfc
[   16.324673]  gic_handle_irq+0x88/0x2c0
[   16.324674]  el1_irq+0xb8/0x140
[   16.324677]  arch_cpu_idle+0x18/0x40
[   16.324679]  default_idle_call+0x5c/0x1bc
[   16.324682]  cpuidle_idle_call+0x18c/0x1c4
[   16.324684]  do_idle+0x174/0x17c
[   16.324685]  cpu_startup_entry+0x30/0x6c
[   16.324687]  secondary_start_kernel+0x1a4/0x280
[   16.324688] ---[ end trace 6aa0bff672a964aa ]---

So don't auto enable misc vector when request irq..

Fixes: 7be1b9f3e99f ("net: hns3: make hclge_service use delayed workqueue")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c  | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 35c618c794be..9a67fe0554a5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -6,6 +6,7 @@
 #include <linux/etherdevice.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
@@ -3780,7 +3781,7 @@ static int hclge_misc_irq_init(struct hclge_dev *hdev)
 	snprintf(hdev->misc_vector.name, HNAE3_INT_NAME_LEN, "%s-misc-%s",
 		 HCLGE_NAME, pci_name(hdev->pdev));
 	ret = request_irq(hdev->misc_vector.vector_irq, hclge_misc_irq_handle,
-			  0, hdev->misc_vector.name, hdev);
+			  IRQF_NO_AUTOEN, hdev->misc_vector.name, hdev);
 	if (ret) {
 		hclge_free_vector(hdev, 0);
 		dev_err(&hdev->pdev->dev, "request misc irq(%d) fail\n",
@@ -11916,9 +11917,6 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 
 	hclge_init_rxd_adv_layout(hdev);
 
-	/* Enable MISC vector(vector0) */
-	hclge_enable_vector(&hdev->misc_vector, true);
-
 	ret = hclge_init_wol(hdev);
 	if (ret)
 		dev_warn(&pdev->dev,
@@ -11931,6 +11929,10 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 	hclge_state_init(hdev);
 	hdev->last_reset_time = jiffies;
 
+	/* Enable MISC vector(vector0) */
+	enable_irq(hdev->misc_vector.vector_irq);
+	hclge_enable_vector(&hdev->misc_vector, true);
+
 	dev_info(&hdev->pdev->dev, "%s driver initialization finished.\n",
 		 HCLGE_DRIVER_NAME);
 
@@ -12336,7 +12338,7 @@ static void hclge_uninit_ae_dev(struct hnae3_ae_dev *ae_dev)
 
 	/* Disable MISC vector(vector0) */
 	hclge_enable_vector(&hdev->misc_vector, false);
-	synchronize_irq(hdev->misc_vector.vector_irq);
+	disable_irq(hdev->misc_vector.vector_irq);
 
 	/* Disable all hw interrupts */
 	hclge_config_mac_tnl_int(hdev, false);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RESEND net 5/7] net: hns3: initialize reset_timer before hclgevf_misc_irq_init()
  2024-11-07 13:30 [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
                   ` (3 preceding siblings ...)
  2024-11-07 13:30 ` [PATCH RESEND net 4/7] net: hns3: don't auto enable misc vector Jijie Shao
@ 2024-11-07 13:30 ` Jijie Shao
  2024-11-07 13:30 ` [PATCH RESEND net 6/7] net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue Jijie Shao
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Jijie Shao @ 2024-11-07 13:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, salil.mehta, liuyonglong, wangpeiyang1, shaojijie,
	chenhao418, netdev, linux-kernel

From: Jian Shen <shenjian15@huawei.com>

Currently the misc irq is initialized before reset_timer setup. But
it will access the reset_timer in the irq handler. So initialize
the reset_timer earlier.

Fixes: ff200099d271 ("net: hns3: remove unnecessary work in hclgevf_main")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index ab54e6155e93..896f1eb172d3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -2315,6 +2315,7 @@ static void hclgevf_state_init(struct hclgevf_dev *hdev)
 	clear_bit(HCLGEVF_STATE_RST_FAIL, &hdev->state);
 
 	INIT_DELAYED_WORK(&hdev->service_task, hclgevf_service_task);
+	timer_setup(&hdev->reset_timer, hclgevf_reset_timer, 0);
 
 	mutex_init(&hdev->mbx_resp.mbx_mutex);
 	sema_init(&hdev->reset_sem, 1);
@@ -3014,7 +3015,6 @@ static int hclgevf_init_hdev(struct hclgevf_dev *hdev)
 		 HCLGEVF_DRIVER_NAME);
 
 	hclgevf_task_schedule(hdev, round_jiffies_relative(HZ));
-	timer_setup(&hdev->reset_timer, hclgevf_reset_timer, 0);
 
 	return 0;
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RESEND net 6/7] net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
  2024-11-07 13:30 [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
                   ` (4 preceding siblings ...)
  2024-11-07 13:30 ` [PATCH RESEND net 5/7] net: hns3: initialize reset_timer before hclgevf_misc_irq_init() Jijie Shao
@ 2024-11-07 13:30 ` Jijie Shao
  2024-11-07 13:30 ` [PATCH RESEND net 7/7] net: hns3: fix kernel crash when 1588 is sent on HIP08 devices Jijie Shao
  2024-11-12 17:08 ` [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Simon Horman
  7 siblings, 0 replies; 17+ messages in thread
From: Jijie Shao @ 2024-11-07 13:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, salil.mehta, liuyonglong, wangpeiyang1, shaojijie,
	chenhao418, netdev, linux-kernel

From: Hao Lan <lanhao@huawei.com>

The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs
1024-1279 are in different BAR space addresses. However,
hclge_fetch_pf_reg does not distinguish the tqp space information when
reading the tqp space information. When the number of TQPs is greater
than 1024, access bar space overwriting occurs.
The problem of different segments has been considered during the
initialization of tqp.io_base. Therefore, tqp.io_base is directly used
when the queue is read in hclge_fetch_pf_reg.

The error message:

Unable to handle kernel paging request at virtual address ffff800037200000
pc : hclge_fetch_pf_reg+0x138/0x250 [hclge]
lr : hclge_get_regs+0x84/0x1d0 [hclge]
Call trace:
 hclge_fetch_pf_reg+0x138/0x250 [hclge]
 hclge_get_regs+0x84/0x1d0 [hclge]
 hns3_get_regs+0x2c/0x50 [hns3]
 ethtool_get_regs+0xf4/0x270
 dev_ethtool+0x674/0x8a0
 dev_ioctl+0x270/0x36c
 sock_do_ioctl+0x110/0x2a0
 sock_ioctl+0x2ac/0x530
 __arm64_sys_ioctl+0xa8/0x100
 invoke_syscall+0x4c/0x124
 el0_svc_common.constprop.0+0x140/0x15c
 do_el0_svc+0x30/0xd0
 el0_svc+0x1c/0x2c
 el0_sync_handler+0xb0/0xb4
 el0_sync+0x168/0x180

Fixes: 939ccd107ffc ("net: hns3: move dump regs function to a separate file")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_regs.c  | 9 +++++----
 .../net/ethernet/hisilicon/hns3/hns3vf/hclgevf_regs.c    | 9 +++++----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_regs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_regs.c
index 43c1c18fa81f..8c057192aae6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_regs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_regs.c
@@ -510,9 +510,9 @@ static int hclge_get_dfx_reg(struct hclge_dev *hdev, void *data)
 static int hclge_fetch_pf_reg(struct hclge_dev *hdev, void *data,
 			      struct hnae3_knic_private_info *kinfo)
 {
-#define HCLGE_RING_REG_OFFSET		0x200
 #define HCLGE_RING_INT_REG_OFFSET	0x4
 
+	struct hnae3_queue *tqp;
 	int i, j, reg_num;
 	int data_num_sum;
 	u32 *reg = data;
@@ -533,10 +533,11 @@ static int hclge_fetch_pf_reg(struct hclge_dev *hdev, void *data,
 	reg_num = ARRAY_SIZE(ring_reg_addr_list);
 	for (j = 0; j < kinfo->num_tqps; j++) {
 		reg += hclge_reg_get_tlv(HCLGE_REG_TAG_RING, reg_num, reg);
+		tqp = kinfo->tqp[j];
 		for (i = 0; i < reg_num; i++)
-			*reg++ = hclge_read_dev(&hdev->hw,
-						ring_reg_addr_list[i] +
-						HCLGE_RING_REG_OFFSET * j);
+			*reg++ = readl_relaxed(tqp->io_base -
+					       HCLGE_TQP_REG_OFFSET +
+					       ring_reg_addr_list[i]);
 	}
 	data_num_sum += (reg_num + HCLGE_REG_TLV_SPACE) * kinfo->num_tqps;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_regs.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_regs.c
index 6db415d8b917..7d9d9dbc7560 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_regs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_regs.c
@@ -123,10 +123,10 @@ int hclgevf_get_regs_len(struct hnae3_handle *handle)
 void hclgevf_get_regs(struct hnae3_handle *handle, u32 *version,
 		      void *data)
 {
-#define HCLGEVF_RING_REG_OFFSET		0x200
 #define HCLGEVF_RING_INT_REG_OFFSET	0x4
 
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+	struct hnae3_queue *tqp;
 	int i, j, reg_um;
 	u32 *reg = data;
 
@@ -147,10 +147,11 @@ void hclgevf_get_regs(struct hnae3_handle *handle, u32 *version,
 	reg_um = ARRAY_SIZE(ring_reg_addr_list);
 	for (j = 0; j < hdev->num_tqps; j++) {
 		reg += hclgevf_reg_get_tlv(HCLGEVF_REG_TAG_RING, reg_um, reg);
+		tqp = &hdev->htqp[j].q;
 		for (i = 0; i < reg_um; i++)
-			*reg++ = hclgevf_read_dev(&hdev->hw,
-						  ring_reg_addr_list[i] +
-						  HCLGEVF_RING_REG_OFFSET * j);
+			*reg++ = readl_relaxed(tqp->io_base -
+					       HCLGEVF_TQP_REG_OFFSET +
+					       ring_reg_addr_list[i]);
 	}
 
 	reg_um = ARRAY_SIZE(tqp_intr_reg_addr_list);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RESEND net 7/7] net: hns3: fix kernel crash when 1588 is sent on HIP08 devices
  2024-11-07 13:30 [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
                   ` (5 preceding siblings ...)
  2024-11-07 13:30 ` [PATCH RESEND net 6/7] net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue Jijie Shao
@ 2024-11-07 13:30 ` Jijie Shao
  2024-11-12 17:08 ` [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Simon Horman
  7 siblings, 0 replies; 17+ messages in thread
From: Jijie Shao @ 2024-11-07 13:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, salil.mehta, liuyonglong, wangpeiyang1, shaojijie,
	chenhao418, netdev, linux-kernel

From: Jie Wang <wangjie125@huawei.com>

Currently, HIP08 devices does not register the ptp devices, so the
hdev->ptp is NULL. But the tx process would still try to set hardware time
stamp info with SKBTX_HW_TSTAMP flag and cause a kernel crash.

[  128.087798] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
...
[  128.280251] pc : hclge_ptp_set_tx_info+0x2c/0x140 [hclge]
[  128.286600] lr : hclge_ptp_set_tx_info+0x20/0x140 [hclge]
[  128.292938] sp : ffff800059b93140
[  128.297200] x29: ffff800059b93140 x28: 0000000000003280
[  128.303455] x27: ffff800020d48280 x26: ffff0cb9dc814080
[  128.309715] x25: ffff0cb9cde93fa0 x24: 0000000000000001
[  128.315969] x23: 0000000000000000 x22: 0000000000000194
[  128.322219] x21: ffff0cd94f986000 x20: 0000000000000000
[  128.328462] x19: ffff0cb9d2a166c0 x18: 0000000000000000
[  128.334698] x17: 0000000000000000 x16: ffffcf1fc523ed24
[  128.340934] x15: 0000ffffd530a518 x14: 0000000000000000
[  128.347162] x13: ffff0cd6bdb31310 x12: 0000000000000368
[  128.353388] x11: ffff0cb9cfbc7070 x10: ffff2cf55dd11e02
[  128.359606] x9 : ffffcf1f85a212b4 x8 : ffff0cd7cf27dab0
[  128.365831] x7 : 0000000000000a20 x6 : ffff0cd7cf27d000
[  128.372040] x5 : 0000000000000000 x4 : 000000000000ffff
[  128.378243] x3 : 0000000000000400 x2 : ffffcf1f85a21294
[  128.384437] x1 : ffff0cb9db520080 x0 : ffff0cb9db500080
[  128.390626] Call trace:
[  128.393964]  hclge_ptp_set_tx_info+0x2c/0x140 [hclge]
[  128.399893]  hns3_nic_net_xmit+0x39c/0x4c4 [hns3]
[  128.405468]  xmit_one.constprop.0+0xc4/0x200
[  128.410600]  dev_hard_start_xmit+0x54/0xf0
[  128.415556]  sch_direct_xmit+0xe8/0x634
[  128.420246]  __dev_queue_xmit+0x224/0xc70
[  128.425101]  dev_queue_xmit+0x1c/0x40
[  128.429608]  ovs_vport_send+0xac/0x1a0 [openvswitch]
[  128.435409]  do_output+0x60/0x17c [openvswitch]
[  128.440770]  do_execute_actions+0x898/0x8c4 [openvswitch]
[  128.446993]  ovs_execute_actions+0x64/0xf0 [openvswitch]
[  128.453129]  ovs_dp_process_packet+0xa0/0x224 [openvswitch]
[  128.459530]  ovs_vport_receive+0x7c/0xfc [openvswitch]
[  128.465497]  internal_dev_xmit+0x34/0xb0 [openvswitch]
[  128.471460]  xmit_one.constprop.0+0xc4/0x200
[  128.476561]  dev_hard_start_xmit+0x54/0xf0
[  128.481489]  __dev_queue_xmit+0x968/0xc70
[  128.486330]  dev_queue_xmit+0x1c/0x40
[  128.490856]  ip_finish_output2+0x250/0x570
[  128.495810]  __ip_finish_output+0x170/0x1e0
[  128.500832]  ip_finish_output+0x3c/0xf0
[  128.505504]  ip_output+0xbc/0x160
[  128.509654]  ip_send_skb+0x58/0xd4
[  128.513892]  udp_send_skb+0x12c/0x354
[  128.518387]  udp_sendmsg+0x7a8/0x9c0
[  128.522793]  inet_sendmsg+0x4c/0x8c
[  128.527116]  __sock_sendmsg+0x48/0x80
[  128.531609]  __sys_sendto+0x124/0x164
[  128.536099]  __arm64_sys_sendto+0x30/0x5c
[  128.540935]  invoke_syscall+0x50/0x130
[  128.545508]  el0_svc_common.constprop.0+0x10c/0x124
[  128.551205]  do_el0_svc+0x34/0xdc
[  128.555347]  el0_svc+0x20/0x30
[  128.559227]  el0_sync_handler+0xb8/0xc0
[  128.563883]  el0_sync+0x160/0x180

Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_ptp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_ptp.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_ptp.c
index 5505caea88e9..bab16c2191b2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_ptp.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_ptp.c
@@ -58,6 +58,9 @@ bool hclge_ptp_set_tx_info(struct hnae3_handle *handle, struct sk_buff *skb)
 	struct hclge_dev *hdev = vport->back;
 	struct hclge_ptp *ptp = hdev->ptp;
 
+	if (!ptp)
+		return false;
+
 	if (!test_bit(HCLGE_PTP_FLAG_TX_EN, &ptp->flags) ||
 	    test_and_set_bit(HCLGE_STATE_PTP_TX_HANDLING, &hdev->state)) {
 		ptp->tx_skipped++;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  2024-11-07 13:30 ` [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent Jijie Shao
@ 2024-11-12  1:25   ` Jakub Kicinski
  2024-11-13  5:59     ` Jijie Shao
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-11-12  1:25 UTC (permalink / raw)
  To: Jijie Shao
  Cc: davem, edumazet, pabeni, andrew+netdev, horms, shenjian15,
	salil.mehta, liuyonglong, wangpeiyang1, chenhao418, netdev,
	linux-kernel

On Thu, 7 Nov 2024 21:30:19 +0800 Jijie Shao wrote:
> Subject: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
> Date: Thu, 7 Nov 2024 21:30:19 +0800
> X-Mailer: git-send-email 2.30.0
> 
> From: Hao Lan <lanhao@huawei.com>
> 
> This patch modifies the implementation of debugfs:
> When the user process stops unexpectedly, not all data of the file system
> is read. In this case, the save_buf pointer is not released. When the user
> process is called next time, save_buf is used to copy the cached data
> to the user space. As a result, the queried data is inconsistent. To solve
> this problem, determine whether the function is invoked for the first time
> based on the value of *ppos. If *ppos is 0, obtain the actual data.

What do you mean by "inconsistent" ?
What if two processes read the file at once?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver
  2024-11-07 13:30 [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
                   ` (6 preceding siblings ...)
  2024-11-07 13:30 ` [PATCH RESEND net 7/7] net: hns3: fix kernel crash when 1588 is sent on HIP08 devices Jijie Shao
@ 2024-11-12 17:08 ` Simon Horman
  2024-11-13  3:21   ` Jijie Shao
  7 siblings, 1 reply; 17+ messages in thread
From: Simon Horman @ 2024-11-12 17:08 UTC (permalink / raw)
  To: Jijie Shao
  Cc: davem, edumazet, kuba, pabeni, andrew+netdev, shenjian15,
	salil.mehta, liuyonglong, wangpeiyang1, chenhao418, netdev,
	linux-kernel

On Thu, Nov 07, 2024 at 09:30:16PM +0800, Jijie Shao wrote:
> There's a series of bugfix that's been accepted:
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=d80a3091308491455b6501b1c4b68698c4a7cd24
> 
> However, The series is making the driver poke into IOMMU internals instead of
> implementing appropriate IOMMU workarounds. After discussion, the series was reverted:
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=249cfa318fb1b77eb726c2ff4f74c9685f04e568
> 
> But only two patches are related to the IOMMU.
> Other patches involve only the modification of the driver.
> This series resends other patches.

Hi Jijie Shao,

Cover letters for patch-sets for Networking do make it into git history,
e.g. This cover letter [1] became this commit [2]. So please consider a
subject that will be more meaningful there.

e.g. [PATCH net v2 0/7] net: hns3: implement IOMMU workarounds

Thanks!

[1] [PATCH net v2 0/3] virtio/vsock: Fix memory leaks
    https://lore.kernel.org/netdev/20241107-vsock-mem-leaks-v2-0-4e21bfcfc818@rbox.co/
[2] 20bbe5b80249 ("Merge branch 'virtio-vsock-fix-memory-leaks'")
    https://git.kernel.org/netdev/net/c/20bbe5b80249

...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver
  2024-11-12 17:08 ` [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Simon Horman
@ 2024-11-13  3:21   ` Jijie Shao
  0 siblings, 0 replies; 17+ messages in thread
From: Jijie Shao @ 2024-11-13  3:21 UTC (permalink / raw)
  To: Simon Horman
  Cc: shaojijie, davem, edumazet, kuba, pabeni, andrew+netdev,
	shenjian15, salil.mehta, liuyonglong, wangpeiyang1, chenhao418,
	netdev, linux-kernel


on 2024/11/13 1:08, Simon Horman wrote:
> On Thu, Nov 07, 2024 at 09:30:16PM +0800, Jijie Shao wrote:
>> There's a series of bugfix that's been accepted:
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=d80a3091308491455b6501b1c4b68698c4a7cd24
>>
>> However, The series is making the driver poke into IOMMU internals instead of
>> implementing appropriate IOMMU workarounds. After discussion, the series was reverted:
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=249cfa318fb1b77eb726c2ff4f74c9685f04e568
>>
>> But only two patches are related to the IOMMU.
>> Other patches involve only the modification of the driver.
>> This series resends other patches.
> Hi Jijie Shao,
>
> Cover letters for patch-sets for Networking do make it into git history,
> e.g. This cover letter [1] became this commit [2]. So please consider a
> subject that will be more meaningful there.
>
> e.g. [PATCH net v2 0/7] net: hns3: implement IOMMU workarounds


Okay, no problem. I'll pay more attention to this.

Thanks
Jijie Shao

>
> Thanks!
>
> [1] [PATCH net v2 0/3] virtio/vsock: Fix memory leaks
>      https://lore.kernel.org/netdev/20241107-vsock-mem-leaks-v2-0-4e21bfcfc818@rbox.co/
> [2] 20bbe5b80249 ("Merge branch 'virtio-vsock-fix-memory-leaks'")
>      https://git.kernel.org/netdev/net/c/20bbe5b80249
>
> ...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  2024-11-12  1:25   ` Jakub Kicinski
@ 2024-11-13  5:59     ` Jijie Shao
  2024-11-14  0:31       ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Jijie Shao @ 2024-11-13  5:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: shaojijie, davem, edumazet, pabeni, andrew+netdev, horms,
	shenjian15, salil.mehta, liuyonglong, wangpeiyang1, chenhao418,
	netdev, linux-kernel


on 2024/11/12 9:25, Jakub Kicinski wrote:
> On Thu, 7 Nov 2024 21:30:19 +0800 Jijie Shao wrote:
>> Subject: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
>> Date: Thu, 7 Nov 2024 21:30:19 +0800
>> X-Mailer: git-send-email 2.30.0
>>
>> From: Hao Lan <lanhao@huawei.com>
>>
>> This patch modifies the implementation of debugfs:
>> When the user process stops unexpectedly, not all data of the file system
>> is read. In this case, the save_buf pointer is not released. When the user
>> process is called next time, save_buf is used to copy the cached data
>> to the user space. As a result, the queried data is inconsistent. To solve
>> this problem, determine whether the function is invoked for the first time
>> based on the value of *ppos. If *ppos is 0, obtain the actual data.
> What do you mean by "inconsistent" ?
> What if two processes read the file at once?

inconsistent:
Before this modification,
if the previous read operation is stopped before complete, the buffer is not released.
In the next read operation (perhaps after a long time), the driver does not read again.
Instead, the driver returns the bufffer content, which causes outdated data to be obtained.
As a result, the obtained data is inconsistent with the actual data.

In this patch, ppos is used to determine whether a new read operation is performed.
If yes, the driver updates the data in the buffer to ensure that the queried data is fresh.
But, if two processes read the same file at once, The read operation that ends first releases the buffer.
As a result, the other read operation re-alloc buffer memory. However, because the value of ppos is not 0,
the data is not updated again. As a result, the queried data is truncated.

This is a bug and I will fix it in the next version.

Thanks
Jijie Shao




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  2024-11-13  5:59     ` Jijie Shao
@ 2024-11-14  0:31       ` Jakub Kicinski
  2024-12-09 14:14         ` Jijie Shao
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-11-14  0:31 UTC (permalink / raw)
  To: Jijie Shao
  Cc: davem, edumazet, pabeni, andrew+netdev, horms, shenjian15,
	salil.mehta, liuyonglong, wangpeiyang1, chenhao418, netdev,
	linux-kernel

On Wed, 13 Nov 2024 13:59:32 +0800 Jijie Shao wrote:
> inconsistent:
> Before this modification,
> if the previous read operation is stopped before complete, the buffer is not released.
> In the next read operation (perhaps after a long time), the driver does not read again.
> Instead, the driver returns the bufffer content, which causes outdated data to be obtained.
> As a result, the obtained data is inconsistent with the actual data.

I think the word "stale" would fit the situation better.

> In this patch, ppos is used to determine whether a new read operation is performed.
> If yes, the driver updates the data in the buffer to ensure that the queried data is fresh.
> But, if two processes read the same file at once, The read operation that ends first releases the buffer.
> As a result, the other read operation re-alloc buffer memory. However, because the value of ppos is not 0,
> the data is not updated again. As a result, the queried data is truncated.
> 
> This is a bug and I will fix it in the next version.

Let's say two reads are necessary to read the data:

 reader A                  reader B
  read()
   - alloc
   - hns3_dbg_read_cmd()
                           read()
                           read()
                           read(EOF) 
                            - free
  read()
   - alloc
   - hns3_dbg_read_cmd()
  read(EOF) 
   - free

The data for read A is half from one hns3_dbg_read_cmd() and half from
another. Does it not cause any actual inconsistency?

Also, just to be sure, it's not possible to lseek on these files, right?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  2024-11-14  0:31       ` Jakub Kicinski
@ 2024-12-09 14:14         ` Jijie Shao
  2024-12-09 21:13           ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Jijie Shao @ 2024-12-09 14:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: shaojijie, davem, edumazet, pabeni, andrew+netdev, horms,
	shenjian15, salil.mehta, liuyonglong, wangpeiyang1, chenhao418,
	netdev, linux-kernel


on 2024/11/14 8:31, Jakub Kicinski wrote:
> On Wed, 13 Nov 2024 13:59:32 +0800 Jijie Shao wrote:
>> inconsistent:
>> Before this modification,
>> if the previous read operation is stopped before complete, the buffer is not released.
>> In the next read operation (perhaps after a long time), the driver does not read again.
>> Instead, the driver returns the bufffer content, which causes outdated data to be obtained.
>> As a result, the obtained data is inconsistent with the actual data.
> I think the word "stale" would fit the situation better.
>
>> In this patch, ppos is used to determine whether a new read operation is performed.
>> If yes, the driver updates the data in the buffer to ensure that the queried data is fresh.
>> But, if two processes read the same file at once, The read operation that ends first releases the buffer.
>> As a result, the other read operation re-alloc buffer memory. However, because the value of ppos is not 0,
>> the data is not updated again. As a result, the queried data is truncated.
>>
>> This is a bug and I will fix it in the next version.
> Let's say two reads are necessary to read the data:
>
>   reader A                  reader B
>    read()
>     - alloc
>     - hns3_dbg_read_cmd()
>                             read()
>                             read()
>                             read(EOF)
>                              - free
>    read()
>     - alloc
>     - hns3_dbg_read_cmd()
>    read(EOF)
>     - free
>
> The data for read A is half from one hns3_dbg_read_cmd() and half from
> another. Does it not cause any actual inconsistency?
>
> Also, just to be sure, it's not possible to lseek on these files, right?

The patch of this version has the following problems:
  reader A                       reader B
   read()
    - alloc
    - *ppos == 0
     - hns3_dbg_read_cmd()
                                 read()
                                 read()
                                 read(EOF)
                                  - free
   read()
    - alloc
    - *ppos != 0
   read(EOF)
    - free
    
if reader B free the buffer, reader A will alloc a new buffer again,
but *ppos != 0, so hns3_dbg_read_cmd() will not be called.
So reader A cannot get the left data.

I plan to introduce the "need_update" variable in the next version,
with the default value is false. Run the alloc command to change the value to true:
  reader A                           reader B
   read()
    - need_update = false
    - alloc
       - need_update = true
    - *ppos == 0 || need_update
       - hns3_dbg_read_cmd()
                                     read()
                                     read()
                                     read(EOF)
                                     - free
   read()
    - alloc
      - need_update = true
    - *ppos == 0 || need_update
      - hns3_dbg_read_cmd()
   read(EOF)
    - free
So,  after reader A alloc a new buffer again, need_update will set to true.
hns3_dbg_read_cmd() will be called again to update data.

But there's still a problem:
The data for reader A is half from one hns3_dbg_read_cmd() and half from another.
However, due to the short interval between calls to hns3_dbg_read_cmd(),
and the fact that users do little to do like so, this problem is relatively acceptable.

We're also trying to fix the problem completely.
For example, an independent buffer is allocated for each reader:
  reader A                   reader B
   read()                    read()
   - alloc                   - alloc
     - hns3_dbg_read_cmd()    -hns3_dbg_read_cmd()
   read()                    read()
   read()                    read()
   read(EOF)                 read(EOF)
   - free                    - free
  
But, driver needs to maintain the mapping between the buffer and file.
And if the reader exits before read EOF, a large amount of memory is not free:
  reader
   read()
   - alloc
     - hns3_dbg_read_cmd()
   read()
   read()
   == reader exit ==
Maybe it's not a good way

As for lseek, driver needs to call lseek at the right time to reread the data.
  reader A                           reader B
    read()
    alloc
    hns3_dbg_read_cmd()
    *ppos == 0
    read()
                                     read()
                                     read()
                                     read(EOF)
                                     - free
    alloc
    hns3_dbg_read_cmd()
    - *ppos != 0
    - lseek()
    - *ppos == 0
    reread()
    read()
    read(EOF)
    free

I can't find any examples of similar implementations.
I'm not sure if there's a suitable lseek interface that the driver can call directly.


Another way is seq_file, which may be a solution,
as far as I know, each seq_file has a separate buffer and can be expanded automatically.
So it might be possible to solve the problem
But even if the solution is feasible, this will require a major refactoring of hns3 debugfs

Thanks
Jijie Shao



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  2024-12-09 14:14         ` Jijie Shao
@ 2024-12-09 21:13           ` Jakub Kicinski
  2024-12-13 13:11             ` Jijie Shao
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-12-09 21:13 UTC (permalink / raw)
  To: Jijie Shao
  Cc: davem, edumazet, pabeni, andrew+netdev, horms, shenjian15,
	salil.mehta, liuyonglong, wangpeiyang1, chenhao418, netdev,
	linux-kernel

On Mon, 9 Dec 2024 22:14:37 +0800 Jijie Shao wrote:
> Another way is seq_file, which may be a solution,
> as far as I know, each seq_file has a separate buffer and can be expanded automatically.
> So it might be possible to solve the problem
> But even if the solution is feasible, this will require a major refactoring of hns3 debugfs

seq_file is generally used for text output

can you not hook in the allocation and execution of the cmd into the
.open handler and freeing in to the .close handler? You already use
explicit file_ops for this file.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  2024-12-09 21:13           ` Jakub Kicinski
@ 2024-12-13 13:11             ` Jijie Shao
  2024-12-13 14:10               ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Jijie Shao @ 2024-12-13 13:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: shaojijie, davem, edumazet, pabeni, andrew+netdev, horms,
	shenjian15, salil.mehta, liuyonglong, wangpeiyang1, chenhao418,
	netdev, linux-kernel


on 2024/12/10 5:13, Jakub Kicinski wrote:
> On Mon, 9 Dec 2024 22:14:37 +0800 Jijie Shao wrote:
>> Another way is seq_file, which may be a solution,
>> as far as I know, each seq_file has a separate buffer and can be expanded automatically.
>> So it might be possible to solve the problem
>> But even if the solution is feasible, this will require a major refactoring of hns3 debugfs
> seq_file is generally used for text output
>
> can you not hook in the allocation and execution of the cmd into the
> .open handler and freeing in to the .close handler? You already use
> explicit file_ops for this file.


Thank you very much for your advice.

When I modified the code according to your comments,
I found that the problem mentioned in this path can be solved.
I implement .open() and .release() handler for debugfs file_operations,
Move allocation buffer and execution of the cmd to the .open() handler
and freeing in to the .release() handler.
Also allocate separate buffer for each reader and associate the buffer with the file pointer.
In this case, there is no shared buffer, which causes data inconsistency.

However, a new problem is introduced:
If the framework does not call .release() for some reason, the buffer cannot be freed, causing memory leakage.
Maybe it's acceptable?


  static ssize_t hns3_dbg_read(struct file *filp, char __user *buffer,
  			     size_t count, loff_t *ppos)
  {
-	struct hns3_dbg_data *dbg_data = filp->private_data;
+	char *buf = filp->private_data;
+
+	return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
+}
+
+static int hns3_dbg_open(struct inode *inode, struct file *filp)
+{
+	struct hns3_dbg_data *dbg_data = inode->i_private;
  	struct hnae3_handle *handle = dbg_data->handle;
  	struct hns3_nic_priv *priv = handle->priv;
-	ssize_t size = 0;
-	char **save_buf;
-	char *read_buf;
  	u32 index;
+	char *buf;
  	int ret;
  
+	if (!test_bit(HNS3_NIC_STATE_INITED, &priv->state) ||
+	    test_bit(HNS3_NIC_STATE_RESETTING, &priv->state))
+		return -EBUSY;
+
  	ret = hns3_dbg_get_cmd_index(dbg_data, &index);
  	if (ret)
  		return ret;
  
-	mutex_lock(&handle->dbgfs_lock);
-	save_buf = &handle->dbgfs_buf[index];
-
-	if (!test_bit(HNS3_NIC_STATE_INITED, &priv->state) ||
-	    test_bit(HNS3_NIC_STATE_RESETTING, &priv->state)) {
-		ret = -EBUSY;
-		goto out;
-	}
-
-	if (*save_buf) {
-		read_buf = *save_buf;
-	} else {
-		read_buf = kvzalloc(hns3_dbg_cmd[index].buf_len, GFP_KERNEL);
-		if (!read_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
-
-		/* save the buffer addr until the last read operation */
-		*save_buf = read_buf;
-
-		/* get data ready for the first time to read */
-		ret = hns3_dbg_read_cmd(dbg_data, hns3_dbg_cmd[index].cmd,
-					read_buf, hns3_dbg_cmd[index].buf_len);
-		if (ret)
-			goto out;
-	}
+	buf = kvzalloc(hns3_dbg_cmd[index].buf_len, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
  
-	size = simple_read_from_buffer(buffer, count, ppos, read_buf,
-				       strlen(read_buf));
-	if (size > 0) {
-		mutex_unlock(&handle->dbgfs_lock);
-		return size;
+	ret = hns3_dbg_read_cmd(dbg_data, hns3_dbg_cmd[index].cmd,
+				buf, hns3_dbg_cmd[index].buf_len);
+	if (ret) {
+		kvfree(buf);
+		return ret;
  	}
  
-out:
-	/* free the buffer for the last read operation */
-	if (*save_buf) {
-		kvfree(*save_buf);
-		*save_buf = NULL;
-	}
+	filp->private_data = buf;
+	return 0;
+}
  
-	mutex_unlock(&handle->dbgfs_lock);
-	return ret;
+static int hns3_dbg_release(struct inode *inode, struct file *filp)
+{
+	kvfree(filp->private_data);
+	filp->private_data = NULL;
+	return 0;
  }
  
  static const struct file_operations hns3_dbg_fops = {
  	.owner = THIS_MODULE,
-	.open  = simple_open,
+	.open  = hns3_dbg_open,
  	.read  = hns3_dbg_read,
+	.release = hns3_dbg_release,
  };



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  2024-12-13 13:11             ` Jijie Shao
@ 2024-12-13 14:10               ` Jakub Kicinski
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2024-12-13 14:10 UTC (permalink / raw)
  To: Jijie Shao
  Cc: davem, edumazet, pabeni, andrew+netdev, horms, shenjian15,
	salil.mehta, liuyonglong, wangpeiyang1, chenhao418, netdev,
	linux-kernel

On Fri, 13 Dec 2024 21:11:49 +0800 Jijie Shao wrote:
> If the framework does not call .release() for some reason, the buffer
> cannot be freed, causing memory leakage. Maybe it's acceptable?

Are you sure? How did you test?
Looking at the code debugfs itself uses release in a similar way
(u32_array_fops, for example), so I think it should work.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-12-13 14:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-07 13:30 [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
2024-11-07 13:30 ` [PATCH RESEND net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type Jijie Shao
2024-11-07 13:30 ` [PATCH RESEND net 2/7] net: hns3: fix missing features due to dev->features configuration too early Jijie Shao
2024-11-07 13:30 ` [PATCH RESEND net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent Jijie Shao
2024-11-12  1:25   ` Jakub Kicinski
2024-11-13  5:59     ` Jijie Shao
2024-11-14  0:31       ` Jakub Kicinski
2024-12-09 14:14         ` Jijie Shao
2024-12-09 21:13           ` Jakub Kicinski
2024-12-13 13:11             ` Jijie Shao
2024-12-13 14:10               ` Jakub Kicinski
2024-11-07 13:30 ` [PATCH RESEND net 4/7] net: hns3: don't auto enable misc vector Jijie Shao
2024-11-07 13:30 ` [PATCH RESEND net 5/7] net: hns3: initialize reset_timer before hclgevf_misc_irq_init() Jijie Shao
2024-11-07 13:30 ` [PATCH RESEND net 6/7] net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue Jijie Shao
2024-11-07 13:30 ` [PATCH RESEND net 7/7] net: hns3: fix kernel crash when 1588 is sent on HIP08 devices Jijie Shao
2024-11-12 17:08 ` [PATCH RESEND net 0/7] There are some bugfix for the HNS3 ethernet driver Simon Horman
2024-11-13  3:21   ` Jijie Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).